In yesterday’s blog post we learned about the importance of Pig and Pig Latin in the Big Data Story. In this article we will look at what Sqoop and Zookeeper are and where they fit in the Big Data Story.
Two important components to learn when you start interacting with Hadoop are Sqoop and Zookeeper.
What is Sqoop?
Most businesses store their data in an RDBMS or in other data warehouse solutions. They need a way to move that data into Hadoop for processing and to return the results from Hadoop to the RDBMS. The data movement can happen in real time or in bulk at various intervals. We need a tool that can move this data from SQL to Hadoop and from Hadoop to SQL. Sqoop (SQL to Hadoop) is such a tool: it extracts data from non-Hadoop data sources, transforms it into a format that Hadoop can use, and loads it into HDFS. Essentially it is an ETL tool that Extracts, Transforms and Loads from SQL to Hadoop. The best part is that it also works in the other direction, extracting data from Hadoop and loading it into non-Hadoop data stores such as an RDBMS.
Sqoop is a command line tool. Behind the scenes it creates a MapReduce job to import data from an external database into HDFS. It is effective and easy to learn, even for nonprogrammers.
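To make the two directions concrete, here is a minimal Python sketch that only builds the command lines for a Sqoop import (RDBMS to HDFS) and a Sqoop export (HDFS to RDBMS). The helper names, the JDBC URL, the user, the table names and the HDFS paths are all made up for illustration. The flags are the commonly documented `sqoop import` and `sqoop export` options, but check them against the Sqoop version you actually run.

```python
import shlex


def sqoop_import_args(jdbc_url, username, password_file, table, target_dir, num_mappers=4):
    """Build the argument list for `sqoop import` (RDBMS table -> HDFS directory)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC URL of the source database
        "--username", username,
        "--password-file", password_file,   # file that holds the password
        "--table", table,                   # source table to read
        "--target-dir", target_dir,         # HDFS directory to write into
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]


def sqoop_export_args(jdbc_url, username, password_file, table, export_dir):
    """Build the argument list for `sqoop export` (HDFS directory -> RDBMS table)."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,                   # destination table (must already exist)
        "--export-dir", export_dir,         # HDFS directory to read from
    ]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    url = "jdbc:mysql://db.example.com/sales"
    print(shlex.join(sqoop_import_args(url, "etl_user", "/user/etl/.pw", "orders", "/data/raw/orders")))
    print(shlex.join(sqoop_export_args(url, "etl_user", "/user/etl/.pw", "order_summary", "/data/out/order_summary")))
```

The sketch prints the commands instead of running them, so you can read what would be executed before trying it on a real cluster.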
What is Zookeeper?
- Zookeeper coordinates the nodes in a Hadoop cluster and keeps track of them as they start and stop.
- When processes in a Hadoop cluster need certain configuration to complete a task, Zookeeper makes sure that each node gets that configuration consistently.
- If the master node fails, Zookeeper can help elect a new master node so that the cluster keeps working as expected.
There are many other tasks Zookeeper performs when it comes to coordination and communication in a Hadoop cluster. Basically, without the help of a service like Zookeeper it is very hard to design a new fault tolerant distributed application.
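The shared configuration and master failover ideas from the list above can be sketched with a small in-memory toy. This is not the Zookeeper API and it does not talk to a real cluster; the class `ToyCoordinator` and the node names are invented for illustration. It assumes the common election pattern in which every candidate registers with an increasing sequence number and the live candidate with the lowest number acts as master.

```python
class ToyCoordinator:
    """In-memory stand-in for a coordination service (NOT the Zookeeper API)."""

    def __init__(self):
        self._next_seq = 0
        self._candidates = {}  # sequence number -> node name, live nodes only
        self._config = {}      # one shared copy of the configuration

    def join(self, node_name):
        """Register a live node and return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._candidates[seq] = node_name
        return seq

    def leave(self, seq):
        """Simulate a node stopping or failing."""
        self._candidates.pop(seq, None)

    def leader(self):
        """The live node with the lowest sequence number is the master."""
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]

    def set_config(self, key, value):
        self._config[key] = value

    def get_config(self, key):
        # Every node reads the same stored value, so they all agree.
        return self._config[key]


if __name__ == "__main__":
    zk = ToyCoordinator()
    first = zk.join("master-1")   # hypothetical node names
    zk.join("master-2")
    zk.set_config("replication", "3")
    print(zk.leader())            # master-1
    zk.leave(first)               # the current master fails
    print(zk.leader())            # master-2
    print(zk.get_config("replication"))  # 3
```

A real deployment would use a Zookeeper client library, with sessions and watches taking the place of the explicit `leave` call.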
Tomorrow
In tomorrow’s blog post we will discuss a very important component of the Big Data Ecosystem: Big Data Analytics.
Reference: Pinal Dave (https://blog.sqlauthority.com)
2 Comments
Can you share some sources for getting started with Hadoop administration?