In yesterday’s blog post we learned the importance of the HIVE in Big Data Story. In this article we will understand what is PIG and PIG Latin in Big Data Story.
Yahoo started working on Pig for their application deployment on Hadoop. The goal of Yahoo to manage their unstructured data.
Pig is a high level platform for creating MapReduce programs used with Hadoop and the language we use for this platform is called PIG Latin. The pig was designed to make Hadoop more user-friendly and approachable by power-users and nondevelopers. PIG is an interactive execution environment supporting Pig Latin language. The language Pig Latin has supported loading and processing of input data with series of transforming to produce desired results. PIG has two different execution environments 1) Local Mode – In this case all the scripts run on a single machine. 2) Hadoop – In this case all the scripts run on Hadoop Cluster.
Pig essentially creates set of map and reduce jobs under the hoods. Due to same users does not have to now write, compile and build solution for Big Data. The pig is very similar to SQL in many ways. The Ping Latin language provide an abstraction layer over the data. It focuses on the data and not the structure under the hood. Pig Latin is a very powerful language and it can do various operations like loading and storing data, streaming data, filtering data as well various data operations related to strings. The major difference between SQL and Pig Latin is that PIG is procedural and SQL is declarative. In simpler words, Pig Latin is very similar to SQ Lexecution plan and that makes it much easier for programmers to build various processes. Whereas SQL handles trees naturally, Pig Latin follows directed acyclic graph (DAG). DAGs is used to model several different kinds of structures in mathematics and computer science.
In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Zookeeper.
Reference: Pinal Dave (http://blog.sqlauthority.com)