In yesterday’s blog post we understood how Big Data evolution happened. Today we will understand basics of the Big Data Architecture.
Just like every other database related applications, bit data project have its development cycle. Though three Vs (link) for sure plays an important role in deciding the architecture of the Big Data projects. Just like every other project Big Data project also goes to similar phases of the data capturing, transforming, integrating, analyzing and building actionable reporting on the top of the data.
While the process looks almost same but due to the nature of the data the architecture is often totally different. Here are few of the question which everyone should ask before going ahead with Big Data architecture.
How big is your total database?
What is your requirement of the reporting in terms of time – real time, semi real time or at frequent interval?
How important is the data availability and what is the plan for disaster recovery?
What are the plans for network and physical security of the data?
What platform will be the driving force behind data and what are different service level agreements for the infrastructure?
This are just basic questions but based on your application and business need you should come up with the custom list of the question to ask. As I mentioned earlier this question may look quite simple but the answer will not be simple. When we are talking about Big Data implementation there are many other important aspects which we have to consider when we decide to go for the architecture.
It is absolutely impossible to discuss and nail down the most optimal architecture for any Big Data Solution in a single blog post, however, we can discuss the basic building blocks of big data architecture. Here is the image which I have built to explain how the building blocks of the Big Data architecture works.
Above image gives good overview of how in Big Data Architecture various components are associated with each other. In Big Data various different data sources are part of the architecture hence extract, transform and integration are one of the most essential layers of the architecture. Most of the data is stored in relational as well as non relational data marts and data warehousing solutions. As per the business need various data are processed as well converted to proper reports and visualizations for end users. Just like software the hardware is almost the most important part of the Big Data Architecture. In the big data architecture hardware infrastructure is extremely important and failure over instances as well as redundant physical infrastructure is usually implemented.
NoSQL is a very famous buzz word and it really means Not Relational SQL or Not Only SQL. This is because in Big Data Architecture the data is in any format. It can be unstructured, relational or in any other format or from any other data source. To bring all the data together relational technology is not enough, hence new tools, architecture and other algorithms are invented which takes care of all the kind of data. This is collectively called NoSQL.
Next four days we will answer the Buzz Words – Hadoop.
Reference: Pinal Dave (http://blog.sqlauthority.com)