In yesterday’s blog post we learned what is HDFS. In this article we will take a quick look at the importance of the Relational Database in Big Data world.
A Big Question?
Here are a few questions I often received since the beginning of the Big Data Series -
- Does the relational database have no space in the story of the Big Data?
- Does relational database is no longer relevant as Big Data is evolving?
- Is relational database not capable to handle Big Data?
- Is it true that one no longer has to learn about relational data if Big Data is the final destination?
Well, every single time when I hear that one person wants to learn about Big Data and is no longer interested in learning about relational database, I find it as a bit far stretched.
I am not here to give ambiguous answers of It Depends. I am personally very clear that one who is aspiring to become Big Data Scientist or Big Data Expert they should learn about relational database.
The reason for the NoSQL Movement in recent time was because of the two important advantages of the NoSQL databases.
- Flexible Schema
In personal experience I have found that when I use NoSQL I have found both of the above listed advantages when I use NoSQL database. There are instances when I found relational database too much restrictive when my data is unstructured as well as they have in the datatype which my Relational Database does not support. It is the same case when I have found that NoSQL solution performing much better than relational databases. I must say that I am a big fan of NoSQL solutions in the recent times but I have also seen occasions and situations where relational database is still perfect fit even though the database is growing increasingly as well have all the symptoms of the big data.
Situations in Relational Database Outperforms
Adhoc reporting is the one of the most common scenarios where NoSQL is does not have optimal solution. For example reporting queries often needs to aggregate based on the columns which are not indexed as well are built while the report is running, in this kind of scenario NoSQL databases (document database stores, distributed key value stores) database often does not perform well. In the case of the ad-hoc reporting I have often found it is much easier to work with relational databases.
SQL is the most popular computer language of all the time. I have been using it for almost over 10 years and I feel that I will be using it for a long time in future. There are plenty of the tools, connectors and awareness of the SQL language in the industry. Pretty much every programming language has a written drivers for the SQL language and most of the developers have learned this language during their school/college time. In many cases, writing query based on SQL is much easier than writing queries in NoSQL supported languages. I believe this is the current situation but in the future this situation can reverse when No SQL query languages are equally popular.
ACID (Atomicity Consistency Isolation Durability) – Not all the NoSQL solutions offers ACID compliant language. There are always situations (for example banking transactions, eCommerce shopping carts etc.) where if there is no ACID the operations can be invalid as well database integrity can be at risk. Even though the data volume indeed qualify as a Big Data there are always operations in the application which absolutely needs ACID compliance matured language.
The Mixed Bag
I have often heard argument that all the big social media sites now a days have moved away from Relational Database. Actually this is not entirely true. While researching about Big Data and Relational Database, I have found that many of the popular social media sites uses Big Data solutions along with Relational Database. Many are using relational databases to deliver the results to end user on the run time and many still uses a relational database as their major backbone.
Here are a few examples:
- Facebook uses MySQL to display the timeline. (Reference Link)
- Twitter uses MySQL. (Reference Link)
- Tumblr uses Sharded MySQL (Reference Link)
- Wikipedia uses MySQL for data storage. (Reference Link)
There are many for prominent organizations which are running large scale applications uses relational database along with various Big Data frameworks to satisfy their various business needs.
I believe that RDBMS is like a vanilla ice cream. Everybody loves it and everybody has it. NoSQL and other solutions are like chocolate ice cream or custom ice cream – there is a huge base which loves them and wants them but not every ice cream maker can make it just right for everyone’s taste. No matter how fancy an ice cream store is there is always plain vanilla ice cream available there. Just like the same, there are always cases and situations in the Big Data’s story where traditional relational database is the part of the whole story. In the real world scenarios there will be always the case when there will be need of the relational database concepts and its ideology. It is extremely important to accept relational database as one of the key components of the Big Data instead of treating it as a substandard technology.
Ray of Hope – NewSQL
In this module we discussed that there are places where we need ACID compliance from our Big Data application and NoSQL will not support that out of box. There is a new termed coined for the application/tool which supports most of the properties of the traditional RDBMS and supports Big Data infrastructure – NewSQL.
In tomorrow’s blog post we will discuss about NewSQL.
Reference: Pinal Dave (http://blog.sqlauthority.com)