Note: This blog post part 1 of the series and you can download 30-day trial of ScaleBase to practice the concepts.
Whether it is for extremely large data volumes, high transaction throughput or a massive number of concurrent users, today’s internet and mobile applications require the ability to respond to a new level of performance and usability. Databases need to be flexible enough to quickly accommodate growth and changes as they arise.
A lot has been written to compare MongoDB and MySQL. Both are solid solutions for DBAs to use. Given my focus on SQL, I’ve not yet written about MongoDB. However, I expect many of my readers are investigating NoSQL solutions, and the most popular NoSQL database is MongoDB.
At a high level, MongoDB is a NoSQL document database. It is considered highly scalable and cloud-ready. MySQL is a leading relational database that I know many of you are already using. I’d like to offer some thoughts to compare MongoDB, and MySQL across three main areas:
- Scalability, throughput and cloud-readiness.
- Data distribution, reads / writes, joins and data rebalancing
- Query model
Because, MySQL is not natively a distributed database, I want to level the playing field. ScaleBase is a well-regarded distributed relational database platform that uses MySQL (or MariaDB, or Percona Sever) for data storage. Adding ScaleBase into the equation makes MySQL a distributed database too, but it’s still a relational database with SQL properties and ACID compliance.
MongoDB and MySQL, with ScaleBase, are both positioned to take software engineering teams and their applications to the next level of data scalability.
I hope my comparison can help you understand which technology might be best for your new projects. Ultimately, data architects, dev-ops and developers need to choose the database technology that best meets their application’s requirements.
1. Scalability, throughput and cloud-readiness
MongoDB and MySQL with ScaleBase each leverage several proven techniques when it comes to scaling out applications.
A very powerful technique that can greatly extend the scalability of a database is called “sharding.” A lot has been written about sharding, and the difficulties inherent in homegrown, do-it-yourself partitioning approaches, so I won’t cover those topics in detail again here.
Let’s just say that sharding is a process that splits large data sets on a single server into smaller data sets, or horizontal partitions, placed onto multiple servers. The goal of this division of the data is to divide workloads between multiple servers, so as to 1) increase TPS throughput, 2) allow more concurrent users, and 3) to allow for much larger overall database size.
Figure 1 Scalability via auto-sharding
Sharding is an important part of how MongoDB achieves its scalability. In MongoDB, data is split between shards using a shard key. The shard key is a single field (or a set of composite fields) that identify the data to be sharded. Data is defined in “chunks”, by default 64MB of data. The chunk is the smallest logical block of data that will be sharded.
A particular shard is responsible for one, or more, chunks. The shard stores a “replica set” of the application data. This is a subset of all the data and is identified by the particular shard key. MongoDB controls which replica set is stored on a particular shard.
If you look directly into an individual shard to see what it contains, you will see a random subset of the data. Which data replica is stored in which shard, or partition, is somewhat random to the user, and unknown to the application. That’s why you (the client) always need to connect to the mongos and let MongoDB find the data you need.
Figure 2 MongoDB Architecture (taken from the MongoDB web site)
In the MongoDB architecture diagram above, the configuration of the cluster is stored in the Config Servers (which coincidently are mongod databases). Queries, and data, going back and forth between the client and the cluster are sent through the MongoDB routers, mongos. The mongos will route a query to the appropriate backend database or databases.
Data distribution occurs at the database level – it’s not defined by the application. The application actually “sees” a single database and MongoDB handles retrieving data from each shard partition (mongod). As a non-relational database, there are no database joins (I’ll have more on that later), so joins are achieved in the application. Most of the query processing is done in the data layer with aggregations done in the MongoDB middleware.
Now that we’ve reviewed sharding in MongoDB, you might be surprised to learn this is very similar to how MySQL with ScaleBase works. ScaleBase also automates sharding. Like MongoDB, ScaleBase’s data distribution is handled at the database level (and not in the application). Importantly, because ScaleBase maintains a relational data environment, ScaleBase will ensure that data that is accessed together is also stored together.
Like MongoDB, with ScaleBase the application “sees” a single database and not a bunch of shards/partitions. Like MongoDB, Scalebase also provides agility, flexibility, and growth. Database optimization is dynamic. Rebalancing (adding and removing nodes to accommodate growth and to handle database “hotspots”) is dynamic, and can be executed while the entire system is online. All this without changes to the application design or codebase.
However, there are two ways that ScaleBase sharding differs.
- Retaining ACID / SQL: With ScaleBase, joins are supported, even in a distributed environment. As MySQL is a relational database, ScaleBase embraces and retains data relations expressed in joins and transactions. With ScaleBase, queries can have local joins (within one shard) as well as cross-database joins. Unlike MongoDB where joins are accomplished in your application, ScaleBase will automatically aggregate data across shards for you. In this way, ScaleBase gives you ACID transactions and 2-phase commits across a sharded cluster of MySQL database nodes. This is unique.
- MySQL Ecosystem: The second way ScaleBase is a bit different is it preserves your MySQL skills and ecosystem technologies. It provides this scalability to your existing MySQL applications, if they are growing beyond the capabilities of your server (or Amazon RDS instance). Maybe more importantly, ScaleBase gives new MySQL apps a modern, distributed and relational data framework with near-linear scalability.
Figure 3 ScaleBase Architecture
Looking at a ScaleBase architecture diagram and relating back to MongoDB, the ScaleBase Config Servers here are the equivalent of the MongoDB mongod Config Servers. And ScaleBase is the equivalent of the MongoDB router, the mongos. The similarities are striking.
So far I’ve only focused my comparison on scalability, and we’ve seen that MongoDB and MySQL with ScaleBase are highly comparable. MongoDB is a NoSQL database well known for its scalability. The database world calls it a web scale technology and it’s catching on with developers worldwide. MySQL is the world’s the most widely used open source database, with a rich ecosystem of tooling and skilled users. ScaleBase is a modern relational database platform, build using standard MySQL, and optimized for the cloud, and takes advantage of your existing MySQL development skills and ecosystem technologies.
If you’re curious to explore ScaleBase, there’s a free 30-day trial you can download (http://www.scalebase.com/download-trial/).
Next time I will cover MongoDB and ScaleBase data models and examine how they handle data distribution and redistribution.
Reference: Pinal Dave (http://blog.sqlauthority.com)