Since I have started my consulting engagements with customers around the world, I have been asked interesting queries from time to time. In a recent customer call, the customer insisted I assist them in taking a call on if they need to buy one server with 64 cores or if they need to buy 2 servers of 32 cores. I had to explain that it is always not that simple to answer the same. After close to 30 mins of conversation they understood finally what I was talking. I took a moment to blog about this engaging conversation about database scalability.
Historically, database systems adopted two main approaches to scalability, which enables a database to accommodate more user data and process more application workload. They would scale up (called vertical scalability) and scale out (called horizontal scalability). For both relational and non-relational technologies, sharding is one of the most common scale-out physical implementations. I went about explaining both these concepts next.
Vertical Database Scalability
To scale up usually refers to adding more physical resources—that is, increasing CPU, memory, and storage for an existing server or adding a bigger one. In essence with vertical scalability:
- Application compatibility is prioritized—there’s no need for code changes.
- Administrative efforts are reduced with only a single system image to manage.
- Hardware configurations tend to be more expensive, although today’s quickly evolving hardware provides incredibly efficient server components with great price-performance ratios.
- Software costs (typically charged by the number of cores) can increase.
This approach also comes with at least a couple of limitations:
- What happens when a workload cannot fit onto the best-equipped hardware configuration?
- What if a workload is highly variable? Why make an upfront investment in an expensive, large-capacity system that could go underutilized much of the time? For this reason alone, many cloud providers do not rely solely on vertical scalability.
Horizontal Database Scalability
Horizontal scalability accommodates variable workloads by hosting data across multiple databases. Unlike vertical scalability, scale-out approaches can help reduce costs by making use of less sophisticated hardware components, freeing resources for more in-application development and data and system maintenance.
You can use any of several well-known approaches to scaling out data tiers. (For example, see the Wikipedia topic.) The one you choose depends on your workload and the applications supported by the data store. Most people choose functional partitioning in which a data set is decomposed into business or organizational functionalities.
Two of the most common scale-out techniques are as follows:
- Data is fully replicated across all nodes. One primary copy accepts changes, and multiple active replicas are typically read-only, as in the SQL Server AlwaysOn Readable Secondaries or Replication features. Such configurations can be a good fit for read-intensive workloads such as reporting, where readers can potentially connect to any server and execute their queries. By contrast, writers can connect only to the primary copy, causing a bottleneck in write-intensive workloads.
- Read and write operations are distributed across a number of nodes. By applying a distribution logic of some kind, a given transaction is fully satisfied by entities residing on a single node. Typically, some data is replicated across all nodes, but activity or transactional data is partitioned based on a key, simple or composite, which applies to all entities that belong to the same logical group. Sharding refers to this approach.
Some of these concepts being abstract have been implemented uniquely in the cloud world like on Azure and is called as Elastic Databases. Earlier they were implemented called as Federated Databases. Although such techniques exist and implemented in the Cloud world, these need to be designed and implemented by architects / Developers and DBA for their on-premise.
I am sure you would have also worked on some of these design concepts in your environments and database designs. Do let me know what have you done for scale out / scale-up scenarios. What did you do and why did you do? Sharing your experience can be great for all. Let me know via the comments.
Reference: Pinal Dave (https://blog.sqlauthority.com)
There is also the case where the the data is not replicated, but concurrently available to the nodes, like Oracle RAC
Do you anticipate having your database back-end on the cloud would eliminate the need for managing scalability within the organization? In other words would it be wise to entrust that part of your DB to your cloud provider?
can we scale SQL db (Relational Db) horizontally?
In summary, don’t pick Sql Server if you might one day want to support Big Data