MySQL – Scaling Existing Applications: Key Observations and Measurements

In an earlier blog post we discussed How to Create a Distributed Relational SQL Database. Today we will continue with Scaling Existing Applications: Key Observations and Measurements.

For an existing application on the verge of reaching the limits of its monolithic MySQL database, scaling up – getting bigger hardware – can only go so far. In today’s public, private and hybrid cloud world that leverages distributed infrastructure, scaling up is a counterintuitive approach. And it’s expensive.

A monolithic MySQL database that is suffering from scalability issues (for example, inconsistent performance, inconsistent availability, or transaction throughput bottlenecks) can become a distributed MySQL database that retains its relational principles by applying a declarative, policy-based data distribution process.

A well-thought-out data distribution policy aligns with the application’s current database structure and commands. Related data within various tables has to be identified and grouped so that it stays localized in a single database instance.

Simply put (and to paraphrase a familiar saying), “the data that plays together, should stay together.”  Our goal is to enable “reads” and “writes” to be completed successfully using only data from within one database instance (or shard) from the total distributed database. The more that data is processed within a single database instance, without needing other data found by traveling across machines within the distributed database array, the better that application performance and database scalability are enhanced. This way, compute processing is brought to the data, as opposed to the other way around.

Arriving at a Data Distribution Policy (Analysis)

Reads: Commands play a large role in queries, or reads. Examining which bits of data are accessed in joins, sub-queries or unions will indicate which pieces of data ought to be kept together on one machine. For example, when identifying the ‘users’ in a database, the next step would involve identifying the ‘orders’ related to those ‘users’, then the ‘items’ related to the ‘orders’, and so on and so forth. So, first, the tables and data that are related are identified. This usually comes from related tables that share the same foreign keys. However, even if there are no foreign keys, it is possible to identify which columns within and between tables are generally joined for query execution.
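To make this concrete, consider a hypothetical schema with ‘users’, ‘orders’ and ‘items’ tables (illustrative names, not taken from any particular application). A typical read joins along the foreign keys, and that access pattern is exactly what tells us these rows should live on the same shard:

SELECT u.user_id, o.order_id, i.item_description
FROM users u
JOIN orders o ON o.user_id = u.user_id
JOIN items i ON i.order_id = o.order_id
WHERE u.user_id = 42;

If all rows belonging to user 42 are co-located, this query never needs to leave its shard.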

Writes: On the transactions, or writes, side, additions to the database need to be placed in the appropriate partitioned database instance (or shard) with their related data. For instance, an ‘order’ is made up of many ‘items’, which are consequently added to the same shard as the ‘order’. A transaction is more efficient when it is contained to a single shard, so the order and all of its items are placed on the same shard. This practice eliminates the need for a distributed transaction with a two-phase commit. In our example, because there is no direct link between ‘users’ and ‘items’, the distribution process is that much more significant: it requires a cascading key lookup.
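The cascading key lookup can be sketched as follows, again using the hypothetical schema above: a new ‘item’ row carries only an ‘order_id’, so the shard key (‘user_id’) must first be resolved through the parent ‘order’ before the insert can be routed:

-- Step 1: resolve the shard key through the parent order
SELECT o.user_id FROM orders o WHERE o.order_id = 1234;

-- Step 2: route the insert to the shard that owns that user_id
INSERT INTO items (order_id, item_description) VALUES (1234, 'sample item');

The lookup in step 1 is what the distribution layer performs behind the scenes for every such write.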

Efficiency dictates that we want data that is either read together, as in queries (i.e. sub-queries and joins), or written together, as in transactions.

Denormalization – Not the Best Solution

While denormalization may seem like a decent solution to data placement issues, ScaleBase’s cascading key lookup easily removes the need for denormalization while efficiently resolving those issues. Returning to our example: because the ‘items’ table does not contain a ‘user_id’ column, the routing process can become very difficult. In this situation, if cascading key lookup is not utilized, denormalization is in order. After verifying which tables contain the distribution key, the key would have to be added to the tables in which it is missing. However, while this aids in solving the complex process of data distribution, it creates many additional problems along the way. Learn more about data distribution policies and the ScaleBase solution, here.
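For comparison, the denormalization alternative would add the distribution key to the table that lacks it (a sketch against the hypothetical ‘items’ table; column names are illustrative):

-- Add the missing distribution key
ALTER TABLE items ADD COLUMN user_id INT NOT NULL;

-- Every insert must now supply the key explicitly
INSERT INTO items (order_id, user_id, item_description) VALUES (1234, 42, 'sample item');

This makes routing trivial, but every code path that writes to ‘items’ must now be changed to carry ‘user_id’, which is exactly one of the additional problems mentioned above.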

Null Columns

A shard key is the field according to which data is routed (it can be ‘user_id’ or some other cascaded link, such as ‘order_id’, that eventually leads to ‘user_id’). The fields that are used to determine where to route the data and commands cannot be empty (i.e. null) or updated during the life of the row.

Looking at the data life-cycle, every piece of data must be born with a distribution key that it keeps for the course of its entire life. Looking back at our example, an ‘order’ is always born with a ‘user_id’ that never changes. All ‘items’ are linked to a specific parent ‘order’, so they never migrate between ‘orders’; the parent ‘order’ never changes for existing ‘items’, leaving no possibility for them to become ‘orphans’.

It is not enough to simply have the shard key in all tables; it needs to be populated, as part of the data in the table, as well. A row can be inserted into a table, updated many times and deleted. It is therefore vital that every row is inserted with a populated shard key (i.e. not null). If a row is inserted with a ‘null’ shard key, it cannot be placed into the distributed database.
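One way to enforce this rule at the schema level is to declare the shard key NOT NULL, so rows cannot be born without it (a sketch using the hypothetical ‘orders’ table):

CREATE TABLE orders (
  order_id INT AUTO_INCREMENT PRIMARY KEY,
  user_id INT NOT NULL,      -- shard key: must be populated at birth
  order_date DATETIME
);

-- This insert is rejected by the database, protecting the distribution policy
INSERT INTO orders (user_id, order_date) VALUES (NULL, NOW());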

New Applications: Design for Scale from the Start

If you’re building a new app that is web-facing, mobile or social, today you have to design anticipating millions of users, high transaction rates and ever-larger data volumes.

The same data distribution principles applied to existing applications should also be applied to new applications: data that is read or written together should be stored and accessed together on the same database instance.

When designing a data distribution policy, the distribution key should be selected according to how the data will be accessed and distributed. You can then denormalize, adding the distribution key to every table. Or, better yet, you can avoid denormalization by understanding the links between the tables within each shard from the beginning of the design process.

For example, when looking at an ‘order’ table, it should be clear how it is linked to the ‘user’ table, as well as to the ‘order item’ table. It is in this initial phase that either cascading key lookup or denormalization should be chosen, identifying which data may still be missing from the equation. Therefore, when designing the database, ask yourself about the life-cycle of your rows: were they born with a populated distribution key? Designing your application in a way that ensures this avoids the unpleasant situation of null distribution keys.

When developing an application from scratch, analysis tools are not appropriate, as new applications do not have anything to track. ScaleBase created a special guide, “Building a New Application with Massive Database Scalability – Getting Started with ScaleBase”. The goal of this document is to demonstrate how to build a new application that plans for massive database scalability right from the start. It outlines the high-level steps involved and provides a walkthrough of how to create a simple, straightforward data distribution policy. As an aid, it uses a sample application, called ‘Blog System’, mimicking today’s typical web-scale applications. It frames the steps involved in building the application, including activities such as uploading files, quick and frequent database writes, and speedy database queries.

If you want to add linear scalability to an existing MySQL application, you can leverage ScaleBase’s free online service, Analysis Genie. It will help define the best data distribution policy tailored to your unique application, based on a guided analysis of the nature of your data, its relationships and its functional use.

If you are building a new app that is web-facing, mobile or social, you have to design anticipating millions of users, high transaction rates and ever-larger data volumes. This guide will help you get started and on your way to a more efficiently distributed database.

Have you been exploring distributed databases?

What are some challenges with distributed relational databases that you would like me to focus on in future blog posts?

Reference: Pinal Dave (http://blog.SQLAuthority.com)


SQL SERVER – How to Check Snapshot Isolation State of Database

It is very easy to check the snapshot isolation state of a database, and here is a quick script for the same.

SELECT s.name
, s.snapshot_isolation_state
, s.snapshot_isolation_state_desc
, s.is_read_committed_snapshot_on
FROM sys.databases s;

Upon running the above code, it will return results describing the isolation-level status of each database.

Just on a side note, remember that READ COMMITTED SNAPSHOT does optimistic reads and pessimistic writes, whereas SNAPSHOT does optimistic reads and optimistic writes. It is recommended that you go for READ COMMITTED SNAPSHOT for most applications where you want to implement row versioning. Microsoft has a detailed article on this subject over here.
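Both settings can be changed with ALTER DATABASE (a sketch, shown here against a hypothetical database named SampleDB):

ALTER DATABASE SampleDB SET ALLOW_SNAPSHOT_ISOLATION ON;
ALTER DATABASE SampleDB SET READ_COMMITTED_SNAPSHOT ON;

Keep in mind that turning on READ_COMMITTED_SNAPSHOT requires that no other connections are active in the database at the time.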

Reference: Pinal Dave (http://blog.sqlauthority.com)

MySQL – List User Defined Tables – Two Methods

There are different methods to list user-defined tables in MySQL. We will see two of them in this post.

Let us create the following tables in a database named Test.

Create a database named Test

CREATE DATABASE Test;

Create the tables named items and sales in the database Test.

USE Test;
CREATE TABLE items(item_id INT, item_description VARCHAR(100));
CREATE TABLE sales(sales_id INT auto_increment KEY,item_id INT, sales_date DATETIME, sales_amount DECIMAL(12,2));

Now to list out all user defined tables, use any one of the following methods

Method 1: Use SHOW TABLES command

SHOW TABLES;

If you execute the above statement, it will list all the user-defined tables in the database in which it is run (in this example, the database name is Test).

The result is

Tables_in_test
 --------------
 items
 sales

You can also specify the database name along with this

SHOW TABLES FROM Test;

This will give you the same result.

Method 2: Use INFORMATION_SCHEMA view

You can make use of the ANSI standard INFORMATION_SCHEMA.TABLES view to do the same, as shown below

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA='Test';

The result is

TABLE_NAME
 --------------
 items
 sales
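If the schema also contains views, INFORMATION_SCHEMA.TABLES will list them too. Adding a filter on TABLE_TYPE restricts the output to base tables only:

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA='Test'
AND TABLE_TYPE='BASE TABLE';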

This is a very small trick, but it works great when you want to know what tables exist in a MySQL database.

Reference: Pinal Dave (http://blog.sqlauthority.com)

MySQL – How to do Straight Join in MySQL?

In an earlier post, we learnt about a type of join called NATURAL JOIN supported in MySQL. In this post we will see another type of join called STRAIGHT_JOIN.

STRAIGHT_JOIN is actually an INNER JOIN, but it forces the order in which the tables are processed.

As per MySQL’s documentation, the definition is given below:

“STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer puts the tables in the wrong order.”

Let us create the following data

CREATE TABLE items(item_id INT, item_description VARCHAR(100));
CREATE TABLE sales(sales_id INT auto_increment KEY,item_id INT, sales_date DATETIME, sales_amount DECIMAL(12,2));
INSERT INTO items VALUES (1,'Television');
INSERT INTO items VALUES (2,'Mobile');
INSERT INTO items VALUES (3,'laptop');
INSERT INTO sales(item_id,sales_date,sales_amount) VALUES (1,'2014-01-01',1200);
INSERT INTO sales(item_id,sales_date,sales_amount) VALUES (2,'2014-01-02',200);
INSERT INTO sales(item_id,sales_date,sales_amount) VALUES (3,'2014-01-09',1700);
INSERT INTO sales(item_id,sales_date,sales_amount) VALUES (3,'2014-01-29',1700);
INSERT INTO sales(item_id,sales_date,sales_amount) VALUES (3,'2014-02-11',1700);
INSERT INTO sales(item_id,sales_date,sales_amount) VALUES (1,'2014-02-16',1200);
INSERT INTO sales(item_id,sales_date,sales_amount) VALUES (2,'2014-02-16',200);
INSERT INTO sales(item_id,sales_date,sales_amount) VALUES (2,'2014-02-20',200);
INSERT INTO sales(item_id,sales_date,sales_amount) VALUES (2,'2014-02-20',200);
INSERT INTO sales(item_id,sales_date,sales_amount) VALUES (2,'2014-02-22',200);
INSERT INTO sales(item_id,sales_date,sales_amount) VALUES (3,'2014-02-24',1700);
INSERT INTO sales(item_id,sales_date,sales_amount) VALUES (1,'2014-02-24',1200);

STRAIGHT_JOIN can be used in two ways

Method 1: Join tables using STRAIGHT_JOIN

SELECT t1.item_description,t2.sales_id,t2.sales_date,t2.sales_amount
FROM sales AS t2 STRAIGHT_JOIN items AS t1  ON t1.item_id=t2.item_id;

Method 2: Use INNER JOIN to join the tables and use STRAIGHT_JOIN in the SELECT clause itself

SELECT STRAIGHT_JOIN t1.item_description,t2.sales_id,t2.sales_date,t2.sales_amount
FROM sales AS t2 INNER JOIN items AS t1  ON t1.item_id=t2.item_id;

Both the queries produce the following result

item_description sales_id sales_date sales_amount
 Television 1 1/1/2014 0:00 1200
 Mobile 2 1/2/2014 0:00 200
 laptop 3 1/9/2014 0:00 1700
 laptop 4 1/29/2014 0:00 1700
 laptop 5 2/11/2014 0:00 1700
 Television 6 2/16/2014 0:00 1200
 Mobile 7 2/16/2014 0:00 200
 Mobile 8 2/20/2014 0:00 200
 Mobile 9 2/20/2014 0:00 200
 Mobile 10 2/22/2014 0:00 200
 laptop 11 2/24/2014 0:00 1700
 Television 12 2/24/2014 0:00 1200

However, the real test is to execute each query with the EXPLAIN command before the SELECT statement and observe the order in which the tables are joined.

Method 1: STRAIGHT_JOIN

EXPLAIN SELECT STRAIGHT_JOIN t1.item_description,t2.sales_id,t2.sales_date,t2.sales_amount
FROM sales AS t2 INNER JOIN items AS t1 ON t1.item_id=t2.item_id;

Now let us see the result where we observe the order of tables used in Join.

You will notice that the sales table, which is aliased as t2, is read first.

Method 2: INNER JOIN (without STRAIGHT_JOIN)

EXPLAIN SELECT t1.item_description,t2.sales_id,t2.sales_date,t2.sales_amount
FROM sales AS t2 INNER JOIN items AS t1 ON t1.item_id=t2.item_id;

Now let us see the result where we observe the order of tables used in Join.

You will notice that the sales table, which is aliased as t2, is read second. That means MySQL’s internal optimizer re-ordered the tables for maximum performance.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQLAuthority News – Learn MySQL Indexing in 99 Minutes – MySQL Indexing for Performance – Video Course

Every year around September 1st, I decide to do something cool. This is because September 1st is my daughter Shaivi’s birthday. In previous years, I have released new books as well as new courses on this day. This year I have done something similar for her. I like to work hard and dedicate my efforts to my family, as they are the ones who provide me unconditional support to do various activities for the community.

Journey to Learn MySQL Indexing in 99 Minutes

Indexes are considered a sure solution for performance tuning, but there are two sides to that story: a good index can improve the performance of your query, while a bad index can degrade it. I have built a course around MySQL indexing where I explain how to create indexes that improve the performance of MySQL queries many times over. Indexing is such a subject that I could talk about it for many days and still have more material to share. However, I decided to build a course which is very sharp and to the point. I wanted to make sure that with the help of this course one can learn all the major points of MySQL indexes in a very short period of time.

While I was building this course, I set a threshold that I would not build a very long, boring course. I wanted to build a crisp course which addresses the need of the hour and teaches the most efficient methods to improve query performance. It took me more than 3 weeks to build all the material I wanted to cover. Once I had built the material, the real challenge was on: I wanted a course basic enough for a novice to understand, but advanced enough that an expert can learn various tricks as well. I wanted to balance the course for every level, and to make sure that the pace of the course is good enough for everyone to keep up with it.

Finally, after long hours and many days of work, I finalized the content of the course so that it is just the right fit for everybody – a novice who does not know anything about indexes as well as an expert who already has a clear idea about them. I had pretty much mapped out every single minute of the course with demos and slides. As per my estimate, the course should not have gone over 100 minutes. When I was done, I was extremely delighted: the entire course was recorded in 99 minutes – YES, only 99 minutes. I have previously recorded many courses, but this course was timed almost to the second.

You can learn MySQL Performance Tuning with the help of Indexing in 99 minutes.

Course Content

Here is the course outline which I have built to teach MySQL indexing. You can watch this course by creating a free trial at Pluralsight. If you do not see the play button to watch the course, please log in to Pluralsight with your valid credentials.

  • MySQL Indexing for Performance
    • Introduction
  • Types of Indexes
    • Introduction
    • InnoDB vs MyISAM
    • B-Tree Index
    • Advantages of B-Tree Index
    • Clustered Index
    • Advantages of Clustered Index
    • Disadvantages of Clustered Index
    • Secondary Index
    • Building Clustered B-Tree
    • Building Secondary Index B-Tree
    • InnoDB Clustered Table
    • Hash Index
    • Limitation of Hash Index
    • Adaptive Hash Index
    • Building Hash Index
    • Other Indexes
    • Summary in Sixty Seconds
  • Indexing Strategies for High Performance
    • Introduction
    • Effectiveness of Index
    • Demo: List Indexes
    • Demo: Basics of Indexes
    • Demo: Order of Columns in Index
    • Demo: Optimize OR Conditions
    • Demo: Optimize OR Conditions – Advanced
    • Demo: Optimize AND Conditions
    • Demo: Optimize AND Conditions – Advanced
    • Demo: Cover Index
    • Demo: Clustered Index
    • Demo: Index Hints
    • Summary in Sixty Seconds
  • Index Maintenance
    • Introduction
    • MySQL Query Optimizer
    • Statistics
    • InnoDB Statistics
    • Data and Index Fragmentation
    • Index Maintenance
    • Other Index Maintenance Tips
    • Summary in Sixty Seconds
  • Checklists
    • Checklists

Watch Video Course

Click here to watch the MySQL Indexing for Performance video course.

As the author of this course, I have a few free codes to watch it. Please leave a comment to watch this course and I will send a few of the readers a free card to watch it. You can also request the free code on my Facebook page, as I am very active there.

Reference: Pinal Dave (http://blog.sqlauthority.com)

MySQL – Learning MySQL Online in 6 Hours – MySQL Fundamentals in 320 Minutes

MySQL is one of the most popular databases and I have been working with it a lot recently. Data has no barriers and every database has its own place. I have been working with MySQL for quite a while and, just like with SQL Server, I often find lots of people asking me if I have a tutorial which can teach them MySQL from the beginning. Here is the good news: I have written two different courses on MySQL fundamentals, which are available online.

The reason for writing two different courses was to keep the learning simple. The two courses are connected with each other, but designed so that you can watch either course independently and learn without dependencies. However, if you ask me, I will suggest that you watch the MySQL Fundamentals Part 1 course followed by the MySQL Fundamentals Part 2 course.

Let us quickly explore the outline of the MySQL courses.

MySQL Fundamental – 1 (157 minutes)

MySQL is a popular choice of database for use in web applications, and is a central component of the widely used LAMP open source web application software stack. This course covers the fundamentals of MySQL, including how to install MySQL as well as how to write basic data retrieval and data modification queries.

  • Introduction (duration 00:02:12)
  • Installations and GUI Tools (duration 00:13:51)
  • Fundamentals of RDBMS and Database Designs (duration 00:16:13)
  • Introduction MYSQL Workbench (duration 00:31:51)
  • Data Retrieval Techniques (duration 01:11:13)
  • Data Modification Techniques (duration 00:20:41)
  • Summary and Resources (duration 00:01:31)

MySQL Fundamental – 2 (163 minutes)

MySQL is a popular choice of database for use in web applications, and is a central component of the widely used LAMP open source web application software stack. In this course, which is part 2 of the Fundamentals of MySQL series, we explore more advanced topics such as stored procedures & user-defined functions, subqueries & joins, views and events & triggers.

  • Introduction (duration 00:02:09)
  • Joins, Unions and Subqueries (duration 01:03:56)
  • MySQL Functions (duration 00:36:55)
  • MySQL Views (duration 00:19:19)
  • Stored Procedures and Stored Functions (duration 00:25:23)
  • Triggers and Events (duration 00:13:41)
  • Summary and Resources (duration 00:02:18)

Note: if you click on the links above and do not see the play button to watch the course, you will have to log in to the system to watch it.

I would like to throw a challenge to you – Can you watch both of the courses in a single day?

If yes, once you are done watching the courses, your Pluralsight profile page (here is my profile: http://pluralsight.com/training/users/pinal-dave) will show the following badges. If you have already watched MySQL Fundamentals Part 1, you can qualify by just watching MySQL Fundamentals Part 2.

Just send me the link to your profile and I will publish your name on this blog. For the first five people who send me an email at Pinal at sqlauthority.com, I might have something cool as a giveaway as well.

Watch the teaser of MySQL course.

Reference: Pinal Dave (http://blog.sqlauthority.com)