SQL SERVER – Performance: Do-it-Yourself Caching with Memcached vs. Automated Caching with SafePeak

We’ve all been reading a lot of articles that suggest adding a caching layer in front of a database is more efficient than hitting the database directly. Among different caching options there is Do-It-Yourself coding, using open source tools like Memcached, or using Velocity (part of the Microsoft AppFabric) or using Ehcache (for Java developers). Alternatively, you can now find automated caching solutions like SafePeak’s dynamic caching for SQL Server.

Why use a caching solution?

Let’s quickly recap the many reasons to use data caching with SQL Server:

  • To improve data access read/write performance
  • To reduce the number queries hitting the of database
  • To reduce database load (CPU, IO, Memory)
  • To improve application page load response time (the user’s actual experience)
  • To distribute repetitive query workloads to separate cache servers
  • To share the benefit of cached data on one server across all other servers.
  • To increase operational efficiency, by scaling more data and more users on smaller and fewer SQL Servers machines

Another benefit of caching is its impact on database scalability. SQL Server is expensive to scale with bigger hardware, which also can trigger increased license fees. Fast Flash storage systems, which can be helpful, are also very expensive. Whereas adding a new web/caching server can be much cheaper (and potentially more effective).

So, now that we know why we want to use caching with SQL SERVER, what’s the best way to go about it? Well, you’ll need to decide what’s best for you. Let’s look at three main SQL Server caching options:

  1. SQL Server Memory and IO caching
  2. Application data caching, a Do It Yourself approach using Memcached/similar APIs
  3. SafePeak automated SQL caching

By the way, I’d also like to learn from your experiences, so I look forward to your comments and thoughts on what I discuss here.

SQL Server: Doesn’t it already have memory cache?

To start with, we need to remember that SQL Server has its own memory cache for objects in the database. When data is retrieved, SQL Server maintains its cache and will (if necessary) pull the row from its memory and not hit the disk.

So if SQL Server has its own cache, what’s the benefit of data caching layers such as Memcached, SafePeak or similar?

The answer is this: SQL Server only caches:

  1. Query plans
  2. Pages from the database files.

SQL Server does NOT cache results from a query.

This is the important distinction.

For example, imagine if you have a complex query which uses some aggregation on a lot of data (e.g.: how many different countries we have in our customer database: SELECT DISTINCT Country from Customers GROUP BY country). SQL Server will scan the WHOLE customer table, but your result-set will only be a few entries long. When you reissue your query, SQL Server will reuse the query plan and will rescan the customer table, (and if you are lucky the pages are still in memory)

When you use application cache, you store your result-sets in Memcached RAM. Then reuse them over and over again without connecting to the database server, thus offloading workloads from your database server.

Do-It-Yourself application data caching using Memcached / similar APIs

Application data caching is quite easy to start. However, changing (or building) code for large applications with effective data caching is challenging for a few reasons:

  1. Correctly scoping required code changes;
  2. Delivering and testing actual code changes required to handle caching of all (or most) SQL queries;
  3. Preserving data integrity: how to invalidate all relevant cache items and to handle all kinds of data change events. And in real-life both READs / WRITEs can be complex involving stored-procedures, triggers, views, table joins etc.; and
  4. Maintaining the additional code layer.

One thing to keep in mind is that because DIY and Memcached approaches need to touch app code, they can’t be used to accelerate any off-the-shelf software that relies on SQL Server.

So, while it can be complicated, application performance benefits are always good, so let’s dive into what you would do using Memcached.

About Memcached data caching
Memcached is a big hash table: a key/value store that lives entirely in RAM on multiple servers. It allows you to use RAM from multiple servers as single memory caching space.

Basic data caching using Memcached

Below is a basic tutorial showing (via pseudocode) how you can get started with integrating Memcached into your application. If you’re an application developer, it isn’t something you just “turn on” and then your site goes faster. You have to pay attention. For every query you want to be cached, you have to put (set()) it into cache, get it from cache (get()) and (probably most complicated) make sure you keep the data in the cache correct by removing from cache (delete()) when the data is updated.

The following pseudocode of Memcached example is written in #Perl but can be done in same way in most other languages, including .net, java and phpetc (memcached clients)

Initializing a Memcached Client with a list of your pre-configured Memcached servers:

# perl example
my $memclient = Cache::Memcached->new({ servers => [ '10.0.0.10:11211', '10.0.0.11:11211' ]});

Wrapping an SQL Query
Memcached is famous for reducing load on SQL databases. Here is some pseudocode showing how to wrap a database query access with a memcached caching layer, by implementing check-in-cache (function get()) and place-into-cache (function set()):

# Define a query and use it as a key for the Memcached:
sql = "SELECT * FROM user WHERE user_id = ?"
key = 'SQL:' . user_id . ':' . md5sum(sql)
# We check if the value is 'defined' (or in cache), since '0' or 'FALSE' can be legitimate values!
if (defined result = memcli:get(key)) {
        return result
} else {
        # Query is not in Cache, Get query resultset from your database server and convert it to resultset array
        handler = run_sql(sql, user_id)
        rows_array = handler:turn_into_an_array

        # Cache it for five minutes
        memcli:set(key, rows_array, 5 * 60)
        return rows_array
}

Notice that the SQL query result-set was entered to cache with five-minute expiration time, as an example.

Stale data in cache and invalidation of cached items
Unless you actively invalidate the cached item, when a user (or your database) makes a change, it can take up to five minutes (or more, depending on cache item expiration time) for users to see the correct new data. This is a key issue when implementing a Do-It-Yourself caching strategy.

When a user comes along and edits data, you can keep the cache in sync in two main ways:

  1. You can update the item in cache; or
  2. Delete the old item.

Expiration
The most basic invalidation happens by defining the Expiration period. Even if you're actively deleting or overwriting cached data, you'll still want to have the cache expire occasionally.

 # Cache it for five minutes
  memcli:set(key, rows_array, 5 * 60)

Expiration times can be set from 0, meaning "never expire", to 30 days. Any time higher than 30 days is interpreted as a timestamp date.

delete()
The simplest method of invalidation is to simply delete an entry and have your website re-cache the data the next time it's fetched. For example, when a user updates her bio, you want her to see her latest info when she reloads the page:

 sql = "SELECT * FROM user WHERE user_id = ?"
 key = 'SQL:' . user_id . ':' . md5sum(sql)

 memcli:delete(key)

The next time this query will be requested it will be fetched from the database and repopulate the cache.

set()
A more efficient idea is to actively update your cache as your data changes. When our user updates her bio, take the bio object and move it into the cache via 'set'. You can pass the new data into the same routine that normally checks for data, or however you want to structure it.

Summary about DYI application data caching

Data caching is considered a key component of a high performance and scalable application architecture. Fast page load response time is a one of the key metrics for user satisfaction. Memcached and similar distributed caching technologies are used today in most (if not all) well-known big websites (Linkedin, Facebook others). The challenge is doing all the extra work custom DIY caching represents for a development team to implement, test, and maintain.

If you think I’m painting a dark picture for Do-It-Yourself, home-grown caching, you’re right. Why? Because nowadays there are alternatives to use a commercial smart caching layer. The cost to purchase third-party caching software is much more accurately defined, than the unknown cost of developing caching code yourself.

But, you need to decide for yourself if that’s the best option for your situation.

SafePeak’s “Automated Dynamic Caching”

SafePeak is pre-packaged software that is specifically designed to cache SQL Server queries. It automates many of the steps that developers would try to do manually with Memcached. And it doesn’t require any changes to application code. This makes it quick to deploy and useful to accelerate the performance of both 3rd-party applications (like SharePoint, CRM Dynamics) and your own custom applications.

In my opinion, this makes Safepeak’s approach much easier for software and database architects and developers to use: you can literally “plug-in” SafePeak to production SQL Server systems. All queries are analyzed and managed by SafePeak automatically: their logic and dependent objects (like tables, views, triggers, procedures, etc.) are identified and analyzed, write commands are monitored and cache invalidation applied in real-time.

And its app acceleration encompasses all of an app’s queries and procedures traffic immediately on deployment.

SafePeak’s self-learning algorithms manage most of configuration; the rest is done via a dashboard tool that combines cache management with real-time monitoring and statistics. By analyzing both the database schemas and the SQL traffic, SafePeak creates optimized caching patterns specific to your applications and improves them over time. SafePeak cache supports read-write database transactions and is dynamically kept in sync with any data changes as they happen.

Here’s what the SafePeak dashboard looks like (screenshot from SafePeak’s whitepaper):

What Next?

You have to decide what approach or technology is best for your situation.

An effective DYI caching strategy, using libraries like Memcached or Velocity, has some advantages (better granular cache invalidation control, ability to store additional types of data besides database row-sets and others). But it can be tricky to accurately scope the work needed, hard to code correctly and comes at a real cost of time and focus away from developing application capabilities.

SafePeak offers an easier way to accelerate SQL Server based applications, with all caching benefits without any app re-programming. In a way, SafePeak’s design reminds me of a combination of Memcached caching and an In-Memory database, but enhanced with machine learning intelligence.

Caching technology is designed to maximize operational efficiency of your existing infrastructure, so that you can scale more data and more workloads.

I like SafePeak’s “plug-and-play” approach as it gets results fastest and with minimum risk. If you’d like to check it out, they offer a free trial (www.safepeak.com/download) that includes one-on-one technical training & assistance support.

Now I’d like to hear from you. Have you deployed a SQL caching strategy? What have you done? How has it worked out?

Reference: Pinal Dave (http://blog.sqlauthority.com)

About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s