SQL SERVER – In-Memory Databases and Caching Wars

The In-Memory Database and In-Memory Caching market is heating up, and we (the users) are the real winners. Exciting!!!

  • SAP, the giant application vendor but a database newbie, is pushing its SAP HANA to the market: a standalone in-memory database delivered in an appliance box.
  • Last year IBM released a set of acceleration technologies, code-named BLU (a development code name that stood for “big data, lightning fast, ultra-easy”), adding in-memory capabilities to DB2 and better integration of the solidDB caching engine with DB2.
  • This April (2014) Microsoft released SQL Server 2014 with the new In-Memory OLTP engine embedded inside. Adding an in-memory data engine right inside the most widely used database engine, SQL Server, opens up many great possibilities for us, the engineers building new applications and software solutions.
  • This month Oracle CEO Larry Ellison unveiled the Oracle In-Memory database engine, to be available next month. It is a direct challenge to IBM, Microsoft and SAP in the burgeoning in-memory database market, with technologies that can deliver dramatic performance boosts for high-scale online transactions and complex data processing.

The new in-memory database engines enable users to build new apps with a new scale of speed and processing capabilities. Hurrah!!

But the new in-memory database engines come at a price: a long list of data limitations, syntax limitations, use-case limitations, migration complications and so on.

Another fast-growing approach is adding a Smart Data Caching layer, where SafePeak (www.SafePeak.com) appears to be the visionary leader. SafePeak’s automated caching solution for SQL Server-based applications enables easy deployment of a smart caching layer between the application and the database. SafePeak performs automated dynamic caching of SQL query and stored procedure results in memory, immediately covering all queries of the application. The key features that make it an interesting solution to explore when performance is discussed are:

1) Automation and ease of use;
2) No code changes;
3) Cache awareness for any transactions and data changes;
4) Wide range of caching options and cache eviction strategies.

The SQL Analytics engine embedded in SafePeak is an excellent bonus, working all the time and collecting information about every query. It’s a bit like the well-known SQL Profiler, but without the performance hit, and it converts the raw, not-really-readable data into advanced analytics.

When it comes to applicability to existing production applications, SafePeak’s strategy is to be easy to use for already-in-production operational read-write (but read-intensive) applications, and to be optimized for virtual and cloud environments. For in-production systems, this gives SafePeak an interesting advantage over the in-memory technologies of database giants.

 

Any way you look at it, these in-memory developments are exciting for us.
Comparing the main in-memory options for SQL Server applications, a question I think is worth exploring is this:

When would SQL Server 2014 In-Memory OLTP fit your needs better than SafePeak’s Caching, and vice versa?

The answer depends, of course, on the user’s needs and on the state of the application and database (greenfield, in production, or a commercial off-the-shelf application).

SafePeak caching focuses on READs, especially REPETITIVE READs, so it fits well for classic multi-user applications.

If your need is mostly about speeding up WRITEs or long-running COMPLEX PROCESSING queries, then having a full table (or tables) in memory (as in SQL Server 2014 In-Memory OLTP) sounds like the better option.

However, how applicability depends on the state of the application/database is very interesting too. I think it splits into three use cases:

1) In “greenfield” applications in the development cycle
SQL Server 2014 In-Memory OLTP is a great option to explore. Knowing how the 2014 in-memory tables and natively compiled stored procedures work, and what their various limitations are, gives development managers the tools to plan and develop the new application based on the available features. In such cases a few important tables (and the procedures accessing them) can become memory-optimized, while the rest remain regular disk-based tables.
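
As a rough sketch of what that can look like in SQL Server 2014 (the table, column and procedure names below are hypothetical, and the database is assumed to already have a MEMORY_OPTIMIZED_DATA filegroup), a hot table and a natively compiled procedure accessing it might be declared like this:

-- Hypothetical hot table kept fully in memory (both schema and data are durable).
CREATE TABLE dbo.ShoppingCart
(
    CartId      INT       NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    UserId      INT       NOT NULL,
    CreatedDate DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO

-- Hypothetical natively compiled stored procedure accessing the in-memory table.
CREATE PROCEDURE dbo.usp_GetCart (@CartId INT)
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    SELECT CartId, UserId, CreatedDate
    FROM dbo.ShoppingCart
    WHERE CartId = @CartId;
END;
GO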

2) In live production homegrown applications that need a performance boost without major redesign
SQL Server 2014 In-Memory OLTP may be a complex option to apply, due to its many current limitations and the requirement to redesign and reprogram parts of your application. And then there is the migration process… For this reason, SafePeak’s cache may be a better choice.

3) In commercial 3rd-party / off-the-shelf applications
Unfortunately, SQL Server 2014 In-Memory OLTP requires new code and design changes, so in most cases the user cannot apply the new in-memory features. SafePeak works without any need to touch the application, so SafePeak wins here.

Here’s a quick comparison of advantages and disadvantages of Microsoft’s In-Memory OLTP option with SafePeak’s caching:

1) SQL 2014 works very well for write-intensive processes and large data processing. SafePeak’s main gain is on repetitive reads (queries and procedures).

2) SQL 2014 has your full table in memory. Every query will be fast, even the first time it is called.
SafePeak has a module called Cache Warm-Up, which lets you pre-fetch data into the cache.

3) SQL 2014 In-Memory OLTP is fully embedded inside the SQL Server, so no extra layer is needed.

SafePeak’s advantages:

1) SafePeak’s #1 advantage is time to deployment: no code changes, and it takes only a few hours to deploy and configure;

2) SafePeak immediately covers 100% of the application, not only the tables designed to be in-memory;

3) No SQL syntax limitations: all tables, all existing code, any SQL syntax and any stored procedure can be cached; most automatically, the rest with easy configuration.

Bottom line, only a customer can say which technology will best fit his or her situation. But the in-memory competition will definitely continue to benefit the industry, and mostly us – the DBAs, programmers and application managers!

Interested in trying SQL Server 2014 In-Memory OLTP?

Download SQL Server 2014 from the Microsoft site.
And see my post “Introduction and Quick-start guide to SQL Server 2014 In-Memory OLTP”.

Interested in trying SafePeak?

SafePeak is available for free trial: www.SafePeak.com/Download
The guys there love helping users with free training and setup assistance, allowing users to get better results quicker – and allowing the company to learn real customers’ needs and challenges (if you ask me, the right way to build a product and a company).

What In-Memory category fits you? Let us know. 

Reference: Pinal Dave (http://blog.sqlauthority.com)


SQL SERVER – Keeping MSDB System Database Lean and Fit – Notes from the Field #033

[Notes from Pinal]: I have been working with SQL Server for almost 10 years. The most ignored databases on any database server are the system databases; we hardly pay attention to them. Just for the sake of it, run a query in your master database and see how many user-created tables already exist there. I am sure that unless you have strict policies in place, there will be a handful of tables there. Similarly, ask a DBA about the exact importance of the MSDB database and you will find that many DBAs do not really know. I often talk about this with my friend Tim, so I asked him if he could help us out with tricks to keep the MSDB system database at its optimal settings.

Linchpin People are database coaches and wellness experts for a data-driven world. In this 33rd episode of the Notes from the Field series, database expert Tim Radney (partner at Linchpin People) explains in very simple words how to keep the MSDB system database at its optimal settings.


When doing a review of a client’s database server, one of the checks I perform is to see how much backup history is being kept within MSDB. SQL Server does a fantastic job of logging all the backups and restores we perform, and depending on how your routine is configured, you are probably only keeping enough backup files to meet your SLAs. What most DBAs or admins don’t think about is all the backup and restore history being retained within MSDB.

The tables within msdb that hold this history include:

  • restorefile
  • restorefilegroup
  • restorehistory
  • backupfile
  • backupfilegroup
  • backupmediafamily
  • backupmediaset
  • backupset

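If you are curious how much history your own msdb has accumulated, a quick check like the following (just a sketch; it sums the row counts of the history tables listed above) will give you a feel for it:

USE msdb;
GO
-- Row counts of the backup/restore history tables.
SELECT t.name AS TableName, SUM(p.rows) AS RowCnt
FROM sys.tables AS t
INNER JOIN sys.partitions AS p
    ON p.object_id = t.object_id
    AND p.index_id IN (0, 1)   -- heap or clustered index
WHERE t.name IN ('restorefile', 'restorefilegroup', 'restorehistory',
                 'backupfile', 'backupfilegroup', 'backupmediafamily',
                 'backupmediaset', 'backupset')
GROUP BY t.name
ORDER BY RowCnt DESC;
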
Microsoft SQL Server has a system stored procedure that helps us maintain this history; however, it is not scheduled to run by default. We have to create a process to run the stored procedure on a routine basis. You will need to know how much history is enough to keep for your organization. I have seen this range from 14 days to 90 days.

The stored procedure is sp_delete_backuphistory within the MSDB database.

This stored procedure takes a single date parameter: the cutoff date for the history to keep. Any history older than the date provided will be deleted.

Over the years I have found MSDB ranging from several megabytes to nearly 20 GB. I recall one client who had a database server with hundreds of databases being log shipped to a secondary with 15-minute log backups, and the system had been in place for many years. MSDB had grown to almost 20 GB. The client complained that if they ever tried to restore a database within the GUI, it would lock up SSMS.

A simple script to purge this history older than 30 days is below.

USE msdb;
GO
-- Purge backup and restore history older than 30 days.
DECLARE @CutOffDate DATETIME;
SET @CutOffDate = CONVERT(VARCHAR(10), DATEADD(dd, -30, GETDATE()), 101);
EXEC sp_delete_backuphistory @CutOffDate;
GO
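
If you prefer to run this purge on a routine basis, one way (just a sketch, assuming SQL Server Agent is available; the job and schedule names are made up) is to wrap the same call in an Agent job that runs nightly:

USE msdb;
GO
-- Hypothetical Agent job: purge backup history older than 30 days, nightly at 01:00.
EXEC dbo.sp_add_job
    @job_name = N'Purge msdb backup history';
EXEC dbo.sp_add_jobstep
    @job_name = N'Purge msdb backup history',
    @step_name = N'Run sp_delete_backuphistory',
    @subsystem = N'TSQL',
    @database_name = N'msdb',
    @command = N'DECLARE @CutOffDate DATETIME = DATEADD(dd, -30, GETDATE());
EXEC sp_delete_backuphistory @CutOffDate;';
EXEC dbo.sp_add_jobschedule
    @job_name = N'Purge msdb backup history',
    @name = N'Nightly at 01:00',
    @freq_type = 4,        -- daily
    @freq_interval = 1,    -- every day
    @active_start_time = 010000;
EXEC dbo.sp_add_jobserver
    @job_name = N'Purge msdb backup history',
    @server_name = N'(local)';
GO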

If you want me to take a look at your server and its settings, or if your server is facing any issue, we can Fix Your SQL Server.

Note: Tim has also written an excellent book on SQL Backup and Recovery, a must have for everyone.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – SQL Server High Availability Options – Notes from the Field #032

[Notes from Pinal]: When it comes to High Availability or Disaster Recovery, I often see people getting confused. There are so many options available that when users have to select the most optimal solution for their organization, they are often confused. Most people know the salient features of the various options, but when they have to figure out one single option to use, they are often not sure which one to pick. I like to ask my dear friend Tim all these kinds of complicated questions. He has a skill for making a complex subject very simple and easy to understand.

Linchpin People are database coaches and wellness experts for a data-driven world. In this 32nd episode of the Notes from the Field series, database expert Tim Radney (partner at Linchpin People) explains in very simple words the best High Availability options for your SQL Server.


Working with SQL Server, a common challenge we face is providing the maximum uptime possible. To meet these demands we have to design a solution to provide High Availability (HA). Microsoft SQL Server, depending on your edition, provides you with several options. These could be database mirroring, log shipping, failover clusters, availability groups or replication.

Each possible solution comes with pros and cons. No one solution fits all scenarios, so understanding which solution meets which need is important. As with anything IT related, you need to fully understand your requirements before trying to solve the problem. When it comes to building an HA solution, you need to understand the risk your organization needs to mitigate the most.

I have found that most organizations are concerned about hardware and OS failures. Other common concerns are data corruption or storage issues. For data corruption or storage issues, you can mitigate those concerns by having a second copy of the databases. That can be accomplished with database mirroring, log shipping, replication or availability groups with a secondary replica. Failover clustering and virtualization with shared storage do not provide redundancy of the data.

I recently created a chart outlining some pros and cons of each of the technologies that I posted on my blog.

I like to use this chart to help illustrate how each technology provides a certain number of benefits. Each of these solutions carries with it some level of cost and complexity. As database professionals we should all be familiar with these technologies so we can make the best possible choice for our organization.

If you want me to take a look at your server and its settings, or if your server is facing any issue, we can Fix Your SQL Server.

Note: Tim has also written an excellent book on SQL Backup and Recovery, a must have for everyone.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Iridium I/O – SQL Server Deduplication that Shrinks Databases and Improves Performance

Database performance is a common problem for SQL Server DBAs. It seems like we spend more time on performance than just about anything else. In many cases, we use scripts or tools that point out performance bottlenecks, but we don’t have any way to fix them. For example, what do you do when you need to speed up a query that is already tuned as well as possible? Or what do you do when you aren’t allowed to make changes to a database supporting a purchased application?

Iridium I/O for SQL Server was originally built at Confio Software (makers of Ignite) because DBAs kept asking for a way to actually fix performance instead of just pointing out performance problems. The technology is certified by Microsoft and was so promising that it was spun out into a separate company that is now run by the Confio founder/CEO and technology management team.

Iridium uses deduplication technology to both shrink databases and boost I/O performance. It is intriguing to see it work. It will deduplicate a live database as it is running transactions. You can watch the database get smaller while user queries are running.

Iridium is a simple tool to use. After installing the software, you click an “Analyze” button which will spend a minute or two on each database and estimate both your storage and performance savings.  Next, you click an “Activate” button to turn on Iridium I/O for your selected databases.  You don’t need to reboot the operating system or restart the database during any part of the process.

As part of my test, I also wanted to see if there would be an impact on my databases when Iridium was removed.  The ‘revert’ process (bringing the files back to their SQL Server native format) was executed by a simple click of a button, and completed while the databases were available for normal processing.

I was impressed and enjoyed playing with the software, and I encourage all of you to try it out. Here is the link to the website to download Iridium for free.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Faster SQL Server Databases and Applications – Power and Control with SafePeak Caching Options

Update: This blog post is written based on SafePeak, which is available for free download.

Today, I’d like to examine more closely one of my preferred technologies for accelerating SQL Server databases, SafePeak. SafePeak’s software provides a variety of advanced data caching options, techniques and tools to accelerate the performance and scalability of SQL Server databases and applications.

I’d like to look more closely at some of these options, as these capabilities could help you address lagging database and application performance on your systems.

To better understand the available options, it is best to start by understanding the difference between the usual “Basic Caching” vs. SafePeak’s “Dynamic Caching”.

Basic Caching

Basic Caching (a stale, static cache) is the ability to put the results of a query into cache for a certain period of time. It is based on TTL (Time-To-Live), and the cached result is designed to stay in cache no matter what happens to the data. For example, even though the actual data can be modified by DML commands (update/insert/delete), the cache will still hold the same obsolete query results; in other words, Basic Caching really is a static/stale cache. As you can tell, this approach has its limitations.

Dynamic Caching

Dynamic Caching (a non-stale cache) is the ability to put the results of a query into cache while maintaining transaction awareness in the cache, watching for possible data modifications. The modifications can come as a result of:

  • DML commands (update/insert/delete),
  • indirect modifications due to triggers on other tables,
  • executions of stored procedures with internal DML commands,
  • complex cases of stored procedures with multiple levels of internal stored procedure logic.

When data modification commands arrive, the caching system identifies the related cache items and evicts them from cache immediately. In the dynamic caching option the TTL setting still exists, although its importance is reduced, since the main factor for cache invalidation (or cache eviction) becomes the actual data update commands.

Now that we have a basic understanding of the differences between “basic” and “dynamic” caching, let’s dive in deeper.

SafePeak: A comprehensive and versatile caching platform

SafePeak comes with a wide range of caching options. Some of SafePeak’s caching options are automated, while others require manual configuration. Together they provide a complete solution for IT and Data managers to reach excellent performance acceleration and application scalability for  a wide range of business cases and applications.

  • Automated caching of SQL Queries: Fully/semi-automated caching of all “read” SQL queries, containing any type of data, including BLOBs, XML, text as well as all other standard data types. SafePeak automatically analyzes the incoming queries, categorizes them into SQL Patterns, and identifies directly and indirectly accessed tables, views, functions and stored procedures;
  • Automated caching of Stored Procedures: Fully or semi-automated caching of all “read” stored procedures, including procedures with complex sub-procedure logic as well as procedures with complex dynamic SQL code. All procedures are analyzed in advance by SafePeak’s Metadata-Learning process and their SQL schemas are parsed, resulting in a full understanding of the underlying code and object dependencies (tables, views, functions, sub-procedures). This enables automated or semi-automated (manual review and activation with a mouse click) cache activation, with full understanding of the transaction logic for real-time cache invalidation;
  • Transaction aware cache: Automated cache awareness for SQL transactions (SQL and in-procs);
  • Dynamic SQL Caching: Procedures with dynamic SQL are pre-parsed, enabling easy cache configuration, eliminating SQL Server load for parsing time and delivering high response time value even in most complicated use-cases;
  • Fully Automated Caching: SQL Patterns (including SQL queries and stored procedures) that are categorized by SafePeak as “read and deterministic” are automatically activated for caching;
  • Semi-Automated Caching: SQL Patterns categorized as “Read and Non-deterministic” are patterns of SQL queries and stored procedures that contain references to non-deterministic functions, like GETDATE(). Such SQL Patterns are reviewed by the SafePeak administrator, and usually most of them are activated manually for caching (point-and-click activation);
  • Fully Dynamic Caching: Automated detection of all dependent tables in each SQL Pattern, with automated real-time eviction of the relevant cache items in the event of “write” commands (a DML or a stored procedure) to one of relevant tables. A default setting;
  • Semi Dynamic Caching: A manual cache configuration option enabling you to reduce the sensitivity of specific SQL Patterns to “write” commands to certain tables/views. An optimization technique relevant for cases when the query data is either known to be static (like archived order details), or when the application’s sensitivity to fresh data is not critical and the data can be stale for a short period of time (gaining better performance and reduced load);
  • Scheduled Cache Eviction: A manual cache configuration option enabling scheduled eviction of an SQL Pattern’s cache at certain time(s) during the day. A very useful optimization technique when (for example) certain SQL Patterns can be cached but are time sensitive. Example: a “select customers whose birthday is today” query using the GETDATE() function (see the short sketch after this list), which can and should be cached, but whose data stays relevant only until 00:00 (midnight);
  • Parsing Exceptions Management: Stored procedures that were not fully parsed by SafePeak (due to overly complex dynamic SQL or unfamiliar syntax) are flagged as “Dynamic Objects” with the highest transaction safety settings (such as full global cache eviction, DDL Check = lock cache and check for schema changes, and more). The SafePeak solution points the user to the Dynamic Objects that are important for cache effectiveness and provides an easy configuration interface, allowing you to improve cache hits and reduce global cache evictions. Usually this is the first configuration step in a deployment;
  • Overriding Settings of Stored Procedures: Override the settings of stored procedures (or other object types) for cache optimization. For example, if a stored procedure SP1 has an “insert” into table T1, it will not be allowed to be cached. However, it is possible that T1 is just a logging or instrumentation table left by developers. By overriding the settings, a user can allow caching of the problematic stored procedure;
  • Advanced Cache Warm-Up: Creating an XML-based list of queries and stored procedures (with lists of parameters) for periodic, automated pre-fetching and caching. An advanced tool that lets you handle rarer but very performance-sensitive queries by pre-fetching them into cache, delivering high performance for users’ data access;
  • Configuration Driven by Deep SQL Analytics: All SQL queries are continuously logged and analyzed, providing users with deep SQL Analytics and Performance Monitoring. Reduce troubleshooting from days to minutes with a heat-map of database objects and SQL Patterns. The performance-driven configuration helps you focus on the settings that bring the highest performance gains. SafePeak SQL Analytics allows continuous performance monitoring and analysis, and easy identification of bottlenecks in both real-time and historical data;
  • Cloud Ready: Available for instant deployment on Amazon Web Services (AWS).
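
To make the scheduled-eviction example above concrete, here is a hypothetical “whose birthday is today” query (the table and column names are made up): it uses GETDATE(), so it is non-deterministic, yet its result only changes at midnight, which is exactly when a scheduled eviction would refresh it.

-- Hypothetical time-sensitive query: cacheable all day, stale only after midnight.
SELECT CustomerId, FirstName, LastName
FROM dbo.Customers
WHERE MONTH(BirthDate) = MONTH(GETDATE())
  AND DAY(BirthDate) = DAY(GETDATE());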

As you can see, there are many options to configure SafePeak’s SQL Server database and application acceleration caching technology to best fit a lot of situations. If you’re not familiar with their technology, they offer free-trial software you can download that comes with a free “help session” to help get you started.

You can access the free trial here. Also, SafePeak is available to use on Amazon Cloud.

Reference: Pinal Dave (http://blog.sqlauthority.com)

MySQL – Scalability on Amazon RDS: Scale out to multiple RDS instances

Today, I’d like to discuss getting better MySQL scalability on Amazon RDS.

The question of the day: “What can you do when a MySQL database needs to scale write-intensive workloads beyond the capabilities of the largest available machine on Amazon RDS?”

Let’s take a look.

In a typical EC2/RDS set-up, users connect to app servers from their mobile devices and tablets, computers, browsers, etc.  Then app servers connect to an RDS instance (web/cloud services) and in some cases they might leverage some read-only replicas.

 

Figure 1. A typical RDS instance is a single-instance database, with read replicas.  This is not very good at handling high write-based throughput.

As your application becomes more popular you can expect an increasing number of users, more transactions, and more accumulated data.  User interactions can become more challenging as the application adds more sophisticated capabilities. The result of all this positive activity: your MySQL database will inevitably begin to experience scalability pressures.

What can you do?

Broadly speaking, there are four options available to improve MySQL scalability on RDS.

1. Larger RDS Instances – If you’re not already using the maximum available RDS instance, you can always scale up – to larger hardware.  Bigger CPUs, more compute power, more memory et cetera. But the largest available RDS instance is still limited.  And they get expensive.

“High-Memory Quadruple Extra Large DB Instance”:

  • 68 GB of memory
  • 26 ECUs (8 virtual cores with 3.25 ECUs each)
  • 64-bit platform
  • High I/O Capacity
  • Provisioned IOPS Optimized: 1000Mbps

2. Provisioned IOPs – You can get provisioned IOPs and higher throughput on the I/O level.

However, there is a hard limit with a maximum instance size and maximum number of provisioned IOPs you can buy from Amazon and you simply cannot scale beyond these hardware specifications.

3. Leverage Read Replicas – If your application permits, you can leverage read replicas to offload some reads from the master databases. But there are a limited number of replicas you can utilize and Amazon generally requires some modifications to your existing application.

And read-replicas don’t help with write-intensive applications.

4. Multiple Database Instances – Amazon offers a fourth option:

“You can implement partitioning, thereby spreading your data across multiple database instances” (Link)

However, Amazon does not offer any guidance or facilities to help you with this. “Multiple database instances” is not an RDS feature.  And Amazon doesn’t explain how to implement this idea.

In fact, when asked, this is the response on an Amazon forum:

Q: Is there any documents that describe the partition DB across multiple RDS?
I need to use DB with more 1TB but exist a limitation during the create process, but I read in the any FAQ that you need to partition database, but I don’t find any documents that describe it.

A: “DB partitioning/sharding is not an official feature of Amazon RDS or MySQL, but a technique to scale out database by using multiple database instances. The appropriate way to split data depends on the characteristics of the application or data set. Therefore, there is no concrete and specific guidance.”

So now what?

The answer is to scale out with ScaleBase.

Amazon RDS with ScaleBase: What you get – MySQL Scalability!

ScaleBase is specifically designed to scale out a single MySQL RDS instance into multiple MySQL instances.

Critically, this is accomplished with no changes to your application code.  Your application continues to “see” one database.   ScaleBase does all the work of managing and enforcing an optimized data distribution policy to create multiple MySQL instances.

With ScaleBase, data distribution, transactions, concurrency control, and two-phase commit are all 100% transparent and 100% ACID-compliant, so applications, services and tooling continue to interact with your distributed RDS as if it were a single MySQL instance.

The result: now you can cost-effectively leverage multiple MySQL RDS instances to scale out write-intensive workloads to an unlimited number of users and transactions, and to unlimited data.

Amazon RDS with ScaleBase: What you keep – Everything!

And how does this change your Amazon environment?

1. Keep your application, unchanged – There is no change to your application development life-cycle at all.  You still use your existing development tools, frameworks and libraries.  Application quality assurance and testing cycles stay the same. And, critically, you stay with an ACID-compliant MySQL environment.

2. Keep your RDS value-added services – The value-added services that you rely on are all still available. Amazon will continue to handle database maintenance and updates for you. You can still leverage High Availability via Multi-AZ.  And, if it benefits your application throughput, you can still use read replicas.

3. Keep your RDS administration – Finally the RDS monitoring and provisioning tools you rely on still work as they did before.

With your one large MySQL instance now split into multiple instances, you can actually use less expensive, smaller available RDS hardware and continue to see better database performance.

Conclusion

Amazon RDS is a tremendous service, but it doesn’t offer solutions to scale beyond a single MySQL instance. Larger RDS instances get more expensive.  And when you max-out on the available hardware, you’re stuck.  Amazon recommends scaling out your single instance into multiple instances for transaction-intensive apps, but offers no services or guidance to help you. This is where ScaleBase comes in to save the day.

It gives you a simple and effective way to create multiple MySQL RDS instances, while removing all the complexities typically caused by “DIY” sharding, and with no changes to your applications.

With ScaleBase you continue to leverage the AWS/RDS ecosystem: commodity hardware and value-added services like read replicas, Multi-AZ, maintenance/updates, and administration with monitoring and provisioning tools.

SCALEBASE ON AMAZON

If you’re curious to try ScaleBase on Amazon, it can be found here – Download NOW.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – DMV to Identify Incremental Statistics – Performance improvements in SQL Server 2014 – Part 3

This is the third part of the series Incremental Statistics. Here is the index of the complete series.


In the earlier two parts we saw what incremental statistics are, along with a simple example. In this blog post we will discuss a DMV query that lists all the statistics which are enabled for incremental updates.

-- List all statistics that are enabled for incremental updates.
SELECT OBJECT_NAME(s.object_id) AS TableName,
       c.name AS ColumnName,
       s.name AS StatisticsName
FROM sys.stats AS s
INNER JOIN sys.stats_columns AS sc
    ON s.object_id = sc.object_id
    AND s.stats_id = sc.stats_id
INNER JOIN sys.columns AS c
    ON s.object_id = c.object_id
    AND sc.column_id = c.column_id
WHERE s.is_incremental = 1;

If you run the above script against the example created in Part 1 and Part 2, you will get a resultset like the following.

When you execute the above script, it lists all the statistics in your database which are enabled for incremental updates. The script is very simple and effective. If you have an improved version of the script, I request you to post it in the comment section and I will post it on the blog with due credit.
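
If the script returns no rows in your own database, here is a minimal, hypothetical example (the table, column and statistic names are illustrative, and the table is assumed to be partitioned) of enabling an incremental statistic so that it shows up in the result:

-- Create a statistic with the SQL Server 2014 incremental option on a partitioned table.
CREATE STATISTICS Stat_OrderDate
ON dbo.Orders (OrderDate)
WITH FULLSCAN, INCREMENTAL = ON;
GO

-- Later, refresh statistics for a single partition instead of scanning the whole table.
UPDATE STATISTICS dbo.Orders (Stat_OrderDate)
WITH RESAMPLE ON PARTITIONS (2);
GO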

Reference: Pinal Dave (http://blog.sqlauthority.com)