SQL SERVER – Keeping MSDB System Database Lean and Fit – Notes from the Field #033

[Notes from Pinal]: I have been working with SQL Server for almost 10 years. The most ignored databases on any database server are the system databases; we hardly pay any attention to them. Just for the sake of it, run a few queries against your master database and see how many user tables already exist there. I am sure that unless you have strict policies in place, you will find a handful of stray tables. Similarly, ask a DBA about the exact importance of the MSDB database and you will find that many DBAs cannot answer. I often talk about this with my friend Tim, so I asked him if he could help us out with tricks to keep the MSDB system database at its optimal settings.

Linchpin People are database coaches and wellness experts for a data-driven world. In this 33rd episode of the Notes from the Field series, database expert Tim Radney (partner at Linchpin People) explains in very simple words how to keep the MSDB system database at optimal settings.


When doing a review of a client's database server, one of the checks I perform is to see how much backup history is being kept within MSDB. SQL Server does a fantastic job of logging all the backups and restores we perform, and depending on how your routine is configured, you are probably keeping only enough backup files to meet your SLAs. What most DBAs or admins don't think about is all the backup and restore history being retained within MSDB.

The tables within msdb that hold this history include the following (a quick check of how much history they contain appears after the list):

  • restorefile
  • restorefilegroup
  • restorehistory
  • backupfile
  • backupfilegroup
  • backupmediafamily
  • backupmediaset
  • backupset
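
A quick way to gauge how much history these tables are holding is to look at the row count and date range in backupset; this is only a minimal sketch, but backupset alone is usually a good indicator of the overall volume.

-- How much backup history is msdb holding?
USE msdb
GO
SELECT COUNT(*) AS BackupSetRows,
MIN(backup_start_date) AS OldestBackup,
MAX(backup_start_date) AS NewestBackup
FROM dbo.backupset
GO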

Microsoft SQL Server has a system stored procedure that helps us maintain this history; however, it is not scheduled to run by default. We have to create a process to run the stored procedure on a routine basis. You will need to know how much history is enough for your organization. I have seen this range anywhere from 14 days to 90 days.

The stored procedure is sp_delete_backuphistory within the MSDB database.

This stored procedure takes a single date parameter, which is the cutoff date for the history to keep. Any history older than the date provided will be deleted.

Over the years I have found MSDB ranging from several megabytes to nearly 20 GB. I recall one client who had a database server with hundreds of databases being log shipped to a secondary with 15-minute log backups, and the system had been in place for many years. MSDB had grown to almost 20 GB. The client complained that if they ever tried to restore a database within the GUI, it would lock up SSMS.
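
If you want to see how large msdb has grown on your own instance, a quick sketch using the standard sp_spaceused procedure will report the database size and unallocated space.

USE msdb
GO
-- Report the overall size of msdb and how much space is unallocated
EXEC sp_spaceused
GO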

A simple script to execute to purge this history older than 30 days is below.

USE msdb
GO
-- Calculate the cutoff date (30 days ago) and purge older backup/restore history
DECLARE @CutOffDate DATETIME
SET @CutOffDate = CONVERT(VARCHAR(10), DATEADD(dd, -30, GETDATE()), 101)
EXEC sp_delete_backuphistory @CutOffDate
GO
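
To run the purge on a routine basis, you can wrap the same call in a SQL Server Agent job. The following is only a minimal sketch; the job name, retention window, and weekly schedule are placeholders to adapt to your own standards.

USE msdb
GO
-- Create the job (the job name here is just a placeholder)
EXEC dbo.sp_add_job @job_name = N'Purge Backup History'
-- Add a step that deletes history older than 30 days
EXEC dbo.sp_add_jobstep @job_name = N'Purge Backup History',
@step_name = N'Delete history older than 30 days',
@subsystem = N'TSQL',
@database_name = N'msdb',
@command = N'DECLARE @CutOffDate DATETIME;
SET @CutOffDate = DATEADD(dd, -30, GETDATE());
EXEC dbo.sp_delete_backuphistory @CutOffDate;'
-- Schedule it weekly (Sunday at 2 AM in this sketch)
EXEC dbo.sp_add_jobschedule @job_name = N'Purge Backup History',
@name = N'Weekly purge',
@freq_type = 8,                -- weekly
@freq_interval = 1,            -- Sunday
@freq_recurrence_factor = 1,   -- every week
@active_start_time = 020000    -- 2:00 AM
-- Target the local server
EXEC dbo.sp_add_jobserver @job_name = N'Purge Backup History'
GO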

If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server.

Note: Tim has also written an excellent book on SQL Backup and Recovery, a must have for everyone.

Reference: Pinal Dave (http://blog.sqlauthority.com)


SQL SERVER – SQL Server High Availability Options – Notes from the Field #032

[Notes from Pinal]: When it comes to High Availability or Disaster Recovery, I often see people getting confused. There are so many options available that when users have to select the most optimal solution for their organization, they are often unsure. Most people know the salient features of the various options, but when they have to settle on one single option they are often not sure which to use. I like to ask my dear friend Tim all these kinds of complicated questions; he has a skill for making a complex subject very simple and easy to understand.

Linchpin People are database coaches and wellness experts for a data-driven world. In this 32nd episode of the Notes from the Field series, database expert Tim Radney (partner at Linchpin People) explains in very simple words the best High Availability options for your SQL Server.


When working with SQL Server, a common challenge we face is providing the maximum uptime possible.  To meet these demands we have to design a solution that provides High Availability (HA). Microsoft SQL Server, depending on your edition, provides you with several options: database mirroring, log shipping, failover clustering, availability groups or replication.
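
Before choosing among these options, it helps to confirm what your edition and instance can actually support. Here is a minimal sketch using the built-in SERVERPROPERTY function (IsHadrEnabled is reported on SQL Server 2012 and later):

-- Check edition and whether Availability Groups / clustering are in play
SELECT SERVERPROPERTY('Edition') AS Edition,
SERVERPROPERTY('ProductVersion') AS ProductVersion,
SERVERPROPERTY('IsHadrEnabled') AS IsHadrEnabled, -- 1 = Availability Groups feature enabled
SERVERPROPERTY('IsClustered') AS IsClustered      -- 1 = failover cluster instance
GO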

Each possible solution comes with pros and cons.  No one solution fits all scenarios, so understanding which solution meets which need is important.  As with anything IT related, you need to fully understand your requirements before trying to solve the problem.  When it comes to building an HA solution, you need to understand which risk your organization needs to mitigate the most.

I have found that most are concerned about hardware failure and OS failures. Other common concerns are data corruption or storage issues.  For data corruption or storage issues you can mitigate those concerns by having a second copy of the databases. That can be accomplished with database mirroring, log shipping, replication or availability groups with a secondary replica.  Failover clustering and virtualization with shared storage do not provide redundancy of the data.

I recently created a chart outlining some pros and cons of each of the technologies that I posted on my blog.

I like to use this chart to help illustrate how each technology provides a certain number of benefits.  Each of these solutions carries with it some level of cost and complexity.  As database professionals we should all be familiar with these technologies so we can make the best possible choice for our organization.

If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server.

Note: Tim has also written an excellent book on SQL Backup and Recovery, a must have for everyone.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Iridium I/O – SQL Server Deduplication that Shrinks Databases and Improves Performance

Database performance is a common problem for SQL Server DBAs.  It seems like we spend more time on performance than just about anything else.  In many cases, we use scripts or tools that point out performance bottlenecks, but we don't have any way to fix them.  For example, what do you do when you need to speed up a query that is already tuned as well as possible?  Or what do you do when you aren't allowed to make changes to a database supporting a purchased application?

Iridium I/O for SQL Server was originally built at Confio Software (makers of Ignite) because DBAs kept asking for a way to actually fix performance instead of just pointing out performance problems. The technology is certified by Microsoft and was so promising that it was spun out into a separate company that is now run by the Confio founder/CEO and technology management team.

Iridium uses deduplication technology to both shrink the databases as well as boost IO performance.  It is intriguing to see it work.  It will deduplicate a live database as it is running transactions.  You can watch the database get smaller while user queries are running.

Iridium is a simple tool to use. After installing the software, you click an “Analyze” button which will spend a minute or two on each database and estimate both your storage and performance savings.  Next, you click an “Activate” button to turn on Iridium I/O for your selected databases.  You don’t need to reboot the operating system or restart the database during any part of the process.

As part of my test, I also wanted to see if there would be an impact on my databases when Iridium was removed.  The ‘revert’ process (bringing the files back to their SQL Server native format) was executed by a simple click of a button, and completed while the databases were available for normal processing.

I was impressed and enjoyed playing with the software, and I encourage all of you to try it out.  Here is the link to the website to download Iridium for free.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Faster SQL Server Databases and Applications – Power and Control with SafePeak Caching Options

Update: This blog post is written based on the SafePeak, which is available for free download.

Today, I'd like to examine more closely one of my preferred technologies for accelerating SQL Server databases, SafePeak. SafePeak's software provides a variety of advanced data caching options, techniques and tools to accelerate the performance and scalability of SQL Server databases and applications.

I'd like to look more closely at some of these options, as some of these capabilities could help you address lagging database and application performance on your systems.

To better understand the available options, it is best to start by understanding the difference between the usual “Basic Caching” vs. SafePeak’s “Dynamic Caching”.

Basic Caching

Basic Caching (or stale, static caching) is the ability to put the results from a query into cache for a certain period of time. It is based on TTL, or Time-to-Live, and the cached entry is designed to stay in cache no matter what happens to the data. For example, even though the actual data may be modified by DML commands (update/insert/delete), the cache will still hold the same, now obsolete, query results. In other words, Basic Caching is really a static / stale cache.  As you can tell, this approach has its limitations.

Dynamic Caching

Dynamic Caching (or non-stale caching) is the ability to put the results from a query into cache while maintaining transaction awareness in the cache, watching for possible data modifications. The modifications can come as a result of:

  • DML commands (update/insert/delete),
  • indirect modifications due to triggers on other tables,
  • executions of stored procedures with internal DML commands
  • complex cases of stored procedures with multiple levels of internal stored procedures logic.

When data modification commands arrive, the caching system identifies the related cache items and evicts them from cache immediately. In the dynamic caching option the TTL setting still exists, although its importance is reduced, since the main driver of cache invalidation (or cache eviction) becomes the actual data update commands.

Now that we have a basic understanding of the differences between “basic” and “dynamic” caching, let’s dive in deeper.

SafePeak: A comprehensive and versatile caching platform

SafePeak comes with a wide range of caching options. Some of SafePeak's caching options are automated, while others require manual configuration. Together they provide a complete solution for IT and data managers to achieve excellent performance acceleration and application scalability for a wide range of business cases and applications.

  • Automated caching of SQL Queries: Fully/semi-automated caching of all “read” SQL queries, containing any types of data, including Blobs, XMLs, Texts as well as all other standard data types. SafePeak automatically analyzes the incoming queries, categorizes them into SQL Patterns, identifying directly and indirectly accessed tables, views, functions and stored procedures;
  • Automated caching of Stored Procedures: Fully or semi-automated caching of all "read" stored procedures, including procedures with complex sub-procedure logic as well as procedures with complex dynamic SQL code. All procedures are analyzed in advance by SafePeak's Metadata-Learning process and their SQL schemas are parsed, resulting in a full understanding of the underlying code and object dependencies (tables, views, functions, sub-procedures). This enables automated or semi-automated (manual review and activation by a mouse-click) cache activation, with full understanding of the transaction logic for real-time cache invalidation;
  • Transaction aware cache: Automated cache awareness for SQL transactions (SQL and in-procs);
  • Dynamic SQL Caching: Procedures with dynamic SQL are pre-parsed, enabling easy cache configuration, eliminating SQL Server load for parsing time and delivering high response time value even in most complicated use-cases;
  • Fully Automated Caching: SQL Patterns (including SQL queries and stored procedures) that are categorized by SafePeak as “read and deterministic” are automatically activated for caching;
  • Semi-Automated Caching: SQL Patterns categorized as "Read and Non-deterministic" are patterns of SQL queries and stored procedures that contain references to non-deterministic functions, like getdate(). Such SQL Patterns are reviewed by the SafePeak administrator, and usually most of them are activated manually for caching (point-and-click activation);
  • Fully Dynamic Caching: Automated detection of all dependent tables in each SQL Pattern, with automated real-time eviction of the relevant cache items in the event of “write” commands (a DML or a stored procedure) to one of relevant tables. A default setting;
  • Semi Dynamic Caching: A manual cache configuration option for reducing the sensitivity of specific SQL Patterns to "write" commands against certain tables/views. An optimization technique relevant for cases when the query data is either known to be static (like archived order details), or when the application's sensitivity to fresh data is not critical and the data can be stale for a short period of time (gaining better performance and reduced load);
  • Scheduled Cache Eviction: A manual cache configuration option enabling scheduled SQL Pattern cache eviction at certain time(s) during the day. A very useful optimization technique when (for example) certain SQL Patterns can be cached but are time sensitive. Example: "select the customers whose birthday is today", a query using the getdate() function, which can and should be cached, but whose data stays relevant only until 00:00 (midnight);
  • Parsing Exceptions Management: Stored procedures that were not fully parsed by SafePeak (due to overly complex dynamic SQL or unfamiliar syntax) are flagged as "Dynamic Objects" with the highest transaction safety settings (such as: full global cache eviction, DDL Check = lock cache and check for schema changes, and more). The SafePeak solution points the user to the Dynamic Objects that are important for cache effectiveness and provides an easy configuration interface, allowing you to improve cache hits and reduce global cache evictions. Usually this is the first configuration step in a deployment;
  • Overriding Settings of Stored Procedures: Override the settings of stored procedures (or other object types) for cache optimization. For example, in case a stored procedure SP1 has an “insert” into table T1, it will not be allowed to be cached. However, it is possible that T1 is just a “logging or instrumentation” table left by developers. By overriding the settings a user can allow caching of the problematic stored procedure;
  • Advanced Cache Warm-Up: Creating an XML-based list of queries and stored procedures (with lists of parameters) for periodic automated pre-fetching and caching. An advanced tool that lets you handle rarer but very performance-sensitive queries by pre-fetching them into cache, delivering high performance for users' data access;
  • Configuration Driven by Deep SQL Analytics: All SQL queries are continuously logged and analyzed, providing users with deep SQL Analytics and Performance Monitoring. Reduce troubleshooting from days to minutes with database objects and SQL Patterns heat-map. The performance driven configuration helps you to focus on the most important settings that bring you the highest performance gains. Use of SafePeak SQL Analytics allows continuous performance monitoring and analysis, easy identification of bottlenecks of both real-time and historical data;
  • Cloud Ready: Available for instant deployment on Amazon Web Services (AWS).

As you can see, there are many options to configure SafePeak’s SQL Server database and application acceleration caching technology to best fit a lot of situations. If you’re not familiar with their technology, they offer free-trial software you can download that comes with a free “help session” to help get you started.

You can access the free trial here. Also, SafePeak is available to use on Amazon Cloud.

Reference: Pinal Dave (http://blog.sqlauthority.com)

MySQL – Scalability on Amazon RDS: Scale out to multiple RDS instances

Today, I’d like to discuss getting better MySQL scalability on Amazon RDS.

The question of the day: “What can you do when a MySQL database needs to scale write-intensive workloads beyond the capabilities of the largest available machine on Amazon RDS?”

Let’s take a look.

In a typical EC2/RDS set-up, users connect to app servers from their mobile devices and tablets, computers, browsers, etc.  Then app servers connect to an RDS instance (web/cloud services) and in some cases they might leverage some read-only replicas.

 

Figure 1. A typical RDS instance is a single-instance database, with read replicas.  This is not very good at handling high write-based throughput.

As your application becomes more popular you can expect an increasing number of users, more transactions, and more accumulated data.  User interactions can become more challenging as the application adds more sophisticated capabilities. The result of all this positive activity: your MySQL database will inevitably begin to experience scalability pressures.

What can you do?

Broadly speaking, there are four options available to improve MySQL scalability on RDS.

1. Larger RDS Instances – If you’re not already using the maximum available RDS instance, you can always scale up – to larger hardware.  Bigger CPUs, more compute power, more memory et cetera. But the largest available RDS instance is still limited.  And they get expensive.

“High-Memory Quadruple Extra Large DB Instance”:

  • 68 GB of memory
  • 26 ECUs (8 virtual cores with 3.25 ECUs each)
  • 64-bit platform
  • High I/O Capacity
  • Provisioned IOPS Optimized: 1000Mbps

2. Provisioned IOPs – You can get provisioned IOPs and higher throughput on the I/O level.

However, there is a hard limit with a maximum instance size and maximum number of provisioned IOPs you can buy from Amazon and you simply cannot scale beyond these hardware specifications.

3. Leverage Read Replicas – If your application permits, you can leverage read replicas to offload some reads from the master databases. But there are a limited number of replicas you can utilize and Amazon generally requires some modifications to your existing application.

And read-replicas don’t help with write-intensive applications.
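
If you do go the read-replica route, a quick sanity check on each replica (a sketch using standard MySQL commands) confirms that it is replicating and read-only:

-- Run on the replica: replication status, including lag (Seconds_Behind_Master)
SHOW SLAVE STATUS;
-- Confirm the replica rejects writes
SHOW VARIABLES LIKE 'read_only';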

4. Multiple Database Instances – Amazon offers a fourth option:

"You can implement partitioning, thereby spreading your data across multiple database Instances" (Link)

However, Amazon does not offer any guidance or facilities to help you with this. “Multiple database instances” is not an RDS feature.  And Amazon doesn’t explain how to implement this idea.

In fact, when asked, this is the response on an Amazon forum:

Q: Is there any documents that describe the partition DB across multiple RDS?
I need to use DB with more 1TB but exist a limitation during the create process, but I read in the any FAQ that you need to partition database, but I don’t find any documents that describe it.

A: “DB partitioning/sharding is not an official feature of Amazon RDS or MySQL, but a technique to scale out database by using multiple database instances. The appropriate way to split data depends on the characteristics of the application or data set. Therefore, there is no concrete and specific guidance.”

So now what?

The answer is to scale out with ScaleBase.

Amazon RDS with ScaleBase: What you get – MySQL Scalability!

ScaleBase is specifically designed to scale out a single MySQL RDS instance into multiple MySQL instances.

Critically, this is accomplished with no changes to your application code.  Your application continues to “see” one database.   ScaleBase does all the work of managing and enforcing an optimized data distribution policy to create multiple MySQL instances.

With ScaleBase, data distribution, transactions, concurrency control, and two-phase commit are all 100% transparent and 100% ACID-compliant, so applications, services and tooling continue to interact with your distributed RDS as if it were a single MySQL instance.

The result: now you can cost-effectively leverage multiple MySQL RDS instances to scale out write-intensive workloads to an unlimited number of users, transactions, and data.

Amazon RDS with ScaleBase: What you keep – Everything!

And how does this change your Amazon environment?

1. Keep your application, unchanged – There is no change to your application development life-cycle at all.  You still use your existing development tools, frameworks and libraries.  Application quality assurance and testing cycles stay the same. And, critically, you stay with an ACID-compliant MySQL environment.

2. Keep your RDS value-added services – The value-added services that you rely on are all still available. Amazon will continue to handle database maintenance and updates for you. You can still leverage High Availability via Multi A-Z.  And, if it benefits your application throughput, you can still use read replicas.

3. Keep your RDS administration – Finally the RDS monitoring and provisioning tools you rely on still work as they did before.

With your one large MySQL instance now split into multiple instances, you can actually use less expensive, smaller available RDS hardware and continue to see better database performance.

Conclusion

Amazon RDS is a tremendous service, but it doesn’t offer solutions to scale beyond a single MySQL instance. Larger RDS instances get more expensive.  And when you max-out on the available hardware, you’re stuck.  Amazon recommends scaling out your single instance into multiple instances for transaction-intensive apps, but offers no services or guidance to help you. This is where ScaleBase comes in to save the day.

It gives you a simple and effective way to create multiple MySQL RDS instances, while removing all the complexities typically caused by "DIY" sharding, and with no changes to your applications.

With ScaleBase you continue to leverage the AWS/RDS ecosystem: commodity hardware and value added services like read replicas, multi A-Z, maintenance/updates and administration with monitoring tools and provisioning.

SCALEBASE ON AMAZON

If you’re curious to try ScaleBase on Amazon, it can be found here – Download NOW.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – DMV to Identify Incremental Statistics – Performance improvements in SQL Server 2014 – Part 3

This is the third part of the series Incremental Statistics. Here is the index of the complete series.


In the earlier two parts we have seen what incremental statistics are and a simple example of the feature. In this blog post we will discuss the DMV query which lists all the statistics that are enabled for incremental updates.

SELECT  OBJECT_NAME(sys.stats.OBJECT_ID) AS TableName,
sys.columns.name AS ColumnName,
sys.stats.name AS StatisticsName
FROM   sys.stats
INNER JOIN sys.stats_columns ON sys.stats.OBJECT_ID = sys.stats_columns.OBJECT_ID
AND sys.stats.stats_id = sys.stats_columns.stats_id
INNER JOIN sys.columns ON sys.stats.OBJECT_ID = sys.columns.OBJECT_ID
AND sys.stats_columns.column_id = sys.columns.column_id
WHERE   sys.stats.is_incremental = 1

If you run the above script against the example set up in part 1 and part 2, you will get a resultset like the following.

When you execute the above script, it will list all the statistics in your database which are enabled for incremental updates. The script is very simple and effective. If you have a further improved script, I request you to post it in the comments section and I will post it on the blog with due credit.
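
If you are on SQL Server 2014 SP2 or later, a related sketch using the sys.dm_db_incremental_stats_properties function goes one step further and shows the per-partition properties of each incremental statistic, such as row counts and when each partition was last updated.

-- Per-partition properties of incremental statistics (SQL Server 2014 SP2+)
SELECT  OBJECT_NAME(s.OBJECT_ID) AS TableName,
s.name AS StatisticsName,
isp.partition_number,
isp.rows,
isp.rows_sampled,
isp.last_updated
FROM sys.stats s
CROSS APPLY sys.dm_db_incremental_stats_properties(s.OBJECT_ID, s.stats_id) isp
WHERE s.is_incremental = 1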

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Simple Example of Incremental Statistics – Performance improvements in SQL Server 2014 – Part 2

This is the second part of the series Incremental Statistics. Here is the index of the complete series.


In part 1 we understood what incremental statistics are, and now in this second part we will see a simple example of incremental statistics. This blog post is heavily inspired by my friend Balmukund's must-read blog post. If you have a partitioned table with lots of data, this feature can be especially useful.

Prerequisite

Here are two things you must know before you start with the demonstrations.

AdventureWorks – For this demonstration I have installed the AdventureWorks 2012 database under the name AdventureWorks2014.

Partitions – You should know how partitioning works with databases.

Setup Script

Here is the setup script for creating the Partition Function, Partition Scheme, and the Table. We will populate the table based on the SalesOrderDetail table from AdventureWorks.

-- Use Database
USE AdventureWorks2014
GO
-- Create Partition Function
CREATE PARTITION FUNCTION IncrStatFn (INT) AS RANGE LEFT FOR VALUES
(44000, 54000, 64000, 74000)
GO
-- Create Partition Scheme
CREATE PARTITION SCHEME IncrStatSch AS PARTITION [IncrStatFn] TO
([PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY])
GO
-- Create Table Incremental_Statistics
CREATE TABLE [IncrStatTab](
[SalesOrderID] [int] NOT NULL,
[SalesOrderDetailID] [int] NOT NULL,
[CarrierTrackingNumber] [nvarchar](25) NULL,
[OrderQty] [smallint] NOT NULL,
[ProductID] [int] NOT NULL,
[SpecialOfferID] [int] NOT NULL,
[UnitPrice] [money] NOT NULL,
[UnitPriceDiscount] [money] NOT NULL,
[ModifiedDate] [datetime] NOT NULL)
ON IncrStatSch(SalesOrderID)
GO
-- Populate Table
INSERT INTO [IncrStatTab]([SalesOrderID], [SalesOrderDetailID],
[CarrierTrackingNumber], [OrderQty], [ProductID],
[SpecialOfferID], [UnitPrice],   [UnitPriceDiscount], [ModifiedDate])
SELECT     [SalesOrderID], [SalesOrderDetailID],
[CarrierTrackingNumber], [OrderQty], [ProductID],
[SpecialOfferID], [UnitPrice],   [UnitPriceDiscount], [ModifiedDate]
FROM       [Sales].[SalesOrderDetail]
WHERE      SalesOrderID < 54000
GO
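
Before moving on, you can confirm that the partition function and its boundary values were created as expected; here is a small sketch against the catalog views.

-- Inspect the partition function boundaries
SELECT pf.name AS PartitionFunction,
prv.boundary_id,
prv.value AS BoundaryValue
FROM sys.partition_functions pf
INNER JOIN sys.partition_range_values prv
ON pf.function_id = prv.function_id
WHERE pf.name = 'IncrStatFn'
GO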

Check Details

Now we will check the partition details of the table IncrStatTab.

-- Check the partition
SELECT *
FROM sys.partitions
WHERE OBJECT_ID = OBJECT_ID('IncrStatTab')
GO

You will notice that only a few of the partitions are filled with data and all the remaining partitions are empty.
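
If you prefer a narrower view, a small sketch like this returns just the partition numbers and their row counts:

-- Row counts per partition only
SELECT partition_number, rows
FROM sys.partitions
WHERE OBJECT_ID = OBJECT_ID('IncrStatTab')
ORDER BY partition_number
GO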

Now we will create statistics on the Table on the column SalesOrderID.

However, here we will add one more keyword, which is INCREMENTAL = ON. Please note this is a new keyword and feature added in SQL Server 2014; it did not exist in earlier versions.

-- Create Statistics
CREATE STATISTICS IncrStat
ON [IncrStatTab] (SalesOrderID)
WITH FULLSCAN, INCREMENTAL = ON
GO

Now that we have successfully created the statistics, let us check the statistics histogram of the table.
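
You can inspect the histogram with DBCC SHOW_STATISTICS, the same command used again later in this post:

-- Show the statistics histogram
DBCC SHOW_STATISTICS('IncrStatTab', IncrStat)
WITH HISTOGRAM
GO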

Now let us once again populate the table with more data. This time the data is entered into different partitions than the earlier populated ones.

-- Populate Table
INSERT INTO [IncrStatTab]([SalesOrderID], [SalesOrderDetailID],
[CarrierTrackingNumber], [OrderQty], [ProductID],
[SpecialOfferID], [UnitPrice],   [UnitPriceDiscount], [ModifiedDate])
SELECT     [SalesOrderID], [SalesOrderDetailID],
[CarrierTrackingNumber], [OrderQty], [ProductID],
[SpecialOfferID], [UnitPrice],   [UnitPriceDiscount], [ModifiedDate]
FROM       [Sales].[SalesOrderDetail]
WHERE      SalesOrderID > 54000
GO

Let us check the status of the partition once again with following script.

-- Check the partition
SELECT *
FROM sys.partitions
WHERE OBJECT_ID = OBJECT_ID('IncrStatTab')
GO

Statistics Update

Now here is where the new feature comes into action. Previously, if we had to update the statistics, we would have to FULLSCAN the entire table irrespective of which partition received the data.

However, in SQL Server 2014 we can just specify which partition we want to update in terms of Statistics. Here is the script for the same.

-- Update Statistics Manually
UPDATE STATISTICS IncrStatTab (IncrStat)
WITH RESAMPLE ON PARTITIONS(3, 4)
GO

Now let us check the statistics once again.

-- Show Statistics
DBCC SHOW_STATISTICS('IncrStatTab', IncrStat)
WITH HISTOGRAM
GO

Upon examining the statistics histogram, you will notice that the distribution has now changed and there are many more rows in the histogram.

Summary

The new feature of incremental statistics is indeed a boon for scenarios where there are partitions and statistics need to be updated frequently on those partitions. In earlier versions, to update statistics one had to do a FULLSCAN on the entire table, which wasted too many resources. With the new feature in SQL Server 2014, only those partitions which have changed significantly need to be specified in the script to update statistics.

Cleanup

You can clean up the database by executing the following script.

-- Clean up
DROP TABLE [IncrStatTab]
DROP PARTITION SCHEME [IncrStatSch]
DROP PARTITION FUNCTION [IncrStatFn]
GO

Reference: Pinal Dave (http://blog.sqlauthority.com)