SQL SERVER – What is Filtered Statistics?

Continuous learning is the only way to keep myself up-to-date with SQL Server product features. While I was working on a demo to show the usage of SQL Server statistics, I came across a feature called Filtered Statistics, which was introduced in SQL Server 2008.

There are many times when we have all indexes and statistics in place, but the optimizer is still unable to make proper estimates because it doesn't understand the correlation in the data. To show the power of this feature, I put together a small demo of its real-world usage.

In my database, I have created two tables: Employee and Sales. Here is the script.

IF DB_ID('DropMeAfterDemo') IS NOT NULL
DROP DATABASE DropMeAfterDemo
GO
CREATE DATABASE DropMeAfterDemo
GO
USE DropMeAfterDemo
GO
CREATE TABLE Employee(EmpID INT, name NVARCHAR(100))
CREATE TABLE Sales(EmpID INT, SalesAmount INT, SalesDate DATETIME)
GO
CREATE INDEX ix_Employee_name ON Employee(name)
CREATE STATISTICS ix_Employee_EmpID_name ON Employee(EmpID, name)
CREATE CLUSTERED INDEX ix_Sales_id_amount ON Sales(EmpID, SalesAmount)
GO

This company has only two employees, and they do not make the same number of sales. This is what is called skewness in the data. Here is the insert script to populate the tables in an uneven fashion.

-- SQLAuthority is a small blog where me and my wife work
INSERT Employee VALUES(1, 'Pinal')
INSERT Employee VALUES(2, 'Nupur')
-- Pinal did very badly with just one sale,
-- but Nupur did very well with 5000 sales
INSERT Sales VALUES(1, 1, GETDATE()-RAND()*1000)
DECLARE @loop INT
SET @loop = 1
WHILE @loop <= 5000 BEGIN
INSERT Sales VALUES(2, @loop, GETDATE()-RAND()*1000)
SET @loop = @loop + 1
END

To make sure the statistics are accurate, I will update them with a full scan.

-- updating all statistics for both tables
UPDATE STATISTICS Employee WITH FULLSCAN
UPDATE STATISTICS Sales WITH FULLSCAN

Now, let's run the queries and have a look at the actual execution plan. (You need to enable the actual execution plan in SQL Server Management Studio.)

-- Let's look at estimated and actual number of rows
DBCC FREEPROCCACHE
SELECT SalesAmount
FROM Employee
INNER JOIN Sales ON Employee.EmpID = Sales.EmpID
WHERE name = 'Pinal'
GO
DBCC FREEPROCCACHE
SELECT SalesAmount
FROM Employee
INNER JOIN Sales ON Employee.EmpID = Sales.EmpID
WHERE name = 'Nupur'

Here is the query plan. As we can see, the estimates are not correct because SQL Server can't determine the matching rows until it finds the EmpID from the Employee table.

Now, let's create filtered statistics based on the WHERE clause in the query.

-- create filter statistics based on where clause in query.
CREATE STATISTICS Employee_stats_EmpID ON Employee (EmpID)
WHERE name = 'Pinal'
CREATE STATISTICS  Employee_stats_EmpID2 ON Employee (EmpID)
WHERE name = 'Nupur'

If we run the exact same queries as earlier, here are the plans.

Here is the plan for “name = 'Pinal'”

Here is the plan for “name = 'Nupur'”

Notice two things here:
1. The statistics picked by the optimizer are accurate now. The estimated number of rows equals the actual number of rows.
2. The query plan is different for the two different values.

Here is how we can see the filtered statistics:

SELECT OBJECT_NAME(object_id) AS table_name, name, filter_definition
FROM   sys.stats
WHERE  filter_definition IS NOT NULL
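To dig deeper into one of the filtered statistics objects created earlier, DBCC SHOW_STATISTICS displays its header, density vector, and histogram; the header output also shows the filter expression. A quick sketch:

```sql
-- Inspect the filtered statistics created on Employee(EmpID)
-- for the 'Pinal' filter; the header includes the filter expression
DBCC SHOW_STATISTICS ('Employee', Employee_stats_EmpID)
```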

The main usage of this feature is in situations where the data is skewed. Have you ever had a chance to use this feature in your environment? Let me know.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Transaction Log Backup Failing After Rebuilding Database Log File

Speaking at community events gets me really energized. It is wonderful to connect with people who build some of the greatest code for their respective organizations. Last weekend, I was at a conference in Delhi, and it was an awesome opportunity for me to meet a number of you. I would like to take a moment to thank everyone who took a minute to walk up and talk to me about their experience. These experiences are what make me who I am.

Coming out of the conference, I had one moment that made me work on something quickly, and I thought it was worth sharing as a blog post. It was a corridor conversation at the end of the first day: a DBA walked up to me and asked if I was up for a challenge, as they had recently been facing a problem in their deployment. After doing a couple of sessions I was a bit tired, but this challenge got me thinking. I stopped and said, I am game to hear you out. I will try to answer, and if I can't, meet me first thing tomorrow morning and we should have something.

He started by saying – “Pinal, our transaction log backups are failing.”

Pinal: That is a start, you need to let me know what the error is.

DBA: Well, it is part of our test box. We have a maintenance plan and it started to fail. So there is no error that I can see yet.

Pinal: Was it working in the past?

DBA: Oh yeah!! It used to work and now the maintenance plans are not working I think.

Pinal: Did you guys do anything different on this box before this error started surfacing?

DBA: Hmmm … Last week we had a bad hard disk corruption. We detached the database and then we replaced the HDD. Then we attached the DB back.

Pinal: Is there anything that you folks did differently while attaching?

DBA: Oh yeah, when the HDD crashed – we lost the Log file and so when we attached we just rebuilt the Log.

Pinal: Well, you have the answer with you. That is the problem. Did you know that?

DBA: Is it? It was configured as Full recovery model. Everything was fine Pinal.

Pinal: You are correct, and I want to show you what happened behind the scenes. Meet me tomorrow.

That evening I crashed early, and I got up early to build a repro of this problem. I love such challenges that make me learn.

So here are the steps to reproduce this situation.

Step 1: Create Database and make it FULL recovery model

This is the simple part.

CREATE DATABASE [RebuildLogDB]
ON PRIMARY
( NAME = N'RebuildLogDB', FILENAME = N'C:\Rebuild\RebuildLogDB.mdf',
SIZE = 10240KB, FILEGROWTH = 1024KB)
LOG ON
( NAME = N'RebuildLogDB_log', FILENAME = N'C:\Rebuild\RebuildLogDB_log.ldf')
GO
ALTER DATABASE [RebuildLogDB] SET RECOVERY FULL
GO

-- Take a FULL backup to make it truly Full Recovery Model
BACKUP DATABASE [RebuildLogDB] TO DISK = N'C:\Rebuild\RebuildLogDB.bak'
WITH NOFORMAT, NOINIT, NAME = N'RebuildLogDB-Full Database Backup'
GO

Step 2: Detach the Database

We will detach the Database and then we will go ahead and delete the Log file manually from the folder.

-- Detach the DB
USE [master]
GO
EXEC master.dbo.sp_detach_db @dbname = N'RebuildLogDB'
GO

Step 3: Attach the DB without Log file

-- Attach without the Log file.
USE [master]
GO
CREATE DATABASE [RebuildLogDB] ON
( FILENAME = N'C:\Rebuild\RebuildLogDB.mdf')
FOR ATTACH_REBUILD_LOG
GO

On executing the above, we will be given a warning message like below:

File activation failure. The physical file name “C:\Rebuild\RebuildLogDB_log.ldf” may be incorrect.

New log file ‘C:\Rebuild\RebuildLogDB_log.ldf’ was created.

The database would have been created. To test the scenario my DBA friend described, I went to SSMS and tried to take a transaction log backup. But I was surprised to find we didn't have the option.

The catch here is hidden in the “Recovery Model” shown above. If you notice, it is showing as “SIMPLE” – which means we will not be able to take a TLog backup without changing it.
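The same can be confirmed with T-SQL instead of the UI, by querying the catalog view:

```sql
-- Rebuilding the log via ATTACH_REBUILD_LOG silently
-- switches the database to the SIMPLE recovery model
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'RebuildLogDB'
```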

The resolution for this is simple: change the recovery model back to FULL and take a full backup; transaction log backups will then succeed.
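The fix can be sketched in T-SQL as follows (the backup file names here are illustrative):

```sql
-- Switch back to FULL recovery model
ALTER DATABASE [RebuildLogDB] SET RECOVERY FULL
GO
-- A new full backup re-establishes the log backup chain
BACKUP DATABASE [RebuildLogDB]
TO DISK = N'C:\Rebuild\RebuildLogDB_AfterFix.bak'
GO
-- Transaction log backups now succeed
BACKUP LOG [RebuildLogDB]
TO DISK = N'C:\Rebuild\RebuildLogDB.trn'
GO
```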

Once we do this, everything comes back to a normal state. This was an example to show what happens when we rebuild the log without knowing that, in the process, it changes the recovery model to SIMPLE.

PS: Please DO NOT do this in your production environment.

Reference: Pinal Dave (http://blog.sqlauthority.com)

Interview Question of the Week #014 – How to DELETE Duplicate Rows

Here is another very common question I often receive, and which candidates face in interviews.

Question: How to delete duplicate rows in SQL Server (considering there is an identity column in table)?

Answer: Well, there is no better answer than the following script. If you asked me this question in an interview, even I would have to look it up online and find the following script:

DELETE
FROM MyTable
WHERE ID NOT IN
(SELECT MAX(ID)
FROM MyTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)
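On newer versions of SQL Server, the same cleanup is often written with ROW_NUMBER() in a CTE. A sketch, assuming the same hypothetical MyTable and duplicate columns as above:

```sql
-- Keep one row per duplicate group; delete the rest.
-- Deleting from the CTE deletes from the underlying table.
WITH cte AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3
               ORDER BY ID) AS rn
    FROM MyTable
)
DELETE FROM cte WHERE rn > 1
```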

Here is a quick video I have recorded where I demonstrate the same functionality in simple words.


Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – An Observation SET vs SELECT in T-SQL

In SQL Server, both SET and SELECT can be used to assign a value to a variable, as shown in the following examples.

DECLARE @no INT
SET @no = 10
SELECT @no AS no

DECLARE @no INT
SELECT @no = 10
SELECT @no AS no

The result is 10 in both cases.

You can also assign values taken from a table to a variable.

Let us create the following dataset.

CREATE TABLE #products(prod_id INT, prod_name VARCHAR(100))
INSERT INTO #products(prod_id,prod_name)
SELECT 10001,'Samsung' UNION ALL
SELECT 10003,'Micromax' UNION ALL
SELECT 10005,'LG'

Suppose you want to know the product id of the product “Micromax” and assign it to a variable. You can do it in two ways.

DECLARE @prod_id INT
SET @prod_id = (SELECT prod_id FROM #products WHERE prod_name='Micromax')
SELECT @prod_id AS prod_id

DECLARE @prod_id INT
SELECT @prod_id = prod_id FROM #products WHERE prod_name='Micromax'
SELECT @prod_id AS prod_id

The result is 10003

Do you think it will always work in the same way? No. Here is an example.

You will get an error when you execute the following code using SET, because the query returns multiple values that cannot be assigned to a single variable.

DECLARE @prod_id INT
SET @prod_id = (SELECT prod_id FROM #products)
SELECT @prod_id AS prod_id

But when you use SELECT, you will not get an error.

DECLARE @prod_id INT
SELECT @prod_id = prod_id FROM #products
SELECT @prod_id AS prod_id

The result is 10005. The last value read is assigned to the variable. So you need to keep this behavior in mind when choosing between SET and SELECT.
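Another difference worth knowing, sketched below with the same #products table: when the query matches no rows, SET with a subquery assigns NULL, while a SELECT assignment leaves the variable's previous value untouched.

```sql
DECLARE @prod_id INT
SET @prod_id = 99
-- No product named 'Nokia' exists in #products
SET @prod_id = (SELECT prod_id FROM #products WHERE prod_name='Nokia')
SELECT @prod_id AS prod_id   -- NULL: the subquery returned no rows

SET @prod_id = 99
SELECT @prod_id = prod_id FROM #products WHERE prod_name='Nokia'
SELECT @prod_id AS prod_id   -- still 99: no row, so no assignment happened
```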

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Log Shipping – Sharing Tab Missing in Folder Properties

Building a solution for a disaster recovery site is critical for every organization that thinks about business continuity. One solution that has been proven over the ages is Log Shipping. Though one of the oldest, it is one of the most sought-after solutions inside SQL Server. Even with solutions like SQL Server AlwaysOn Availability Groups taking prominence, there is still a group of people who love using log shipping techniques. I personally don't have any negative opinion about it, but I am curious to understand why you resorted to Log Shipping when you have options like Database Mirroring and AlwaysOn. I am sure the specific scenarios you bring to the table will be a learning experience for me too.

It looked like just another day of routine emails when one of my blog readers contacted me and informed me that he was not able to configure log shipping. To get more details, I asked him to provide the exact error message and the steps he was using; there is no way I can debug this without proper steps. Here is his problem as described by him via email.

I think my problem is not related to SQL Server at this point. When I right-clicked on the LogShippingPrimary folder and opened the Properties dialog, there were only 4 tabs (General, Security, Previous Versions, Customize) and NO Sharing tab. Due to this, I am not able to create the share which I can use for log shipping purposes.

Also “Share With” option is missing on the right-click context menu for folders.

Have you seen this before?

Thanks in advance.
My reply to him:

Hello Blog Visitor,
Here are the things which I wish to verify. Thanks again for sending such queries to me. I am glad I could be of help, and I hope the following steps will lead you in the right direction.

1. Open the Network Connections control panel and open the properties page for your connection. Are “Client for Microsoft Networks” and “File and Printer Sharing for Microsoft Networks” checked?

2. In the Services console (Start, Run, services.msc), make sure that the “Server” service is running.


After the above two recommendations, I got a reply from him informing me that the “Server” service was in a disabled state due to hardening of the server done by the Wintel team. Here is the screenshot he shared after the issue was resolved.

Have you ever seen such a scenario? Have any other things caused the same issue in your environment? I am sure we can learn from your experiences too. Let me know so that I can add them to the blog.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – 3 Common Mistakes of Agile Development – Notes from the Field #074

[Note from Pinal]: This is the 74th episode of the Notes from the Field series. Agile is a time-boxed, iterative approach to software delivery that builds software incrementally from the start of the project, instead of trying to deliver it all at once near the end. Storytelling is an art very few have mastered. When I requested Stuart to share his ideas about agile, he shared a very interesting insight on this subject, including a very interesting story about 3 common mistakes of agile development. I very much enjoyed his notes from the field, and I am confident that you will like it too.

In this episode of the Notes from the Field series, database expert Stuart Ainsworth explains 3 Common Mistakes of Agile Development.

I’m a developer by history, but a project manager at heart.  I’ve become much more interested in helping teams improve their workflow, not just helping them write better code.  To that end, most of the development shops that I’ve worked in have struggled with the ongoing battle to get features built and shipped on a schedule that satisfies business requirements.  Good developers are proud of their craft, and want maximum time to create; business needs features to go out the door quickly in order to compete.  These goals are often in conflict with each other.

Agile methodologies (such as scrum) try to help balance this tension by encouraging active conversation between business and development, and continuously delivering working software that is flexible and adaptable to change.  In the shops where I’ve seen agile development fail to deliver, I’ve noticed the following 3 bad habits:

  1. We have a failure to communicate.

Communication sounds easy, but it’s really, really hard.  Well-defined requirements help, but they’re no substitute for ongoing mutual conversations between developers and business people.  If a requirements document is the only method of explaining what to build and when to build it, you lose the ability to forecast what comes next in terms of building blocks.

Several of the principles of the Agile Manifesto deal with communication between business and development, but my favorite one is “business people and developers must work together daily throughout the project.”  If you want to be an agile development shop, you have to be an agile business.   Business needs to understand the challenges faced by developers, and developers need to be involved in understanding the changing needs of business.

  2. Code releases are always a feature release or a bug fix.

Bug fixes are good, and features are what make money; however, if the only time your shop is releasing code is to implement a feature or fix a bug, then you’re not continually improving your product.  The temptation is to fall back into a waterfall-like methodology; deployments become huge stacks of code that are not added to the product (or operational environment) until the day that a feature is supposed to be released.  The larger the stack of code, the harder it is to test, and the greater the risk of failure.

What agile principles suggest is that you should “deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.”  The goal is to release code frequently, even if that code is only establishing groundwork for future development efforts; smaller code is easier to test, and ultimately, a feature release becomes the final step in a series of releases.

  3. Software release dates depend on the quality and completeness of the code.

This is similar to the second issue, but a slight variation; a shop that constantly changes the length of their iteration will ultimately experience burnout.  It becomes harder and harder to stay on schedule and feature releases get pushed further and further back.

I prefer to have a fixed iteration period, either every four weeks or once a month.  There’s something about a cadence that motivates people to focus and get things done.  If a developer is working on a bit of code that’s supposed to ship in a month, it’s easy to evaluate how likely that’s going to happen within a couple of weeks; if it’s not going to be complete, build the working software, and release it.  With each iteration, it becomes easier to define what can be done in a fixed-length sprint.


Agile software development lends itself to creative implementations, but it’s important to stay true to the principles of good communication, continuous improvement, and maintaining a constant pace for development.  Avoiding some basic pitfalls can help your team stay productive in the ongoing race to get features out the door.

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Quiz on knowing DATEPART and DATENAME Behaviors

Playing around with date and time datatypes can be loads of fun. I hope you got the pun intended when I say it is fun. Most developers working with date and time datatypes feel it is a pain. I understand their sentiments, but I would like to let you know that it is not as painful as you think. If we understand how date, time, and their respective functions work inside SQL Server, then most of these tasks will be a breeze. Trust me on this. I get stumped by people from time to time with these small tricks, and then I get onto the learning curve of why that happened. It is easier than you think.

If you participated in my previous quiz last month (SQL SERVER – Trivia – Days in a Year), then this will be easier than you think. You will know the answer in a jiffy.

Remember, the winners will get a free monthly subscription to Pluralsight. Winners will be randomly picked at midnight on March 10th.

Part 1: Understanding DATENAME

The first part of this quiz is understanding what these functions do. Look at the statements below and guess what the answer would be. Then go ahead and try the same inside SQL Server Management Studio.

-- The reason is there is no “Day” specified in the source datetime.
SELECT DATENAME(DAYOFYEAR, '12:12:12.123') [DayOfYear]
SELECT DATENAME(WEEKDAY, '12:12:12.123') [WeekDay]

Should the above code work? What is the output, an error or values? If you did get an error, can you spot the reason? If you got values then can you tell why you got those values?

Part 2: Understanding DATEPART

If you got the above correct, then this should be a breeze for sure. You will get the answer easily. Let me know how many of you got it right just by reading the statements.

-- What about now?
SELECT DATEPART(weekday, '12:12:12.123') [WeekDay]

Should the above code work? What is the output, an error or values? If you did get an error, can you spot the reason? If you got values then can you tell why you got those values? Is it different from Part 1 of your guess with DATENAME or same? Can you tell why?

If you got this far, then great. I think you got the basics covered.

Part 3: Bonus Question

I thought it would be interesting to add a twist to the tale and get your guesses on what the output should be for the queries below.

-- First
SELECT DATENAME(dayofyear, '1') [DayOfYear]
-- Second
SELECT DATENAME(dayofyear, '1-1') [DayOfYear]
-- Third
SELECT DATENAME(dayofyear, '2:2') [DayOfYear]
-- Fourth
SELECT DATENAME(dayofyear, '23:25') [DayOfYear]
-- Fifth
SELECT DATENAME(dayofyear, '24:25') [DayOfYear]

Now which of the above queries will run without any error?

Let me know via the comments all your guesses and the reasons for them. Based on the response, I will look at giving away a goodie to one lucky winner.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Script to find SQL Startup Account using WMIC

Friends are a lifeline that keeps life interesting. I have various friend circles ranging from family to acquaintances to work, and the most exciting circle is the technical one. Most of the time, the tech circles are what fuel the posts that land up here. I was talking to one of my friends and he made a passing statement without much info. It was on my mind for a long time to research, and am I glad I figured it out. I learned a new way to find details about the SQL Server startup account – using WMIC. I did some more research on this and am sharing it with my blog readers.

What is WMIC? It stands for Windows Management Instrumentation Command-line. In simple words, WMIC is a utility which allows us to interact with WMI from a WMI-aware command-line shell. All WMI objects and their properties, including their methods, are accessible through the shell, which makes WMIC a super strong console.

WMIC has an alias for service application management. To get help, we can use /? as shown below:

wmic service /?

We can use a where clause and the get method to fetch details about services related to SQL Server. Here is the naming convention, which has been consistent since the old days:

SQL Component                            | Display Name                                           | Service Name
SQL Server – Default Instance            | SQL Server (MSSQLSERVER)                               | MSSQLServer
SQL Server – Named Instance              | SQL Server (NameOfInstance)                            | MSSQL$NameOfInstance
SQL Server Agent – Default Instance      | SQL Server Agent (MSSQLSERVER)                         | SQLSERVERAGENT
SQL Server Agent – Named Instance        | SQL Server Agent (NameOfInstance)                      | SQLAgent$NameOfInstance
SQL Server Browser                       | SQL Server Browser                                     | SQLBrowser
SQL Server Full-text – Default Instance  | SQL Full-text Filter Daemon Launcher (MSSQLSERVER)     | MSSQLFDLauncher
SQL Server Full-text – Named Instance    | SQL Full-text Filter Daemon Launcher (NameOfInstance)  | MSSQLFDLauncher$NameOfInstance

You can add more to list for Reporting Services, Analysis Services etc.

Here is the WMIC query which can list services using the like operator. If you are familiar with T-SQL, then you would know that % is the wildcard operator.

wmic service where "name like '%MSSQL%' or name like '%SQL%Agent%'" get Name, StartName

and here is the output
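For comparison, from inside SQL Server (2008 R2 SP1 and later) a similar result is available through a dynamic management view; a quick sketch:

```sql
-- Service name and its startup account, straight from SQL Server
SELECT servicename, service_account, startup_type_desc, status_desc
FROM sys.dm_server_services
```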

Did you find it interesting? Do let me know what you find in your environments. The learning never stops.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Using MaxTransferSize parameter with SQL Server Backups

Of late, I have been writing a lot about SQL Server backup and administration topics. A lot of my readers have been enjoying some of the deep conversations I have with people in the industry. As I always say, most of these blog posts are inspired by someone asking me a question, me doing the research, and then reaching out to them with a solution. This blog post is no different in that respect.

Recently I was fortunate to be in Delhi to deliver a session to one of our customers. As many of you have seen me around, one of the DBAs came and asked me an interesting question. The organization had bought new EMC storage, and one of the recommendations was to use a transfer size of 256 KB. My first instinct was – “Is it? I didn't know that.” I said I would revert to them after I reached the hotel and did my bit of research.

After reading a number of documents and recommendations on various sites, I got to know that the optimal value of the MaxTransferSize parameter for SQL Server backups on EMC storage systems is 256 KB, while by default SQL Server uses a value of 1,024 KB (1 MB). To promote storage array stability and performance, full database, differential, and transaction log backups should be performed with MaxTransferSize specified as 262,144 bytes (256 KB).

Well, that was easier said than done. I now wanted to know how this could be done, because the DBA wanted to automate it with scripting inside a SQL Server Agent job. I turned to MSDN for help, and as always it didn't disappoint.

Below are some sample T-SQL Backup commands that are optimized for this requirement:

/* Full database backup */
BACKUP DATABASE AdventureWorks TO DISK = 'C:\MS SQL\Bakup Files\AdventureWorks.bak'
WITH MAXTRANSFERSIZE = 262144
/* Differential database backup */
BACKUP DATABASE AdventureWorks TO DISK = 'C:\MS SQL\Bakup Files\AdventureWorks.diff'
WITH DIFFERENTIAL, MAXTRANSFERSIZE = 262144
/* Transaction Log backup */
BACKUP LOG AdventureWorks TO DISK = 'C:\MS SQL\Bakup Files\AdventureWorks.trn'
WITH MAXTRANSFERSIZE = 262144

To my blog readers, I would like to know if you have ever used these settings in your environments. What type of storage are you using and what values are you using for MAXTRANSFERSIZE parameter? Have you seen any performance improvements while using the same? Have you seen any difference in behavior while using these parameters? Do let me know via the comments section.

Finally, as I wrap up – I always talk about the power of using T-SQL over the UI, and when I write such blog posts, it just gets better. I need to learn the fine print of these T-SQL commands and all their parameters in future. Rest assured, you are going to see more of these on this blog for sure.

Reference: Pinal Dave (http://blog.sqlauthority.com)

Interview Question of the Week #013 – Stored Procedure and Its Advantages – How to Create Stored Procedure

Some questions are so old that they should no longer be asked in interviews. Here is one question I have spotted so many times that when a co-interviewer asks it of a candidate, I often feel bored (well, I was caught yawning last time). Here is the question:

Question: “How do you create stored procedure? and What are the advantages of the stored procedure?”

Answer: Well, as I said, I find this question age old, so if you search the internet you will find many different answers. Just make sure you read answers which cover the latest version of SQL Server, as stored procedures have changed a bit since their inception.

Here are few of the advantages of the stored procedure:

  • Execution plan retention and reuse
  • Query auto-parameterization
  • Encapsulation of business rules and policies
  • Application modularization
  • Sharing of application logic between applications
  • Access to database objects that is both secure and uniform
  • Consistent, safe data modification
  • Network bandwidth conservation
  • Support for automatic execution at system start-up
  • Enhanced hardware and software capabilities
  • Improved security
  • Reduced development cost and increased reliability
  • Centralized security, administration, and maintenance for common routines

You can read the original article over here.

If you want to create a stored procedure, you can do so with the following simple script (using a hypothetical procedure name):

CREATE PROCEDURE MyFirstProc
AS
SELECT GETDATE() AS CurrentDateTime
GO

You can execute the above stored procedure with the following script:

EXEC MyFirstProc

The above stored procedure will return the current date and time as its result.
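For completeness, here is a slightly richer sketch with an input parameter (the procedure and parameter names are illustrative):

```sql
-- A stored procedure that accepts a parameter
CREATE PROCEDURE GetDateOffset
    @days INT
AS
SELECT DATEADD(DAY, @days, GETDATE()) AS OffsetDate
GO

-- Execute it, passing the number of days to add
EXEC GetDateOffset @days = 7
```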


Reference: Pinal Dave (http://blog.sqlauthority.com)