SQL SERVER – T-SQL Window Function Framing and Performance – Notes from the Field #103

[Note from Pinal]: In this episode of the Notes from the Field series, database expert Kathi Kellenberger explains T-SQL window function framing and performance. Kathi is an amazing instructor; she was one of the first SQL authors I read early in my career. One reason I love SQL Server is that her writing instilled a love for this technology in me. Today she brings an interesting topic for database experts. Read the experience of Kathi in her own words.


In my last post to SQL Authority, I explained how to add frames to T-SQL window functions where it is supported. I broke down the syntax and showed where things can go wrong if you rely on the default frame instead of explicitly specifying the frame. Be sure to read that article if you are new to frames or just need a refresher.

When creating a running total, for example, it's very easy to just leave out the frame, and, in many cases, the results will be what you want. There is another downside to using the default frame, however, and that is a pretty big performance penalty. The default frame, when one is not specified, is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. If the ORDER BY expression in the OVER clause produces unique values, you will get a valid running total.

In order to perform the window function calculation, such as a running total, SQL Server creates a worktable and populates it with each partition. When RANGE is used, that worktable is always created in tempdb. When ROWS is used instead by specifying the frame, the worktable is created in memory most of the time. The worktable created in memory has no I/O, locking, or contention issues, so it performs much better. Unfortunately, when you look at the graphical execution plan, you won’t see much of a difference between ROWS and RANGE. You can see the difference, however, if you look at STATISTICS IO.

To see the difference for yourself, run this script against the AdventureWorks database. Be sure to enable the Actual Execution Plan before running it as well.

SET STATISTICS IO ON;
GO
PRINT 'Default frame';
SELECT CustomerID, SalesOrderID, TotalDue,
SUM(TotalDue) OVER(PARTITION BY CustomerID ORDER BY SalesOrderID) AS RunningTotal
FROM Sales.SalesOrderHeader;
PRINT 'ROWS frame';
SELECT CustomerID, SalesOrderID, TotalDue,
SUM(TotalDue) OVER(PARTITION BY CustomerID ORDER BY SalesOrderID
ROWS UNBOUNDED PRECEDING
) AS RunningTotal
FROM Sales.SalesOrderHeader;

First, take a look at the execution plans as shown in Figure 1. The two queries did not produce identical plans, but the relative cost of each is 50%. So, at least the execution plans report that the queries perform the same.


Figure 1: The graphical execution plans

Now take a look at the messages tab. Here you will see a big difference as shown in Figure 2. The query using the default frame reported a very high number of logical reads from a worktable. The second query, which uses ROWS, reports 0 reads for the worktable. If you turn on Statistics Time, you will also see that the second query runs faster.
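If you would like to capture the timing side of that comparison as well, here is a minimal sketch; it simply wraps the same two queries from the script above with STATISTICS TIME, a standard technique rather than anything specific to this post.

SET STATISTICS TIME ON;
GO
-- Re-run the two running-total queries from the script above here, then
-- compare the CPU time and elapsed time reported on the Messages tab
-- for the default (RANGE) frame versus the explicit ROWS frame.
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
GO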


Figure 2: The statistics IO

It is often difficult to show performance differences with AdventureWorks tables. They are just too small. Adam Machanic, a SQL Server MVP, published a script that creates a couple of large tables based on AdventureWorks data. One of those tables, bigTransactionHistory, contains 30 million rows. I used bigTransactionHistory to do some performance testing to see if there really was a time difference that could be measured between ROWS and RANGE. For the test, I created a couple of smaller tables, subsets of Adam’s table, which contained 15 and 7.5 million rows respectively. I turned off the grid results so that only the server time would be measured. In addition to comparing ROWS and RANGE, I also tested two traditional methods: using a self-join subquery and a cursor. You can see the results of my tests in Figure 3.
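For reference, here is a hedged sketch of the correlated subquery (self-join style) running total that served as one of the traditional baselines; the ProductID, TransactionID, and ActualCost column names are my assumption about the layout of Adam's bigTransactionHistory table, so adjust them to match your own copy.

-- Traditional running total via a correlated subquery; shown only as the
-- slow, pre-window-function baseline that the ROWS frame outperforms.
SELECT t1.ProductID, t1.TransactionID, t1.ActualCost,
    (SELECT SUM(t2.ActualCost)
     FROM dbo.bigTransactionHistory AS t2
     WHERE t2.ProductID = t1.ProductID
       AND t2.TransactionID <= t1.TransactionID) AS RunningTotal
FROM dbo.bigTransactionHistory AS t1;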


Figure 3: The results of testing running totals

Using an accumulating window aggregate function (WF) outperformed the traditional techniques even with the default frame. You can see, however, that specifying a ROWS frame was incredibly fast.

Whenever frames are supported (FIRST_VALUE, LAST_VALUE, and accumulating window aggregates), make sure that you always specify the frame. Not only can you avoid some logic problems caused by RANGE frames, but you will almost always see a performance boost by using ROWS. Remember: Use ROWS. Use ROWS. And use ROWS.

If you would like to learn more about T-SQL window functions, be sure to check out my latest book Expert T-SQL Window Functions in SQL Server or my Pluralsight course T-SQL Window Functions.

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – What is T-SQL Window Function Framing? – Notes from the Field #102

[Note from Pinal]: In this episode of the Notes from the Field series, database expert Kathi Kellenberger explains T-SQL window function framing. Kathi is an amazing instructor; she was one of the first SQL authors I read early in my career. One reason I love SQL Server is that her writing instilled a love for this technology in me. Today she brings an interesting topic for database experts. Read the experience of Kathi in her own words.


One of the best kept secrets of T-SQL window functions, also called windowing or windowed functions, is framing. Framing was introduced with SQL Server 2012, and it allows the set of rows, or window, to be defined very granularly. For example, when calculating a running total, each row needs to see an increasing number of values to perform the calculation. Row 1 needs to see just row 1. Row 2 needs to see rows 1 and 2. Row 3 needs to see rows 1, 2, and 3. Framing allows each row to have a distinct set of rows for the window function calculation.

NOTE: In my last SQL Authority post, I explained the OVER clause. Be sure to review that article if the OVER clause is new to you.

Framing is used in a very specific set of circumstances: with accumulating window aggregates and with FIRST_VALUE and LAST_VALUE. Accumulating window aggregates are the aggregate functions you use every day along with an OVER clause containing an ORDER BY. You can calculate a running total, for example. FIRST_VALUE lets you pull in any column from the first row of the frame. LAST_VALUE lets you pull in any column from the final row of the frame. By default, if you don't specify the frame in the OVER clause where it is supported, it will consist of the rows starting with row 1 of the partition and continuing to the current row. Often, the default frame will give you the results you are looking for. By explicitly specifying the frame, you can achieve even more functionality, avoid some logical errors, and get better performance.

Framing has several keywords you should learn.

Table 1 lists each one.

Term: Definition
ROWS: A positional operator used to define the frame.
RANGE: A logical operator used to define the frame. This operator is not fully implemented as of SQL Server 2014. It is the default frame type.
CURRENT ROW: The row where the result of the calculation will be returned, also used as a boundary.
UNBOUNDED PRECEDING: The first row of the partition, often used as the lower boundary.
UNBOUNDED FOLLOWING: The final row of the partition, often used as the upper boundary.
# PRECEDING: A number of rows before the current row, often used as the lower boundary. This is only supported with ROWS.
# FOLLOWING: A number of rows following the current row, often used as the upper boundary. This is only supported with ROWS.
BETWEEN: Used to connect the two boundaries.

Table 1: Framing terms

Here is the default frame, when one isn’t specified:

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

This means that the rows in the window will consist of the first row of the partition and continue to the current row. This works for calculating a running total as long as the ORDER BY columns found in the OVER clause are unique. If they are not, because RANGE evaluates the data logically, you can get results that don't make sense.

When I speak on this topic, I tell my audience they need to learn three things: Use ROWS. Use ROWS. And use ROWS. Instead of relying on the default frame to calculate a running total, use this frame:

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

If that is too much to type, you can also abbreviate it like this:

ROWS UNBOUNDED PRECEDING

If you would like to calculate a three month moving average, for example, use this frame:

ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
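As a hedged sketch of how that frame might be used against AdventureWorks (this query is mine, not from the original post), the sales are first rolled up by month in a CTE and then averaged over the current and two preceding rows. Note that if a customer has gaps between months, the frame averages the last three rows rather than three strict calendar months.

WITH MonthlySales AS (
    SELECT CustomerID,
           DATEFROMPARTS(YEAR(OrderDate), MONTH(OrderDate), 1) AS OrderMonth,
           SUM(TotalDue) AS MonthlyTotal
    FROM Sales.SalesOrderHeader
    GROUP BY CustomerID, DATEFROMPARTS(YEAR(OrderDate), MONTH(OrderDate), 1)
)
SELECT CustomerID, OrderMonth, MonthlyTotal,
    AVG(MonthlyTotal) OVER(PARTITION BY CustomerID ORDER BY OrderMonth
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS ThreeMonthMovingAvg
FROM MonthlySales
ORDER BY CustomerID, OrderMonth;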

I'll save the performance differences for another post, but I would like to show you some logical problems caused by leaving out the frame. This example runs in the AdventureWorks database. The ORDER BY column is not unique, and I have chosen a customer with multiple orders on the same date.

SELECT CustomerID, SalesOrderID, CAST(OrderDate AS DATE) AS OrderDate, TotalDue,
    SUM(TotalDue) OVER(ORDER BY OrderDate
        ROWS UNBOUNDED PRECEDING) AS RunningTotal,
    SUM(TotalDue) OVER(ORDER BY OrderDate) AS DefFrameRunningTotal,
    FIRST_VALUE(SalesOrderID) OVER(ORDER BY OrderDate) AS FirstOrder,
    LAST_VALUE(SalesOrderID) OVER(ORDER BY OrderDate
        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS LastOrder,
    LAST_VALUE(SalesOrderID) OVER(ORDER BY OrderDate) AS DefFrameLastOrder
FROM Sales.SalesOrderHeader
WHERE CustomerID = 29586
ORDER BY OrderDate;

Here are the results:


Take a look at the RunningTotal column. The value increases until you get to the final row. The correct frame is specified in the OVER clause. Now take a look at the values in DefFrameRunningTotal. Notice that, instead of continually increasing, it repeats values. This problem is due to the default frame, which uses RANGE and looks at the value of OrderDate instead of just the position of the row. If the OrderDate values are the same, the rows belong together in the same window when RANGE is used. You can solve this problem by using ROWS or by making sure that the ORDER BY expression in the OVER clause is unique. (Use ROWS. Use ROWS. And use ROWS.)

Take a look at the FirstOrder column. In each case, the value returned came from the first row of the partition. Even without specifying the frame, it works correctly. The LastOrder column looks correct as well. Notice that I specified a frame with ROWS that starts with the current row and goes up to the end of the partition. When leaving out the frame, as shown in DefFrameLastOrder, the results don't make much sense at all. Remember, the default frame only goes up to the current row. But since RANGE is a logical operator, it actually extends to any rows with the same ORDER BY value as the current row. By default, the frame doesn't go all the way to the end of the partition where the last value can be found. To correctly use LAST_VALUE, you must specify the frame, and you should use ROWS.

Table 2 lists commonly used frames:

Frame: Meaning
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Start at row 1 of the partition and include rows up to the current row.
ROWS UNBOUNDED PRECEDING: Start at row 1 of the partition and include rows up to the current row.
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING: Start at the current row and include rows up to the end of the partition.
ROWS BETWEEN N PRECEDING AND CURRENT ROW: Start at a specified number of rows before the current row and include rows up to the current row.
ROWS BETWEEN CURRENT ROW AND N FOLLOWING: Start at the current row and include rows up to a specified number of rows following the current row.
ROWS BETWEEN N PRECEDING AND N FOLLOWING: Start at a specified number of rows before the current row and include a specified number of rows following the current row. Yes, the current row is also included!

Table 2: Commonly used framing syntax

Specifying the correct frame is really important to ensure that you achieve the expected results. There are also some important performance differences which I’ll talk about in my next SQL Authority post.

If you would like to learn more about T-SQL window functions, be sure to check out my latest book Expert T-SQL Window Functions in SQL Server or my Pluralsight course T-SQL Window Functions.

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – What is the OVER Clause? – Notes from the Field #101

[Note from Pinal]: In this episode of the Notes from the Field series, database expert Kathi Kellenberger explains the T-SQL OVER clause. Kathi is an amazing instructor; she was one of the first SQL authors I read early in my career. One reason I love SQL Server is that her writing instilled a love for this technology in me. Today she brings an interesting topic for database experts. Read the experience of Kathi in her own words.


The OVER clause was added to SQL Server in 2005. Yes, it’s been around for a decade, but most SQL Server professionals know very little about all the ways it can be used. Whenever you add a T-SQL window (also called windowed or windowing) function such as ROW_NUMBER to your query, you must use the OVER clause. Getting the OVER clause right is critical to getting the correct results. Window functions require the OVER clause, but there is one other situation where the OVER clause is used: with the sequence object. In this article, I’m going to stick just with the ways the OVER clause is used with window functions.

The OVER clause has three components: partitioning, ordering, and framing. Partitioning is always supported, but support for ordering and framing depends on which type of window function you are using.

The partitioning component, expressed as PARTITION BY, is optional and is supported for all types of window functions. Partitioning divides up the rows. It’s kind of like a physical window that is divided up into panes. The entire window is a window. The panes are also windows. Say that I partition by CustomerID. The partition for CustomerID 1000 is restricted to rows for CustomerID 1000.  For the calculation on any given row, the window consists of rows from that partition only. This means that the ROW_NUMBER function restarts the row numbers for each partition, for example. You can specify more than one column for PARTITION BY as long as you separate the columns with commas. Partitioning is not the same as grouping with the GROUP BY clause. When you use the GROUP BY clause, only one row is returned for each group. Partitioning is just a way to divide the data in order to perform the calculations; the detail columns are returned. Figure 1 shows an example.


Figure 1: The effects of PARTITION BY

Ordering within the OVER clause is supported when the data must be sorted to perform the calculation. For example, ROW_NUMBER requires an ORDER BY expression within the OVER clause because the rows must be lined up. Actually, most of the window functions support ORDER BY.  ORDER BY in the OVER clause is not supported for calculating subtotals, for example. You don’t need the data sorted to calculate a sum. Note that the ORDER BY within the OVER clause has nothing to do with an ORDER BY clause found in the query itself. Just like partitioning, you can specify a comma delimited list of columns. Figure 2 shows the results of ORDER BY in the OVER clause.


Figure 2: Using ORDER BY

The third component of the OVER clause is called framing. Framing was introduced with SQL Server 2012 and is used in only a couple of situations: accumulating window aggregates (running totals, moving averages, etc.) and the functions FIRST_VALUE and LAST_VALUE. While partitioning is similar to window panes, framing is like a stained glass window. When performing the calculations, every row might see a different set of rows through the window. For example, when calculating a running total, row 1 sees row 1. Row 2 sees rows 1 and 2. Row 3 sees rows 1, 2, and 3. The syntax for framing is a bit complex, so I am going to save it for my next post. By default, the frame consists of the first row of the sorted partition and all subsequent rows up to the current row. Figure 3 is an example. The small arrows represent the current row where the calculation is being performed.


Figure 3: The results of framing

To see the OVER clause in action, run this query against the AdventureWorks database:

SELECT CustomerID, SalesOrderID, TotalDue,
ROW_NUMBER() OVER(PARTITION BY CustomerID ORDER BY SalesOrderID) AS RowNum,
SUM(TotalDue) OVER() AS GrandTotal,
SUM(TotalDue) OVER(PARTITION BY CustomerID) AS SubTotal,
SUM(TotalDue) OVER(PARTITION BY CustomerID ORDER BY SalesOrderID) AS RunningTotal,
LAG(SalesOrderID) OVER(PARTITION BY CustomerID ORDER BY SalesOrderID) AS PrevOrder,
FIRST_VALUE(TotalDue) OVER(PARTITION BY CustomerID ORDER BY SalesOrderID) AS FirstAmt
FROM Sales.SalesOrderHeader;

Next time, I'll provide a deep dive into framing. If you would like to learn more about T-SQL window functions, be sure to check out my latest book Expert T-SQL Window Functions in SQL Server or my Pluralsight course T-SQL Window Functions.

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

The Milestone – Linchpin People – Notes from the Field #100

[Note from Pinal]: This is the 100th episode of the Notes from the Field series.

This is amazing and shows true dedication. Almost two years ago, Team Linchpin and I discussed that we wanted to share something different. When we started, none of us knew exactly what we wanted to build. We only knew one thing – we wanted to bring real-world scenarios and real-world examples in front of the world. We did not want to talk about situations so special that they would never happen; we wanted to bring the vast experience of the Linchpin People to bear on situations that DBAs and Developers face every single day. When we started, we were not sure whether this would be a successful experiment. However, when I look back at the last two years, I think the amount of content we have built is amazing. There are over 100 different stories that DBAs and Developers come across every day, and Team Linchpin People helps them improve their lives.

I indeed feel emotional when I see this 100th blog post from Linchpin People. Their experience with SQL Server is amazing, but their discipline and zeal to help the community overall are astonishing.


In this episode of the Notes from the Field series database expert Mike Walsh presents his thoughts about this entire experience of Guest Blogging with real world situations. Trust me, you want to read it!


Hello friends at SQLAuthority. I hope that Pinal allows me to share some off topic thoughts with you in this week’s “Notes from the Field” guest post from Linchpin People. This will be, if my math is correct, the 100th guest post that Linchpin has shared on this blog that Pinal has nurtured for the SQL Server community.

These thoughts will be a combination of editorial questions and random observations, a preview of what's in store, and a thanks to Pinal. And I'll try to keep it short.

How Can I Better Give Back to Our World? How Can You?

I’m not going to get to the full heart of this question in a post I’m trying to keep short. I’m going to play with this question more on my blog (www.straightpathsql.com) and on Linchpin’s Blog (www.linchpinpeople.com) in the coming weeks, or months, maybe years.

Here's what I mean, though. We live in interesting times. Our world is connected in ways it has never been. Information is being shared in ways it has never been shared. Pinal's audience is such a great example of these facts. You are reading this post from India, the Americas, Africa, Asia, Europe, and Australia. At the same time that you are reading this post from a city with luxuries, amenities, and fast internet, someone else is reading it in a remote village or rural outpost, perhaps with an intermittent satellite connection. Some are using a translation app to read these thoughts (and I hope they come out the same as I intended!)

We are not as distant as we once were. We are not as far apart as we once were. In many parts of the world we are advancing in many ways. Disease is receding, poverty is decreasing, hunger and thirst account for less suffering and death. Yet it isn't widespread. Words like "Facebook" or "iPhone" are quickly becoming widespread. This access to data and connectivity is moving forward faster than anyone imagined; the speed at which the move is happening is faster than expected. The world's challenges are not going away at that same rate, though. Perhaps I should say they haven't been.

Something is changing, though. Things that so many reading this post (I include myself there) take for granted are still out of reach for populations that are too large. Clean water, education, shelter, opportunity, equal access to education for young boys and young girls, freedom from oppression or persecution, or access to basic medicine… These things are still wanted by many in our world. That's not the change; that's the steady state. The change, though, is a new ethos. A new ethic. A new mindset. A new action. We live in a time when some of us who have access to all those, dare I call them human rights, are combining our technological know-how, our data, our social media, and our tools with the freedoms we have to use them, and initiatives are starting everywhere. In this generation, the world's problems feel closer than ever. Social media is full of stories. The horrors of war are personalized. The faces of children dying from a lack of clean water are seen. The knowledge of repressive systems that seek to rob education from women is no longer hidden. The reality that we all share this one earth and its environment is apparent to more now than it ever has been.

If you look around at what many high-profile technologists are doing, you'll see diverse answers to these problems starting up. Inventors, innovators, and tech-leading companies are spending their time, money, and platforms to do something. Clean water throughout the world is possible to achieve in this generation – inventors like Dean Kamen are spending time and money on it. Diseases are being eradicated through foundations like the Bill and Melinda Gates Foundation. AIDS is being fought. Open, free, world-class education at Khan Academy (sponsored and helped along by so many tech giants) is spreading. Women in societies where education isn't available to them are receiving educations in secret.

This generation is doing a lot. Yet many of us fall into complacency. As I type this, I am on a flight back from a client visit. I was upgraded to first class. I was frustrated that it was an older plane on my airline – without the newer, more comfortable seats… Tomorrow I'll wake up and get buried in many e-mails and the quest to succeed and keep customers happy. I'll get into a fight about American politics on Facebook.


This technology that we are a part of growing and creating and supporting can do so much. We, who are blessed by education, location, and opportunity, can also work together to change this sphere we live on. How? That's up to you. That's up to me. But I think if we look inside ourselves, if we look at our passions, if we see which TED talks make us well up with emotion – we can find areas of focus. If we think long enough about it, we can use our creativity, our innovation, and our situations to help promote these changes. That's what I want to leave you with here – I've already used enough words. Think about it. Think about which "big challenge" moves you to emotion. And ask yourself how you can instead be moved to action. Our global economy and global connectivity don't have to be all bad news. Let's use them to change the state of the world. Let's use them to responsibly raise the standing of everyone. I'm not saying profit is bad, or working hard is bad. I'm saying we have an amazing opportunity to change things from the bottom up as individuals, perhaps more than any government mandate, program, or declaration could ever do. More on this later on my blogs. It's a question we're asking at Linchpin People. It's a question I'm asking myself. It's a question I hope more of us ask.

What’s Coming In Our Posts Here?

Probably not more off-topic posts. At least not a lot. Our team is growing at Linchpin. We've enlisted more new faces to blog here, and we've asked the familiar faces to keep blogging. Expect content from all angles of the Data Platform at Microsoft. Expect to hear from Linchpin People you've not yet read here, like Bill Anton (about Business Intelligence, Warehousing, and ETL), Ryan Adams (maybe about PASS, definitely about the Relational Engine), Mike Lawell, or many others on our team. I'm looking forward to seeing what they have to write. I suspect it will be a month or two before you hear from me again here! That's how big the lineup will be. Looking forward to sharing.

A Thanks to Pinal

I just wanted to reiterate how much I appreciate this site. Pinal's tenacity, consistency, and desire to cover both simple and complex topics are a great example of someone giving back. I've had quite a few chats in person with my friend Pinal, and it's clear to me that he loves this SQL Server community. He loves helping and sharing. And I don't see any sign of him stopping. Thanks, Pinal. Thanks for sharing. Thanks for giving back. And thanks for this opportunity to randomly share on your blog.

Here’s to another 100 posts!

If you want to get started with performance analytics and triage of virtualized SQL Servers with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Database Testing and Waitfor Delay – Notes from the Field #099

[Note from Pinal]: This is the 99th episode of the Notes from the Field series. In this episode we are going to learn something very simple but effective about database testing. Database testing is one of the most critical activities for any developer or DBA. If you do not test your code, there is a good chance you will make mistakes. However, sometimes testing requires some advanced tricks.

In this episode of the Notes from the Field series, database expert Kevin Hazzard explains how to use the WAITFOR DELAY command to our advantage when doing database testing. Read the experience of Kevin in his own words.



Database testing has become something of an obsession for us at Linchpin People. To test ETL processes, I often write mocking procedures that can simulate data flowing in from clients. When I first began doing this years ago, I'd set up SQL Agent jobs to invoke the mocking procedures at a regular frequency. However, that's fairly cumbersome to set up, so I started writing scripts like this instead:

WAITFOR DELAY '00:00:03';
EXEC sp_InsertMockTransaction;
GO 100

This code uses the WAITFOR command which many developers already understand. If you’ve never seen or heard of WAITFOR before, it’s fairly straightforward. In the code shown above, the DELAY option is used to induce a delay of three seconds. Afterwards, the mocking procedure runs.

The last line of the script is not as familiar to developers, I’ve found. We’ve all seen and used the GO phrase but as it turns out, it’s not part of the T-SQL language. GO is a batch separator that tells SSMS or the SQLCMD processor to forward the current batch of commands to an instance of SQL Server. What many developers don’t realize is that the GO batch separator can accept a positive integer parameter that will loop over the batch that many times.

In SSMS, the code shown above will invoke the mocking procedure 100 times, pausing for three seconds in between. This is a simple but effective way to simulate the input I'll need for my tests. Of course, this pattern comes in handy for all sorts of uses, not just testing. Hopefully you'll find this simple trick as useful as I have.
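If you have never written a mocking procedure like the one called above, here is a minimal, hypothetical sketch. The dbo.StagingTransactions table and its columns are made up for illustration only; a real mocking procedure would insert whatever shape of data your ETL process expects.

-- Hypothetical mocking procedure; the target table and columns are
-- illustrative, not part of the original post.
CREATE PROCEDURE dbo.sp_InsertMockTransaction
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.StagingTransactions (TransactionDate, Amount)
    VALUES (SYSDATETIME(), CAST(RAND() * 100 AS DECIMAL(10, 2)));
END;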

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Three Simple Guidelines for System Maintenance – Notes from the Field #098

[Note from Pinal]: This is the 98th episode of the Notes from the Field series. Maintenance of the database is a very critical activity, and I have always seen DBAs take it very seriously. There is only one problem – there is no single solution or guidance for how to maintain the database. Everybody has their own opinion and way to do different tasks. System maintenance is a very straightforward task, but I have quite often seen even experts getting confused by it. Index maintenance, statistics maintenance, and a few other tasks are the common focus, but the real philosophy of this work goes beyond indexes. When I asked my good friend Stuart about system maintenance, he came up with these very interesting guidelines.

In this episode of the Notes from the Field series, database expert Stuart Ainsworth explains three simple guidelines for system maintenance.



This week, I was helping a client review their backup and disaster recovery strategy when they asked me how to handle an ongoing maintenance issue; it wasn’t really a SQL issue, but they were concerned about what to do for a virtualized server running an older, critical website. They’re in the process of replacing the site, but they have to maintain the machine for at least another 6 months.

A recent patch had crippled the machine, and they had to restore the VM from backup, which took the site down for a few hours. Thankfully, their backups worked, but a retrospective look at what happened showed some flaws in their process:

  1. They knew that the server was critical AND having issues before they applied the patch, and
  2. Their system admin was inexperienced with this particular type of restore, and that cost them some time during their recovery.

At Linchpin People, we usually get asked questions like this as part of the WellDBA exam. Here’s my short list of guidelines for system maintenance; they can be applied to more than just backups.

Automate what you can

Most modern system tools will allow you to do some form of automatic maintenance; SQL Server runs jobs via SQL Agent, Windows Server has Task Scheduler, and Linux has cron. Lots of third-party tools offer not only backup capabilities but also restore verification and reporting. The security and time saved by using a tool are usually worth the investment; it frees up your biggest expense (your people) to focus on other things.
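As a hedged illustration of the SQL Server side of that automation, a nightly job can be created entirely in T-SQL with the msdb job procedures. The job name, schedule, and the maintenance procedure it calls are placeholders of my own, not a recommendation from the original post.

USE msdb;
GO
-- Placeholder job, schedule, and command; adapt to your own routine.
EXEC dbo.sp_add_job @job_name = N'Nightly Maintenance';
EXEC dbo.sp_add_jobstep @job_name = N'Nightly Maintenance',
    @step_name = N'Run maintenance',
    @subsystem = N'TSQL',
    @command   = N'EXEC dbo.MyMaintenanceProcedure;'; -- hypothetical procedure
EXEC dbo.sp_add_schedule @schedule_name = N'Nightly at 1 AM',
    @freq_type = 4,             -- daily
    @freq_interval = 1,         -- every day
    @active_start_time = 010000;
EXEC dbo.sp_attach_schedule @job_name = N'Nightly Maintenance',
    @schedule_name = N'Nightly at 1 AM';
EXEC dbo.sp_add_jobserver @job_name = N'Nightly Maintenance';
GO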

If you must have a manual process, make it simple, and document it

Sometimes you can't escape manual processes. For this client, the critical nature of the web site meant that they didn't want an automatic reboot cycle, so while patches got pushed automatically, they rebooted manually. Given the fragility of their recent experience, they've decided to stop patching automatically and to do the following steps each month:

  • Snapshot the server (for quicker restores).
  • Patch and reboot.
  • If patch is unsuccessful, restore from snapshot.
  • If patch is successful, delete the snapshot.

Simple enough. However, my suggestion was that they write those steps down. Operating from a checklist ensures that procedures are followed consistently, elevating the process to near-automatic status. It also gives you an opportunity to periodically review manual processes and automate pieces as situations and technology change.

Validate, validate, validate…

So, if you’re automating your processes as much as you can, and simplifying your manual processes, what are your engineers spending their time on? They should be periodically validating that the maintenance plans you implemented are successfully working, including routinely doing recovery drills. Just like a fire drill, IT professionals need to practice what they would do in case of a recovery scenario in order to minimize the downtime associated with an actual recovery.

This is one of those activities that is well received in theory, but rarely done in reality. Support queues are always full, new projects and feature requests are always pressing, and maintenance issues are usually at the bottom of the pile. However, in the event of an outage, routines that have been well practiced are the ones that are the fastest to recover; a crisis situation is the absolute worst time for an engineer to feel like they don’t understand what to do.

Summary

Routine maintenance isn’t difficult, but it can be time consuming. To minimize time, automate what you can, and simplify your manual processes. Then, invest your time in preparing for the inevitable. Technology breaks; don’t let it break you.

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

Hey DBA – Do You Study Mistakes? – Notes from the Field #097

[Note from Pinal]: This is the 97th episode of the Notes from the Field series. If you know Mike Walsh, he is an amazing person to meet and know. I am fortunate to be his friend. Every single time I meet him, I learn something as a human.


In this episode of the Notes from the Field series, database expert Mike Walsh gives a challenge to all of us. He explains why we should study our mistakes. He is indeed very right that every mistake gives us an opportunity to correct ourselves.


So another post from me on my friend Pinal’s blog that is on the periphery of technology and professional development.

This post is admittedly inspired by a scare I had with a chainsaw this past weekend. I blogged about that over on my personal site but I wanted to talk about a mindset of studying the failures (both your own and those of others) and trying to always learn from mistakes. You can learn any technology you want, you can be the best DBA you could possibly be – but if you don’t study and learn from failures, you’ll never be as good as you could be.

I can talk about this a bit for two reasons. One is – I'm a human. And as one, I make mistakes from time to time. So I'm qualified to talk about failure – because I've done it so much. The other is – I'm a consultant. At Linchpin People – we do a lot of things with SQL Server for our clients. One of them is rescuing projects gone bad. I've picked up other people's mistakes and tried to piece them together for them, and do it regularly.

So on to a few thoughts about learning from mistakes.

What Do You Mean, “Study Mistakes”?

In the post on my own blog, I talked about how chainsaw protective pants/chaps saved me a really bad day. I didn’t always wear safety gear when using chainsaws until recently. I’m not sure why, but I didn’t. What made me change my mind this year? I studied mistakes.

You see – I have to do a lot of work at my house in the woods this year, clearing fields for farm animals (in my spare time, when I'm not helping clients tune million dollar systems, I'm cleaning up after sheep and chickens – and more animals once I'm done clearing more space). And I've used a chainsaw before here and there – but never with the size of trees that I've had to cut this year. Never at the frequency that I've been cutting. So I started talking to people who've been there and done that. I asked questions and listened to advice. But I also read accident reports. (I speak on aviation disasters and learning from them; in the Fire Service we study Line of Duty Deaths a lot to see what mistakes others made. It sounds macabre, but it is a vital learning tool.) I read a lot of accident reports – and I saw mistakes and common patterns in chainsaw users who died or had serious injuries. In nearly all of the cases mistakes were made, and a lot of times they were compounded by not having the right safety gear on when those mistakes happened. I learned about being intentional and avoiding mistakes and changed some habits – but I also saw that over half of the accidents wouldn't have needed to be reported if the right protective gear had been on. So I bought it. And wear it. And it works.

We can do that as technologists too.

DBAs – What can you do?

Study Mistakes – Read blog posts. Look at the forums. Look and see what gets people "hurt" with SQL Server. Look at the horror stories about how SANs really can fail. Learn what goes wrong. Go to lessons learned and "you're doing it wrong" style presentations. When you go to these – don't get the "this couldn't happen to me" attitude. Understand this CAN happen to you and ask yourself what you can do differently.

Be Creative – When I was doing my various levels of emergency medical technician training, we would often engage in scenario-based discussions. We'd think up "what if" situations. We'd think through the time of dispatch (how a call would come in), the time spent responding to that type of call, and the arrival, and we'd think of various what-ifs along the way. We'd change the presentation of the patient; we'd think about a turn for the worse. All of this training developed muscle memory; it got us thinking about the unknown. So when we were thrown into the unknown, it wasn't a totally foreign feeling.

We can do that as technologists also! We can think of what could go wrong, we can think of the things a few levels deep and imagine where things can break. And then we can do something about the scenarios that we should deal with. A good way to start this is writing checklists for upgrades or migrations or deployments. As you write the checklist, you can begin to analyze what may happen at various steps and build time and solutions in to deal with those situations.

Be Honest – It's 2015, so I am assuming that you are a human if you are reading this. So you'll make mistakes. It will happen. It may be big, it may be small, and it won't be your last one either. In the moments after you recover, you have options. One is to ignore what happened, brush it under the rug, and hope it doesn't happen again. Another is to blame others. Still another, though, is to investigate what happened. Understand the mechanics. Do a root cause analysis and see what you can do differently next time. Learn from your own mistakes, and encourage your team to have open and honest lessons-learned meetings where everyone talks, everyone listens, and the uncomfortable conversations can be had. If that can't happen? Then you're doomed to repeat history.

If you want to get started with performance analytics and triage of virtualized SQL Servers with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – What’s the Difference between ROW_NUMBER, RANK, and DENSE_RANK? – Notes from the Field #096

[Note from Pinal]: In this episode of the Notes from the Field series, database expert Kathi Kellenberger explains T-SQL window functions. Kathi is an amazing instructor; she was one of the first SQL authors I read early in my career. One reason I love SQL Server is that her writing instilled a love for this technology in me. Today she brings an interesting topic for database experts. Read the experience of Kathi in her own words.


Microsoft introduced the first T-SQL window functions ten years ago with SQL Server 2005!  They introduced more T-SQL window functions with SQL Server 2012. I’ve been on a mission to educate the SQL Server and developer communities about T-SQL window functions since then.

Feedback from my sessions tells me that ROW_NUMBER is the most used T-SQL window function. ROW_NUMBER returns a unique number for every row in the results. (NOTE: If the PARTITION BY option is used, it returns a unique number within each partition.)

Did you know that two other similar functions, RANK and DENSE_RANK, were introduced at the same time? RANK and DENSE_RANK are comparable to ROW_NUMBER, but they handle ties in the ORDER BY expression differently.

To demonstrate the differences, I have chosen a customer from AdventureWorks that placed two orders on the same day. These three functions require an ORDER BY expression in the OVER clause. This tells SQL Server in which order to apply the numbers generated by the window function. In this case, I am using OrderDate, which is not unique.

SELECT CustomerID, SalesOrderID, CAST(OrderDate AS DATE) AS OrderDate,
ROW_NUMBER() OVER(ORDER BY OrderDate) AS [ROW_NUMBER],
RANK() OVER(ORDER BY OrderDate) AS [RANK],
DENSE_RANK() OVER(ORDER BY OrderDate) AS [DENSE_RANK]
FROM Sales.SalesOrderHeader
WHERE CustomerID = 11300;

Take a look at the partial results, starting where the numbers returned by the functions begin to get interesting.


The OrderDate values were unique until the 7th row, and all three functions return 7 on row 7. The 8th order has the same OrderDate as row 7 so it’s a tie. ROW_NUMBER doesn’t care; it continues to provide unique incrementing numbers. RANK and DENSE_RANK both return 7 again because both rows 7 and 8 are ranked the same.

Notice that on row 9, RANK “catches up” with ROW_NUMBER by skipping the value 8. RANK catches up with ROW_NUMBER every time once it’s past the tie. DENSE_RANK, on the other hand, does not. Instead of catching up, it returns the next possible value. DENSE_RANK doesn’t skip any numbers.

Take a look at row 9 again. ROW_NUMBER returns the position of the row. RANK returns the rank of the row based on its position. DENSE_RANK returns the logical rank: the rank over unique OrderDate values. The row with OrderDate 2013-11-14 is in the 9th position. It is ranked 9th over the set of rows. It is ranked 8th over the set of unique OrderDate values.

If you use a unique ORDER BY expression, all three functions will return the same values because there will be no ties.
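To see that for yourself, here is a small variation of the query above (my own, not from the original post) that adds SalesOrderID as a tiebreaker so the ORDER BY expression becomes unique; all three columns should then return identical values.

SELECT CustomerID, SalesOrderID, CAST(OrderDate AS DATE) AS OrderDate,
    ROW_NUMBER() OVER(ORDER BY OrderDate, SalesOrderID) AS [ROW_NUMBER],
    RANK() OVER(ORDER BY OrderDate, SalesOrderID) AS [RANK],
    DENSE_RANK() OVER(ORDER BY OrderDate, SalesOrderID) AS [DENSE_RANK]
FROM Sales.SalesOrderHeader
WHERE CustomerID = 11300;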

Conclusion

These three functions are usually a step along the way to a more complex solution. Understanding how they work and how they differ will help you decide which one to use when faced with the choice.

If you would like to learn more about T-SQL window functions, be sure to check out my latest book Expert T-SQL Window Functions in SQL Server or my Pluralsight course T-SQL Window Functions.

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Are Power Options Slowing You Down? – Notes from the Field #095

[Note from Pinal]: This is the 95th episode of the Notes from the Field series. When it comes to tuning SQL Server, we always look at the configuration of SQL Server itself. However, there are a few operating system settings that can also impact the performance of SQL Server. I recently asked John Sterrett a very simple question – "Which one OS option would you check first when you are tuning SQL Server?"


In this episode of the Notes from the Field series, database expert John Sterrett (Group Principal at Linchpin People) tells a very interesting story about how a simple OS setting impacts the performance of SQL Server. Read the experience of John in his own words.


While doing several SQL Server health checks, I have noticed that operating system settings can slow you down. Today, we are going to focus on power options. By default, many companies have the Balanced power plan as the configured option. You will see that it is also the recommended setting. This is recommended to save power, which reduces the amount of money required to run your servers. Please keep in mind that your virtualization software or BIOS might also have power option settings that should be verified and adjusted. Today, we are focusing on the Windows operating system.


The Balanced power option is great for most servers where CPU power is not mission critical to the applications living on your server. SQL Server is an exception, and you will want to get as much as you can out of your CPU power. In fact, Enterprise Edition is now licensed by core, so you want to make sure you are getting the most out of your SQL Server licensing.

How Do We Script Out The Change?

While you can load the power options GUI and change the setting manually, I like to automate as much as possible. The following command can be executed via PowerShell or Command Prompt to make sure the High Performance power option is enabled.

Powercfg -SETACTIVE SCHEME_MIN

Conclusion

SQL Server places a different set of demands on the operating system. OS default settings are not always optimal and should be reviewed.

Are your servers running at optimal speed or are you facing any SQL Server Performance Problems? If you want to get started with the help of experts read more over here: Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Rewriting Database History – Notes from the Field #094

[Note from Pinal]: This is the 94th episode of the Notes from the Field series. When I read the title of this article – Rewriting Database History! – I was extremely intrigued by it. When I was a kid, history was my favorite subject, and even today, when I have to deal with history, I always jump in first to read and enjoy it. When I saw this article from Kevin, I was delighted. I started to read it immediately, and I did not stop until it was finished. It was 20 minutes well spent. The journey starts with ORMs and ends at ORMs; however, as it comes full circle there are so many things in between. Trust me, Kevin could be a great historical novelist.

In this episode of the Notes from the Field series database expert Kevin Hazzard explains the interesting subject of rewriting database history. Read the experience of Kevin in his own words.



Way back in June of 2006, my friend Ted Neward wrote a very detailed piece called The Vietnam of Computer Science. Juxtaposing America’s involvement in Southeast Asia in the last century with the battles we fight in modern, business-driven enterprises, that article examined the unwinnable war that Object-Relational Mapping (ORM) tools have become. The title of the article is probably a bit politically incorrect these days given that Vietnam’s world standing and economy have improved vastly over the last decade. Moreover, lots of database and application developers working today don’t even remember the war in Vietnam so the allegory may fail for them. But for those of us who grew up in the 1960s and 1970s, the writing evokes plenty of painful memories that paint ORMs as technological quagmires.

ORM tools have evolved quite a bit since 2006 but they're still generally pretty awful. The metadata-driven code generators buried within tools like Hibernate and Microsoft's Entity Framework do a much better job of writing decently-performing queries than they did in the early days. But the cruft, wacky idioms, and generally non-sensible constraints one must suffer to use these tools are overwhelming for all but a handful of gladiators who gain some odd sense of purpose in buttressing their companies for long-running conflicts. Nowhere is this more apparent than in the Ruby on Rails space, where some sort of addiction to total database ignorance seems to have infected the vast majority of that community. If you want to understand why Ruby on Rails has been in decline in recent years, look no further than the chatty, miserably-performing queries that the Active Record ORM generates when accessing complex, relational data structures. Any DBA or database developer who has been enlisted to debug performance problems in those sorts of melees knows there's a better way to win hearts and minds.

For most developers who have shunned traditional ORMs, I often hear them speak about how unwieldy, brittle, or inflexible they found the tools to be. The relational abstractions that they provide are sometimes too weak, making common business operations difficult to express in languages like C# and Java. To solve that problem, the abstraction may become leaky, exposing details that create dependencies which complicate deployments or lock you into vendor- or version-specific features.

For database people, ORM aversion is typically related to the naiveté of machine-generated queries. Chattiness is a big problem with full-service ORMs, for example. Mature ORMs do a better job today emitting correlated sub-queries or common table expressions than they did in the past. But that often depends on good mappings which depend in turn on good metadata. Unfortunately, perfect database metadata is quite rare. Anyone who watches the tracing of a database server suffering under the garrulous assault of an ORM-driven application understands just how little the ORM really knows about the database.

If the protestors can convince management that the ORM war is unwinnable, the response can move in several potential directions. One might rationalize that the relational representation of data is the real foe, i.e. that because data is rarely used in third normal form within applications, it shouldn't be stored that way on disks. The pundits of the NoSQL movement say that relational storage models are a holdover from an age when storage and bandwidth were much more expensive than they are today. That is a massive oversimplification for sure, but there are some cases where a document-oriented database makes a lot of sense.

For example, I worked in a shop long ago that used a UniVerse database to manage large caches of medical claims data. As a multi-valued database much like today’s NoSQL databases, UniVerse stores data with all the data that it references in a single document. So a patient’s demographic information can be stored in line with the related medical claims, diagnostic history and lab results. For that system, there was only one index on the database which was used to find individual, semi-structured blobs of patient records. The rules engine which sat atop the database always needed all of a single patient’s information whenever it ran a rule so storing the entire document describing a patient as a single database entity was both practical and highly efficient.

In those days, UniVerse was an IBM product that competed against their own DB2 relational database. For some reason, IBM was keen on getting our company to move from UniVerse to DB2 so we invited them in for a proof of concept to build a relational model of our medical database in DB2. When they were done, they loaded several terabytes of data into the new relational database and aimed the rules engine there. After weeks of tweaking, rewriting and hand-wringing, they never could make the relational database outperform UniVerse. Moreover, it wasn’t even a close race. The UniVerse database running on much older hardware could fetch whole patient documents in a few milliseconds. The best we could get DB2 to do using its normalized implementation was measured in hundreds of milliseconds. The difference between five milliseconds and half a second may not sound like a major problem, but when you’re handling millions of queries per day and trying to satisfy a three second, end-to-end Service Level Agreement with your Point of Sale providers, those delays add up very quickly.

Of course, document-oriented databases like Mongo and multi-valued databases like UniVerse are not always viable solutions to bridging the gulf between on-disk and in-memory data representations. For 20 years, I've made the claim that databases are an unfortunate consequence of history. Bear with me because I don't mean to disparage database technologies at all. Quite the opposite, I love and respect modern database systems of many types. However, it's very true that if you could travel back in time to 1943 and give the developers of the ENIAC computer at the University of Pennsylvania a large scale memory device to attach to their invention, databases as we know them would never have evolved. Instead, all of the rich query tools and consistency features of modern transactional databases would have been developed inside of our applications.

In that sense, document-oriented databases and ORMs are both attempts to undo history. The former solution means to eliminate the mismatch between applications and databases altogether by making the data storage model match the common data utilization pattern. The problem is that by coercing the database to adapt to the application’s model, many of the features that evolved in transactional systems simply aren’t available. So-called eventual consistency scares developers who have come to depend on databases that offer transactions which are Atomic, Consistent, Isolated and Durable (ACID). This is most unfortunate because had the ENIAC developers had unlimited in-memory storage available to them, I believe that in-memory data handling semantics would have evolved to be ACID-compliant just as they are in modern relational databases. Document-oriented databases redo history in a way that doesn’t honor so many of the good choices that were made in real history.

Rather than attempting to undo history per se, ORMs try to hide history from us by making it seem that the underlying database's storage and query semantics exist inside the application's space. While this is a better way to undo history because it preserves the best parts of it, it is perhaps the hardest problem in all of computer science today. The reasons that it is so difficult are myriad. Incomplete metadata is often the culprit. The cardinality between entities, for example, is nearly impossible to get right simply by inspecting a database's available foreign keys. Moreover, the relationships between entities in separate databases, or in separate but related data integration processes, often have no metadata to tie them together. The Master Data Management (MDM) structures and processes used to manage code assignments can also have a huge impact on an ORM's query generation. Languages like C# and Java lack even the most basic Design by Contract (DbC) features, so it's no wonder that their handling of database constraints is also quite weak. For example, imagine a simple database constraint that limits an integer value to multiples of ten. Now imagine an ORM trying to convey and enforce that constraint in a language like C#. You get the picture. This is a very difficult, effectively impossible, problem to solve.
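For the multiples-of-ten example, the database side of that contract is a one-line CHECK constraint; the table below is illustrative only, not from the original article. An ORM in C# or Java has no natural construct to carry this rule back into the application layer.

-- Illustrative table; the constraint restricts Amount to multiples of ten.
CREATE TABLE dbo.ConstraintDemo
(
    ConstraintDemoID INT IDENTITY(1, 1) PRIMARY KEY,
    Amount INT NOT NULL
        CONSTRAINT CK_ConstraintDemo_Amount CHECK (Amount % 10 = 0)
);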

So what's the solution? Should ORMs be used at all? Should we abandon relational databases and move entirely over to document-oriented databases? Yes and no, not necessarily in that order. ORMs have taught us a lot over the past 20 years. The number one thing I've learned from them is to be pragmatic about my approach to data. I use stored procedures whenever there's a way to reduce the chattiness and excessive round trip calls that ORMs often induce. That keeps my DBAs happy. For simple table accesses that are indexed well for my common query predicates, I'll use an ORM just to save me some time.

Nowadays, I rely on so-called micro-ORMs and auto-mapper tools more than traditional ORMs. Micro-ORMs are an admission that full-service ORMs are often too complex or that they unnecessarily constrain my use of data. Using less ambitious micro-ORMs like Dapper gives me the benefits I desire without all the weight and ceremony of a full-featured ORM. Lastly, when document-oriented storage seems warranted, I evaluate the criteria objectively and openly admit when NoSQL is the correct choice. However, this is less common than avid NoSQL proponents would have you believe. Relational databases are still the best choices for storage and data handling in the vast majority of cases, mostly because of their maturity, rich feature sets, high reliability and great support.

Nine years later and my friend Ted is still mostly correct. ORMs are an unwinnable war. But they’ve taught us a lot and there are other options for solving what is perhaps the hardest problem out there today: undoing history and redoing it as we think it might have been done under different circumstances. No matter which way the market goes, the technical equivalent of sober diplomacy should triumph over conflict.

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)