SQL SERVER – Common Sense Data Security – Notes from the Field #055

[Note from Pinal]: This is the 55th episode of the Notes from the Field series. Common sense is not as common as we think. I am sure you agree with that from your real-world experience. However, when it comes to data and its security, there have to be rules and policies, but common sense remains extremely critical. When I read this article, I found it humorous at points, and some of the examples reminded me of my own past experiences. If you are in data security, you will have a great time reading these notes, but if you are not, you will still love them.

In this episode of the Notes from the Field series database expert Kevin Hazzard explains common sense data security and how we can apply it in the real world. Read the experience of Kevin in his own words.


There are many excellent books and articles that address the correct ways to store sensitive user information. Yet many in IT are still failing to protect customers from losses due to data breaches. Every day, it seems, another retailer or commercial web site is in the news for losing passwords or credit card numbers to hackers. As an industry, why are we struggling to secure this type of information when there is so much good intelligence and so many great tools for getting the job done? It’s a complicated subject, so perhaps it’s time to step back and apply some common sense to the problem.

No matter the industry, using the right tool for the job is rule number one. Line-of-business databases are all about organizing information and getting it into the hands of people who perform transactions and make decisions with it. As a result, these databases are naturally permissive, especially as they evolve to meet the demands of growing businesses. Modern databases have good access controls, but when it comes to managing ultra-secure bits of data, traditional relational databases may not be the best fit for the job.

Lightweight Directory Access Protocol (LDAP) servers like ApacheDS, OpenLDAP and Microsoft Active Directory do a much better job of handling sensitive data, with less trouble, than any custom coding we might do on our own. Moreover, the built-in authentication functions of LDAP are mature and standards-based, making them safe and reusable from many different applications without custom interface development. It’s our duty as technologists and as business people to highlight to our managers the high cost of custom security solutions and the huge potential risks. In particular, when it comes to storing passwords in our line-of-business databases, just say no.

If we must manage financial instruments or personally identifying information in a database like SQL Server, there are three classes of problems to solve:

  1. Keeping the hackers from stealing our stuff,
  2. Detecting when breach attempts occur, and
  3. If data is somehow lost anyway, making the information useless.

Let’s think about these efforts from a common sense perspective. Problem one is all about access control. The problem with permissions in any complex system is that they are difficult to maintain over time. Even if the initial configuration and policies safeguard the sensitive data, some future administrator may fail to understand or enforce the rules correctly. We could make those future administrators’ jobs much easier if we followed one simple rule: never mix highly-sensitive data in tables containing non-privileged data.

It’s deceptively simple-sounding but in practice, if sensitive data is always segregated into encrypted tables (http://blog.sqlauthority.com/2009/04/28/sql-server-introduction-to-sql-server-encryption-and-symmetric-key-encryption-tutorial-with-script/) and placed into a separate, secure schema requiring elevated access privileges, mistakes concerning permissions will become less likely over time. Moreover, by denying SELECT, INSERT, UPDATE and DELETE privileges on the secured tables, every query can be routed through stored procedures where problems two and three can be addressed with auditing and data obfuscation controls. Lastly, to ensure that lost data is useless, use the new Backup Encryption feature of SQL Server 2014 or invest in a third-party tool that does the same.
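To make this concrete, here is a minimal T-SQL sketch of the pattern described above: a separate schema, direct DML denied, and a stored procedure that adds the auditing and obfuscation. All object names here (the Secure schema, the AppUsers role, SalesDb, BackupCert) are hypothetical, and the DECRYPTBYKEY call assumes a symmetric key has already been created and opened as shown in the encryption tutorial linked above.

CREATE SCHEMA Secure AUTHORIZATION dbo;
GO
CREATE TABLE Secure.CustomerCard
(
    CustomerID    INT NOT NULL PRIMARY KEY,
    CardNumberEnc VARBINARY(256) NOT NULL  -- ciphertext only, never plaintext
);
CREATE TABLE Secure.AccessAudit
(
    CustomerID INT NOT NULL,
    AccessedBy SYSNAME NOT NULL,
    AccessedAt DATETIME2 NOT NULL
);
GO
-- Problem 1: the application role cannot touch the secured tables directly.
DENY SELECT, INSERT, UPDATE, DELETE ON SCHEMA::Secure TO AppUsers;
GO
-- Problems 2 and 3: every read goes through a procedure that audits the
-- access and returns an obfuscated value instead of the raw card number.
CREATE PROCEDURE Secure.GetMaskedCard @CustomerID INT
WITH EXECUTE AS OWNER
AS
BEGIN
    INSERT INTO Secure.AccessAudit (CustomerID, AccessedBy, AccessedAt)
    VALUES (@CustomerID, ORIGINAL_LOGIN(), SYSUTCDATETIME());

    SELECT CustomerID,
           'XXXX-XXXX-XXXX-'
               + RIGHT(CONVERT(VARCHAR(32), DECRYPTBYKEY(CardNumberEnc)), 4) AS MaskedCard
    FROM Secure.CustomerCard
    WHERE CustomerID = @CustomerID;
END
GO
GRANT EXECUTE ON Secure.GetMaskedCard TO AppUsers;
GO
-- And to make lost backups useless (SQL Server 2014 syntax; assumes a
-- server certificate named BackupCert already exists):
BACKUP DATABASE SalesDb
TO DISK = N'C:\Backups\SalesDb.bak'
WITH COMPRESSION,
     ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = BackupCert);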

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)


SQL SERVER – Validation Rules: Code or Database? – Notes from the Field #054

[Note from Pinal]: This is the 54th episode of the Notes from the Field series. Whom do we blame for a mistake or error in the system? Well, the developer blames the DBA, and the DBA blames the developer. Sometimes there is no way out of the catch-22 situation. I have been there, and I am very sure that you have been there too. Well, this is an amazing and beautiful article by G. Andrew Duthie. He has attempted to demystify a problem which we all face every day.

In this episode of the Notes from the Field series database expert Andrew Duthie explains validation rules and where they should be implemented. Read the experience of Andrew in his own words.



If you’re a DBA dealing with developers, you may run into the question of whether it’s better to allow the developers to write validation rules in their .NET app layer, or whether you should put your foot down and insist that the validation be implemented in stored procedures. The answer, as you might expect from a consultant, is “it depends.”

Advantages of Code-based Validation

One of the things that might inform your decision on what to use for validation is the skills possessed by the development team. If you’ve got a developer or developers who are well-versed in C# and LINQ, but don’t have a lot of experience writing stored procedures, you may want to cut them a break and let them use the tools they’re more familiar with.

Writing validation rules in code at the application layer allows developers to stay within the realm of .NET objects, which can result in faster development time.

Disadvantages of Code-based Validation

While there are probably more that could be discussed, I’ll mention just two of the significant disadvantages to writing validation rules in code.

First, if the code for the validation rules is using LINQ, particularly if the rules are complex, there’s the possibility of queries that generate sub-optimal SQL under the covers. This can be mitigated by profiling the queries to make sure that any performance hogs are caught as early as possible, but it’s certainly a valid concern.
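On the database side, one way to catch those performance hogs early is to pull the most expensive statements from the plan cache; LINQ-generated SQL tends to stand out because of its machine-generated aliases. Here is a quick sketch using the standard dynamic management views:

-- Top 10 cached statements by average logical reads.
SELECT TOP (10)
       qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
       qs.execution_count,
       SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
           ((CASE qs.statement_end_offset
                 WHEN -1 THEN DATALENGTH(st.text)
                 ELSE qs.statement_end_offset
             END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY avg_logical_reads DESC;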

Second, from a maintainability standpoint, having rules in the app means that adding rules requires the app to be recompiled and redeployed. For some apps and environments, this may not be a big deal, but in others, it could definitely be a deal-breaker.

Advantages of Stored Procedure-based Validation

Using stored procedures for validation provides some key advantages. One is proximity to the data. Unlike code-based validation, which may require pumping significant amounts of data over the wire from the database to the app tier, stored procedure-based validation keeps the logic on the DB tier, so performance may be significantly better.

Another advantage is that with a good execution design (for example, a master stored procedure that executes a list of validation rules in a specified order based on a configuration table), it can be relatively easy to introduce new rules with less disruption than having to recompile and redeploy an entire application.
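As a minimal sketch of that execution design, the master procedure below reads the enabled rules from a configuration table and runs them in order. All of the object names here (dbo.ValidationRule, dbo.ValidateOrder) are hypothetical, as is the convention that each rule procedure raises an error when its check fails:

CREATE TABLE dbo.ValidationRule
(
    RuleID         INT IDENTITY PRIMARY KEY,
    RuleProcName   SYSNAME NOT NULL,  -- each rule proc raises an error on failure
    ExecutionOrder INT NOT NULL,
    IsEnabled      BIT NOT NULL DEFAULT 1
);
GO
CREATE PROCEDURE dbo.ValidateOrder @OrderID INT
AS
BEGIN
    DECLARE @proc SYSNAME, @sql NVARCHAR(400);

    DECLARE rule_cursor CURSOR LOCAL FAST_FORWARD FOR
        SELECT RuleProcName
        FROM dbo.ValidationRule
        WHERE IsEnabled = 1
        ORDER BY ExecutionOrder;

    OPEN rule_cursor;
    FETCH NEXT FROM rule_cursor INTO @proc;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Run each rule in the configured order; a failing rule raises an
        -- error that bubbles up to the caller and stops the validation run.
        SET @sql = N'EXEC ' + QUOTENAME(@proc) + N' @OrderID = @OrderID;';
        EXEC sp_executesql @sql, N'@OrderID INT', @OrderID = @OrderID;
        FETCH NEXT FROM rule_cursor INTO @proc;
    END
    CLOSE rule_cursor;
    DEALLOCATE rule_cursor;
END
GO

With this shape, adding a rule is just an INSERT into dbo.ValidationRule plus a new rule procedure; nothing has to be recompiled or redeployed.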

Disadvantages of Stored Procedure-based Validation

The major disadvantage of using stored procedures for validation, speaking as an app developer, is the basic impedance mismatch between .NET code (C# or Visual Basic) and T-SQL. While it’s certainly possible for developers to master both, there’s a mental cost in switching between these environments, and a potential for mistakes when transitioning from one to the other.

The other downside of stored procedures is the mixing of application logic between the app tier and the database tier. While validation close to the data can, as noted, improve performance, if some parts of the application logic live in both the app and database tiers, this could make for more costly maintenance down the road.

Consistency is Key

One additional point I’d like to make is that it’s probably wise to choose one option or the other, not both. If you have multiple applications in development (or even in maintenance mode), having a mix of app-based and sproc-based validation will likely give you headaches at some point. So get your team together and have a discussion about how you’re currently handling things, and whether there might be a better way.

Summary

The short answer to “which is better” is really “either.” It all depends on the skills of your developers, the performance you need from your app, and the other factors I’ve discussed. Although I’m coming at this from the perspective of an app developer, I’ve recently become more comfortable with the idea of stored procedure-based validation, particularly in instances where more than one app may be targeting the same database, since this can help reduce redundancy, and centralize management of rules.

I’d love to get your feedback on how you’ve handled validation rules in your environment, so feel free to share a comment below.

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Using Bitwise And (&) Instead of a Junction Table – Notes from the Field #053

[Note from Pinal]: This is the 53rd episode of the Notes from the Field series. Every day I get hundreds of emails, and most of them make a similar request. Everyone wants to get maximum performance, but they want to make the least amount of changes to their code. Though the two requests are contradictory, it is possible in most cases if you know the technology inside out, as Linchpin People do. Here in this blog post, my close friend Stuart Ainsworth explains a cool trick which I just learned today after so many years of experience. Wow, Stuart – thanks for this amazing note from the field – I learned something new, and there will be so many who will enjoy this post.

In this episode of the Notes from the Field series database expert Stuart Ainsworth explains Using Bitwise And (&) Instead of a Junction Table.



Bitwise operations in SQL Server are not often used, but like many of the tools available to SQL Server developers, bitwise operations can provide some interesting alternatives when you’re faced with specific challenges. One of my clients has a large database that relies heavily on many-to-many relationships to identify matching interests on multiple characteristics. As a simple example, let’s assume that I’m talking about fruit baskets.

In the simplest version of a fruit basket, you’d have two database objects: the basket, and the assortment of fruit.  Baskets can use different combinations of fruit, and samples of fruit may appear in more than one basket, like so:

Basket 1: Apples
Basket 2: Apples, Bananas
Basket 3: Grapes, Apples
Basket 4: Strawberries, Bananas

The traditional method of modeling this relationship would be to use a junction table, as illustrated below.


However, my client  had 500,000 baskets, and roughly 50 different fruits to choose from. Assuming that every basket had at least 10 different fruits, the junction table would have at least 5,000,000 rows of data. Even though the junction table was well indexed and strongly typed, my client’s design was suffering from slow read times.  The client needed an alternative. Enter the bitwise AND (&).

Setting Up a Demo

Let’s set up a demo that illustrates both the junction table method and the bitwise AND alternative.  First, you’ll create the following three tables and populate them (using table valued constructors):

  1. Baskets, which includes a column for use with the Bitwise AND
  2. Fruit, whose FruitID values are set up for use with the Bitwise AND
  3. FruitBaskets, which is a junction table

Note that primary and foreign key references are not included, to keep the demo simple. You’ll also be adding an extra column to the Baskets table to use for the Bitwise join. Finally, note that the FruitID column of the Fruit table mirrors the decimal values of the binary bit positions (e.g., 1, 2, 4, 8, 16, 32, 64, 128).

CREATE TABLE Baskets
(
    BasketID INT,
    BasketName VARCHAR(100),
    FruitBitHash BIGINT
);
CREATE TABLE Fruit
(
    FruitID BIGINT,
    FruitName VARCHAR(20)
);
CREATE TABLE FruitBaskets
(
    BasketID INT,
    FruitID BIGINT
);
GO
INSERT INTO Fruit (FruitID, FruitName)
VALUES (1, 'Apples'),
       (2, 'Bananas'),
       (4, 'Grapes'),
       (8, 'Strawberries');
GO
INSERT INTO dbo.Baskets (BasketID, BasketName, FruitBitHash)
VALUES (1, 'Apples', 1),
       (2, 'Apples, Bananas', 1 + 2),
       (3, 'Grapes, Apples', 1 + 4),
       (4, 'Strawberries, Bananas', 8 + 2);
GO
INSERT INTO dbo.FruitBaskets (BasketID, FruitID)
VALUES (1, 1),
       (2, 1),
       (2, 2),
       (3, 1),
       (3, 4),
       (4, 8),
       (4, 2);
GO

Now that you’ve got your tables set up, let’s run a couple of queries. First, you’ll use the junction table (the traditional, normalized model), and then you’ll use the Bitwise AND (&). In both cases, you’re looking for baskets that contain apples:

/* Select the fruit baskets containing Apples using the junction table */
SELECT BasketID, BasketName
FROM dbo.Baskets b
WHERE EXISTS (SELECT *
              FROM dbo.FruitBaskets fb
              JOIN dbo.Fruit f ON fb.FruitID = f.FruitID
              WHERE b.BasketID = fb.BasketID
                AND f.FruitName = 'Apples');
GO
/* Select the fruit baskets containing Apples using the bithash */
SELECT BasketID, BasketName
FROM dbo.Baskets b
WHERE EXISTS (SELECT *
              FROM dbo.Fruit f
              WHERE b.FruitBitHash & f.FruitID <> 0
                AND f.FruitName = 'Apples');
GO

If you run this demo, you’ll see that you get the exact same results from the two queries. However, the first query would need to read data from 3 tables, and the second query only needs 2. If the junction table is very large, the traditional method can be significantly slower than the second method.
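If you want to verify that difference yourself, SET STATISTICS IO is a quick way to compare the logical reads of the two approaches:

SET STATISTICS IO ON;
-- Re-run the two SELECT statements from the demo here, then compare the
-- 'logical reads' reported for each table in the Messages tab.
SET STATISTICS IO OFF;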

But how does it work? An excellent explanation can be found here, but the short answer is that when you use the Bitwise AND (&) to compare two integers, any nonzero result means that the two integers have at least one bit in common. The magic happens with this line of code:

WHERE b.FruitBitHash & f.FruitID <> 0
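To see it in binary, take basket 3 (‘Grapes, Apples’) from the demo, whose bithash is 1 + 4 = 5:

SELECT 5 & 1 AS ContainsApples,   -- 0101 AND 0001 = 0001 (1): nonzero, so yes
       5 & 2 AS ContainsBananas,  -- 0101 AND 0010 = 0000 (0): zero, so no
       5 & 4 AS ContainsGrapes;   -- 0101 AND 0100 = 0100 (4): nonzero, so yes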

So, why don’t we do this all the time?

There’s an old expression, “If all you have is a hammer, then everything looks like a nail.” Different tools are best suited for different problems. The limitations of using the Bitwise method to remove a junction table include:

  1. Violation of relational integrity: The surrogate IDs in the lookup table (e.g., the Fruit table) have to have a specific order and meaning. If you make a mistake when setting up the key values, you can get wrong answers.
  2. A limited number of bitwise values can be stored in a bigint: In SQL Server, a bigint is 8 bytes, which means that there are 64 bits. When using a single bithash column, you can only have one value per bit. (Note that you can work around this by using multiple columns, but that gets complicated.) A quick illustration of the flag values follows this list.
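Here is one way to generate those flag values, mirroring the FruitID scheme used in the demo:

-- 2^(n-1) is the flag value for the nth item. A signed BIGINT holds 63 such
-- flags safely; POWER(CAST(2 AS BIGINT), 63) would overflow into the sign bit.
SELECT n AS FruitNumber,
       POWER(CAST(2 AS BIGINT), n - 1) AS FruitID
FROM (VALUES (1), (2), (3), (4), (50), (63)) AS t(n);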

The benefit of the Bitwise AND method is reduced disk I/O, because it eliminates a large junction table. In this client’s case, we did notice increased CPU usage with the Bitwise method, but the overall performance gain was significant. However, on faster hardware, a junction table would probably have worked as well and still maintained relational integrity. For now, Bitwise AND is a useful tool for a very specific type of problem.

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Beginning Table Valued Constructors – Notes from the Field #052

[Note from Pinal]: This is the 52nd episode of the Notes from the Field series. I am very happy that the journey we started one year ago has reached an amazing milestone. In these 52 episodes over the past year we have learned a lot of new things from the industry experts of Linchpin People. They are an amazing set of people who know what they are doing in the field and in the real world. I have received so many notes from blog readers saying that they have benefited from the experience shared by the Linchpin team.

In this episode of the Notes from the Field series database expert Kathi Kellenberger explains Table Valued Constructors. Kathi is an amazing instructor; she was one of the SQL authors I read early in my career. One of the reasons I love SQL Server is that her writing instigated a love for this technology in me. Today she brings up a relatively unknown topic for database experts. Read the experience of Kathi in her own words.


Table Valued Constructors, also called Row Constructors, were introduced with SQL Server 2008, but many SQL Server professionals haven’t heard about them. They make it easy to insert multiple rows of hard-coded values into a table with just one insert statement.  NOTE: In this post, I’ll abbreviate Table Valued Constructors as TVCs.

I’ll often see people using older techniques to populate sample tables when asking for T-SQL help on the forums. The following example demonstrates two techniques.

CREATE TABLE #test(Col1 INT, Col2 VARCHAR(10));
--Method 1, using UNION
INSERT INTO #test(Col1, Col2)
SELECT 1,'a'
UNION ALL
SELECT 2,'b'
UNION ALL
SELECT 3,'c';

--Method 2, multiple select statements
INSERT INTO #test(Col1, Col2)
SELECT 4, 'd';
INSERT INTO #test(Col1, Col2)
VALUES( 5, 'e');

The first method takes advantage of the UNION ALL operator to combine three SELECT queries into one set of results.  The second example uses a separate INSERT statement for each row to be inserted. One statement uses a SELECT with hard-coded values, while the second uses the VALUES keyword.

Beginning with the 2008 version, you can specify multiple rows with the VALUES syntax, which is now called a TVC. Each set of values must be within parentheses, and each set must be separated by a comma. Example 2 demonstrates this technique.

CREATE TABLE #testTVC(Col1 INT, Col2 VARCHAR(10));
INSERT INTO #testTVC(Col1, Col2)
VALUES(1,'A'),(2,'B'),(3,'C'),(4,'D'),(5,'E');

The big advantage of TVCs is that you save typing. Of course, looking cool on the forums is a bonus.

While using the TVC to insert multiple rows with one insert statement is pretty fantastic, there are some other interesting uses. You can create a set of results using a TVC within a derived table. Take a look at this example:

SELECT *
FROM (VALUES ('January'), ('February'), ('March'), ('April'),
             ('May'), ('June'), ('July'), ('August'),
             ('September'), ('October'), ('November'), ('December')
     ) AS Months ([Month]);

In the months example above, my TVC is part of the FROM clause. Notice that I must alias the TVC just like a derived table. I also have to provide column names right after the alias.

Another interesting use of TVCs involves the APPLY operator. A feature of APPLY is that columns from the outer query are visible inside the APPLY. Not only can you have hard-coded values as in the previous example, you can use columns from the outer query as well. The following example demonstrates this technique.

SELECT SOD.SalesOrderID, Dates.*
FROM Sales.SalesOrderHeader AS SOD
CROSS APPLY (VALUES ('Order Date', OrderDate),
                    ('Due Date', DueDate),
                    ('Ship Date', ShipDate)) AS Dates(TypeOfDate, TheDate);

In this example, using the AdventureWorks database, each row from the SalesOrderHeader table is returned three times, once for each kind of date. Instead of displaying each date in its own column, this example “unpivots” the dates into rows.
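For a single order, the results would look something like this (values illustrative):

SalesOrderID  TypeOfDate  TheDate
------------  ----------  ----------
43659         Order Date  2011-05-31
43659         Due Date    2011-06-12
43659         Ship Date   2011-06-07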

Table Valued Constructors can save you some typing and make your code look more organized. And, as you have seen here, they are also capable of some neat little tricks.

If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – The Why of BIML – Notes from the Field #051

[Note from Pinal]: This is the 51st episode of the Notes from the Field series. In one of the earlier blog posts we discussed BIML, and lots of people asked what BIML actually is and why one should use it. If you have been reading this blog, you know that when it is about BI, I always take help from Linchpin People, who are BI experts. I requested Reeves from Linchpin People to help me answer this unique question.

In this episode of the Notes from the Field series database expert Reeves Smith explains The WHY of BIML. Read the experience of Reeves in his own words.


The Why of Biml

If you question whether Biml can be useful, you are not alone. After I introduce people to Biml, someone always asks, “How can writing something in an XML-based language actually help me out?” I asked this same question when I first started working with Biml. If I’m going to spend the time learning something it had better improve my development experience or bring something else of value to the tool belt. This article will show you how Biml can save you time creating packages.

Overview

As a quick refresher, Biml is an XML language used to describe Business Intelligence (BI) projects. This discussion on Biml will work within the Visual Studio (BIDS/SSDT) environment with BIDS Helper. Using Biml with BIDS and the BIDS Helper add-in allows you to create Integration Services (SSIS) projects. (Note that Biml is compatible with SQL Server Analysis Services projects but not within the BIDS Helper add-in.)

Be aware that the Biml language does not support formatting and layout. The language is built to be simple and effective with an intended design to script and automate.

Example – Source Packages

Why would you need to automate SSIS if all of your packages are different? If you look at most of the packages within your enterprise, some sections of code or complete packages are very similar in design, with the only changes being metadata.

Source packages are a typical example of packages that are similar. With most source packages, you pull all of the data needed from the tables in the source systems and load it directly into tables of a similar shape in a staging environment. Most of the time, these packages are very basic and perform a limited number of transformations. Figure 1 illustrates a common source package.

Figure 1 – Common Source Package

If you look into the details of the above package, you will notice that the metadata from package to package is not very different. The differences within the metadata of the above example would include:

SQLT – Truncate Destination Table (Execute SQL Task)

  • SQLStatement: TRUNCATE TABLE TableName

DAFL – Load Destination Table (Data Flow Task)

  • OLE DB Source: Name of the table or view: TableName
  • OLE DB Destination: Name of the table or view: TableName

(Note that these packages are not following the recommended best practices.)

Listing 1 shows what the above example would look like in Biml.

Listing 1 – Common Source Package Biml

Note that the connection strings were truncated to enable the best display for the graphic. A valid connection string with my server settings would be:

provider=SQLOLEDB.1;data source=localhost;database=AdventureWorks2012;integrated security=SSPI

So now that you’ve seen the script needed for a common source package, you may still be asking the original question: Why would you write the above Biml to create five packages?

The best answer might be a chart showing the difference between creating packages manually and creating packages with Biml and then automating them with BimlScript. The chart in Figure 2 assumes it takes about one business day to create one package. Thus one package takes one day, two packages take two days, and so on. With a Biml and BimlScript solution, you spend extra development time upfront creating the Biml code and then adding the BimlScript needed for automation. That upfront investment makes Biml less attractive for projects with only a small number of packages.

Figure 2 – Package Development Methods

After you create a Biml file, the next step would be to add BimlScript needed for automation. BimlScript is added directly to the Biml file and is able to replace items within the XML.

Practical Use

BimlScript uses C# or Visual Basic as the scripting language. If automation with C# and Visual Basic sounds like another barrier to starting a Biml project, I have a suggestion that will let you use Biml on your next project anyway. Having to learn both Biml and BimlScript can be an overwhelming requirement to start a Biml project. What if you could defer learning C# almost entirely and just focus on Biml? The following example demonstrates a method to do that.

If you take the Biml code shown in Listing 1 and add the line of code in Listing 2 to the <Biml> declaration, you can create a variable to use later in your Biml file. Let’s not worry about anything except that this code creates a variable named tableName, which you’ll use later in the file.

Listing 2 – BimlScript used to create a Variable

This line alone will not accomplish anything if you do not use it within the Biml file. A good test to make sure you declared the BimlScript correctly is to add the code and run Check Biml for Errors. Regenerating a package should still work, and nothing will change.

However, if you add one more line of code to four places within the Biml file, you can use and replace that value with the variable declared in Listing 2. Three of those places within the code are marked by the three red outlines above, and one is within the package name, to create uniquely named SSIS packages.


Listing 3 – BimlScript that will use the Variable

The final code is shown below in Listing 4.

Listing 4 – Semi-Automated Common Source Package Biml

If you need to create another package, just update the variable and regenerate the package. This will enable you to create multiple packages within minutes. It is not a perfect solution but adds a great technique on the way to learning Biml.

Note that there are cut-and-paste issues within Visual Studio that will result in extra quotation marks. These are outlined in Listing 5 below. Removing the extra quotes is all you need to do, and you are able to modify the Visual Studio development environment to alleviate this issue. That is discussed here.

Listing 5 – BimlScript Cut-and-Paste Issue

Looking Ahead

This article has touched on a lot, and I hope I’ve given you a reason to consider Biml on your next project. The article used BimlScript, and scripting is the place to focus after you learn Biml. Biml with BimlScript can provide huge savings in development effort once you’ve learned them.

In a future article I will demonstrate how to add more BimlScript to the existing code to increase package automation even further.

Stay Tuned.

Reeves Smith

If you want to get started with BIML with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Do You Know Your Data’s Classification? – Notes from the Field #050

[Notes from Pinal]: Data is a very simple word, yet it is very powerful. There is a famous saying – Know Your Data. I have quite often found that developers do not know their data; they are often confused by it and not sure how to answer questions about it. My friend Tim Radney is an amazing person who usually has an answer to each of those questions that look simple but are complicated in reality.

Linchpin People are database coaches and wellness experts for a data driven world. In this 50th episode of the Notes from the Field series, database expert Tim Radney (partner at Linchpin People) explains data classification in very simple words. Please follow Tim on his twitter handle at @tradney.


Do you know your data’s classification?

As data professionals, we have many responsibilities. We could be responsible for backing up and restoring data, writing reports, building queries, writing stored procedures, tuning workloads, or any vast number of responsibilities.

If you are in the business of granting access to the data either through reports/queries or provisioning login access, you should be aware of the type of data you are granting access to. Your company likely has policies in place that should guide how access to certain classifications of data should be handled.

Some of the more common types of data that your company would have stricter controls over would be related to PII, PCI, SOX, GLBA, or HIPAA. You should know what the guidelines are within your company for access to this data and help make sure that those standards are being upheld.

These data assets may require additional auditing on who has access to view, print, or export them. When the data is viewed, it may require watermarks or headers/footers on any reports.

Your organization may require certain types of data, such as PCI data, to be audited to track who is viewing specific elements of that data.

Worse, your organization may need to be doing these things but is not. If that is the case, you should be asking why and helping to implement a data governance plan.

If you want me to take a look at your server and its settings, or if your server is facing any issues, we can Fix Your SQL Server.

Note: Tim has also written an excellent book on SQL Backup and Recovery, a must have for everyone.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – Tuning Queries is Sometimes a Waste of Time – Notes from the Field #049

[Note from Pinal]: This is the 49th episode of the Notes from the Field series. Every day I get a few emails asking how to tune queries so the entire server runs faster. I always end up writing a long answer to this question. The reason is simple – query tuning is indeed the need of the hour, but it is not everything. There are many more things one should do along with tuning queries. I asked the same question of Brian Moran, who deals every day with consultants and various organizations facing issues with SQL Server.

In this episode of the Notes from the Field series database expert Brian Moran explains how query tuning is sometimes a waste of time when we are facing issues with performance and scalability. Read the experience of Brian in his own words.


Tuning queries is sometimes a waste of time when you’re trying to fix performance and scalability problems. That sounds crazy but it’s true. Designing an efficient workload is almost always more important than maintaining a laser focus on query tuning and other technical tricks of the DBA trade.

Here’s an example that helps to explain what I mean.

Simple Example

What’s the shortest distance between two points? A straight line, of course. Everyone knows that. Imagine that you need to calculate the most efficient path from Washington DC to New York. Imagine those cities on a traditional 3D globe. The straight line is simple to figure out, right? Just head north as the crow flies.

But what if you wander about and take a curvy path as you head north? DC to New York is about 204 miles as the crow flies, but Google says the distance is 226 miles if you take I-95 and the New Jersey Turnpike. Google also presents some reasonable travel options that are as long as 264 miles. It’s easy to imagine you could increase the distance even more by taking scenic roads and side trips along the way. Of course, you can also draw a straight line on a globe between DC and New York that heads in a southerly direction, all the way around the world, which is over 24,000 miles. That’s over 100 times more expensive than the straight route northwards.

Now, let me map that mapping example back to database performance tuning. Sometimes database performance tuning is an exercise in making the workload more efficient, just as you might want to make your route to New York more efficient. To make the database workload more efficient, you can tune queries, add indexes, and do all sorts of other neat tricks. Query tuning is like an exercise in flattening the northerly path from DC to New York to make as straight a line as possible. You want to tune your queries so that they’re as direct as possible and don’t veer off into unnecessary detours.

But what if you’re talking about trying to flatten a squiggly line from DC to New York that starts out by heading south, 180 degrees the wrong way? You can make that line as straight and as efficient as you want, but heading south from DC to get to New York is never going to be efficient no matter how straight a line you draw. A route that goes the wrong way can be optimized, but it’s still going the wrong way.

Inefficient Workload

The same idea applies to queries. If you start out tuning an inefficient workload, you’ll end up with an inefficient workload that is tuned, but it is still inefficient.

This seems like a silly example. Everyone intuitively knows that the southerly route is inherently a poor choice. You can tune the route as much as you want, but in best case scenarios it’s still going to be over 100X worse than going north.

Oddly enough, 25 years of database consulting tells me many DBAs will spend most of their time tuning queries that are the database equivalent of trying to make that southerly line from DC to New York as straight as possible. They’ll spend days, weeks, or months, trying to shave that last 10%.

All too often, DBAs fail to take a moment to pause, understand the nature of the problem they are trying to solve, and try to envision a more efficient way to solve the problem.  These DBAs don’t recognize that changing the workload can be vastly better than tuning the workload.

Two Classic Examples

Here are two classic examples that most DBAs intuitively understand. Say you have a business need defined in a way that requires a drop-down list box populated with 100,000 rows that users have to scroll through. You can try to tune the process to make it more efficient at loading 100,000 rows every time the screen is touched. But that tuning does not change the workload.

Rather than simply tuning the existing workload, a better investment of time would be to help the designers of that screen understand and accept a solution that doesn’t require loading 100,000 rows. Or perhaps you have a system that makes heavy use of database cursors in a procedural manner of some kind. Most DBAs know that architecting a set-based solution will almost always be better than trying to make the cursor-based approach as fast as possible.
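To illustrate that last point, here is a contrived sketch of the same work done both ways; the dbo.Products table and its columns are hypothetical:

-- Row-by-row, via a cursor:
DECLARE @id INT;
DECLARE price_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT ProductID FROM dbo.Products WHERE CategoryID = 7;
OPEN price_cursor;
FETCH NEXT FROM price_cursor INTO @id;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.Products
    SET ListPrice = ListPrice * 1.10
    WHERE ProductID = @id;
    FETCH NEXT FROM price_cursor INTO @id;
END
CLOSE price_cursor;
DEALLOCATE price_cursor;

-- The same change as a single set-based statement:
UPDATE dbo.Products
SET ListPrice = ListPrice * 1.10
WHERE CategoryID = 7;

The set-based version gives the optimizer one statement to plan instead of thousands of tiny ones; that is a change to the workload, not just a tuning of it.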

Here’s a 15-year-old example of a design-pattern mistake that is still common today. I was working on an office supply site when e-commerce was first becoming big business. The application was built in a very object-oriented manner. Developers love object-oriented approaches, but databases aren’t always as fond of them. In this case, the object-oriented design pattern led to 1,000 and sometimes 2,000 or more round trips to SQL Server when a single user searched for office chairs, even though the search brought back just one screen of results.

This was a problem because the system needed to support hundreds or thousands of concurrent users. The existing workload might therefore need to handle hundreds of thousands or millions of batch requests per second. The client wanted to do this on a single SQL Server instance since that’s all their budget could afford.

Summary

I could have spent an infinite amount of time tuning the queries used in this approach, but the workload was never going to be efficient. I was never going to be able to handle hundreds of thousands of batch requests per second on a single instance of SQL Server 2000. Instead, I made some changes to the architecture. In other words, I changed the workload, and I was able to achieve the performance goals I had.

The examples I used here are pretty basic, and most DBAs today are aware of how to deal with such situations. But I’m willing to bet that many people reading this post are struggling with a performance problem where tuning the queries is a poor use of time compared to completely changing the essence of the workload. It’s too easy to lose sight of the forest for the trees.

Having a hard time figuring out how to tune your queries to achieve performance goals? Maybe you should take a step back and focus on changes you can make to the workload instead?

If you want to get started with performance tuning and database security with the help of experts, read more over at Fix Your SQL Server.

Reference: Pinal Dave (http://blog.sqlauthority.com)