SQL SERVER – Tips for SQL Query Optimization by Analyzing Query Plan

Update: You can download dbForge Studio for SQL Server for free trial.

One of the most exciting periods of my life relates to the maintenance and optimization of ERP system for one large manufacturing company. The problem was that the system was initially created for a limited range of tasks, which over time grew much bigger than expected.

When multiple users simultaneously used the system, working in it was close to impossible. Increasing the operating capacity of a server could not completely solve the problem. So it was settled on revising current business-functionality and optimizing the most resource-consuming SQL queries.

Before I go on with the article, it’s worth mentioning that SQL Server generates an execution plan for each query. The plan describes an algorithm that will lead to a desired result.

We should also keep in mind how execution plans are generated. To reach maximal query execution speed, query optimizer (separate component of the SQL Server core) always attempts to generate a plan with the sequence of actions that consumes minimum server resources.

When evaluating any execution plan, the query optimizer takes into account multiple factors: involved database objects, conditions of joining them, returned columns list, indexes presence, availability of indexes and actual statistics, etc.

However, sometimes the optimizer cannot operate with the actual data and that is why it can over- or underestimate the SQL query cost when executing different plans. That’s why there is always a chance that the non – optimal plan will be chosen.

SSMS environment allows viewing the execution plan with the help of Show Execution Plan. However, analyzing complicated queries in it is far from easy. That is why I use a more functional query profile available in dbForge Studio for SQL Server.

We can get an execution plan each time SQL query execution is started. To do so, we need to switch to the profiling mode.

Or we can use the Generate Execution Plan command without running the query.

Several examples will let me demonstrate some tips for optimizing SQL queries using the profiler:

1. Minimizing the number of implicit connections

For the most complicated SQL queries I seek to use views depending on the situation, since their declaration might be abundant, and the server will be consuming more time for reading and analyzing this data. I do not intend to say that they should not be used at all, but they should be used wisely.

For example, we should get a list of customer’s tables. All the necessary object data can be obtained from the system views.

SELECT *
FROM sys.tables t

We will get the following plan for this query:

In the profiles the most resource-consuming operation are highlighted with red and yellow accordingly, representing the percentage of resources spent on each operation.

Let’s modify our SQL query – now we shall select only table names:

SELECT t.name
FROM sys.tables t

Now our execution plan got simplified and it is obvious that the query execution will require less resources:

How did we achieve that? Let’s take a look at the content of sys.tables. Unfortunately, SSMS does not allow viewing system views DLL, so we will use a small trick:

DECLARE @SQL VARCHAR(MAX)
SELECT @SQL = OBJECT_DEFINITION(OBJECT_ID('sys.tables'))
PRINT @SQL

We receive the following results:

CREATE VIEW sys.tables
AS
SELECT
o.name, o.OBJECT_ID, o.principal_id, o.schema_id, o.parent_object_id,
o.TYPE, o.type_desc, o.create_date, o.modify_date,
o.is_ms_shipped, o.is_published, o.is_schema_published,
ISNULL(ds.indepid, 0) AS lob_data_space_id,
rfs.indepid AS filestream_data_space_id,
...
ts.name
AS lock_escalation_desc,
o.is_filetable
FROM sys.objects$ o
LEFT JOIN sys.sysidxstats lob ON lob.id = o.OBJECT_ID AND lob.indid <= 1
LEFT JOIN sys.syssingleobjrefs ds ON ds.depid = o.OBJECT_ID AND ...
LEFT JOIN sys.syssingleobjrefs rfs ON rfs.depid = o.OBJECT_ID AND ...
LEFT JOIN sys.syspalvalues ts ON ts.class = 'LEOP' AND ...
WHERE o.TYPE = 'U'

Query optimizer looks at the columns which are actually used in our SQL query and removes excess joins. Now you can see why the application of an asterisk in query can result in reducing query efficiency.

However the same data can be received in a more efficient way. All tables are objects and are contained in sys.objects:

DECLARE @SQL VARCHAR(MAX)
SELECT @SQL = OBJECT_DEFINITION(OBJECT_ID('sys.objects))
PRINT @SQL
'

We receive the following in Output:

CREATE VIEW sys.objects AS
SELECT
name,
OBJECT_ID,
principal_id,
schema_id,
...
FROM sys.objects$

It’s obvious that this view is very simple and thus faster than sys.tables. All we have to do is select the tables among all objects by adding filtering condition (U – USER_TABLE, V – VIEW):

SELECT t.name
FROM sys.objects t
WHERE t.[type] = 'U'

To compare the queries we need to select execution plans and click Compare Selected Results on the context menu.

2. Elimination of data re-reading

Usually data re-reading is one of the most resource-demanding operations. That is why queries should be written in a way that would allow reading data in a query only once.
For example:

SELECT
(
SELECT COUNT(*)
FROM sys.objects o
WHERE o.[type] = 'U')
,(
SELECT COUNT(*)
FROM sys.objects o
WHERE o.[type] = 'V')

When you look through the plan, you can notice that the data from the table is being read twice:

The example below shows how this problem can be resolved:

;WITH cte AS
(
SELECT o.[type], [count] = COUNT(*)
FROM sys.objects o
WHERE o.[type] IN ('U', 'V')
GROUP BY o.[type]
)
SELECT (
SELECT [count]
FROM cte
WHERE [type] = 'U'
)
, (
SELECT [count]
FROM cte
WHERE [type] = 'V'
)

Keep in mind that CTE is just a code generalization – it will not allow you to eliminate the data re-read:

Let’s try to rewrite the SQL query using aggregation:

SELECT
COUNT(CASE WHEN o.[type] = 'U' THEN 1 END)
,
COUNT(CASE WHEN o.[type] = 'V' THEN 1 END)
FROM sys.objects o
WHERE o.[type] IN ('U', 'V')

The progress is evident:

But let’s imagine that we need to make more than two aggregations. In this case query execution will consume more time. But in fact, we can simplify it even more:

SELECT *
FROM (
SELECT o.OBJECT_ID, o.[type]
FROM sys.objects o
WHERE o.[type] IN ('U', 'V')
)
t
PIVOT
(
COUNT(t.OBJECT_ID) FOR [type] IN (U, V)
)
p

Let’s compare the queries:

I suppose, no comments are needed.

To sum it up I would like to point out that simplifying SQL query logic is the key to query optimization.
When analyzing execution plans with the help of the query profiler, you will be able to detect bottlenecks in the performance of databases and improve the efficiency of your SQL queries.

I suggest you download dbForge Studio for SQL Server and try this out. Let me know what you think about the same.

Reference: Pinal Dave (http://blog.sqlauthority.com)

About these ads

3 thoughts on “SQL SERVER – Tips for SQL Query Optimization by Analyzing Query Plan

  1. I have table with columns like AutoID, Number, Name, City, State, Country.

    I wanted is the maximum number entered in the “Number” column with the combination of Name, City, State and Country.

    Name City State Country
    Smith Chicago Chicago USA
    John Chicago Chicago USA
    John Los Angelos Chicago USA

    Now smith should get “Number” 1, John 2, and again John(in Los Angelos) 1 as he is the first from Los Angelos.

    I can simply put a where clause in query and get the max number + 1. But the problem is that when I have huge amount of data and the number of users increases, my query will be really slow. I am also inserting data in the same table so it will keep on piling.

    I hope I have made my self clear.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s