SQL SERVER – Effect of Normalization on Index and Performance

June 13, 2009

Of late, I have been using Twitter quite frequently, and I am gradually discovering its usefulness. I received a Direct Message (or DM in terms of twitter) asking if I can comment on the effect of normalization on Index and its performance in one twit! Now honestly speaking, this was new for me. I never expected to be quizzed like this. If you are using Twitter, then you must be aware that one twit contains only 140 characters. I was supposed to give an answer on such a big subject in just 140 letters. An interesting fact is that normalization and Index are not really closely related. The right question should have been – what is the effect of normalization on performance?

In any case, let us go over my twit answer.

“Normalization increases the count of the table along with clustered index which makes every single entity with its independent index improving perf.”

This whole scenario is very intriguing. I will try to give a comprehensive explanation. Prior to normalization, usually, data is in a large table. As it gets normalized, all the redundant information is removed and data is categorized and placed in their respective tables. It is a common practice to have PK in tables that lead to additional clustered indexes on the database. This means the normalized database has more clustered indexes. As there is no redundancy, each table contains a unique entry for data. Because of this, all the clustered indexes have fewer rows, which effectively increase the speed of database operations. Besides, as there is no redundancy, the normalized database requires less storage and maintenance. Now when the query is intricate and involves numerous tables, lots of joins are introduced. These join effectively diminish performance. However, this loss of performance is less compared to the gain which was a result of normalization.

In case you have skipped the above paragraph and straightaway reading this sentence, I urge you to go back and pay attention to the previous paragraph. This paragraph provides a good explanation to help you understand the logical backstage theory of normalization and performance relations.

“@pinaldave Normalization is a data modularization concept. Indexes are data storage and retrieval concept. ” .

Let me know what you think about this subject. If possible, participate here by giving your explanation within 140 characters. Let us see what you all can come up with. Follow me on Twitter.

Reference : Pinal Dave (https://blog.sqlauthority.com)

Normalization, SQL Index, SQL Server

SQL SERVER – 2008 – Customize Toolbar – Remove Debug Button from Toolbar

SQL SERVER – 2008 – SSMS Feature – Multi-server Queries

7 Comments. Leave new

Soumya
June 13, 2009 10:01 am
Can you enlighten me upon the performance degradation, if any, in case there is a table with about 10 million records, which has some 30 columns, suppose.
About 15 of the columns reference other tables in the same database. However, upon looking at the indexes, I found that there are about the same number of indexes as the columns.
How is this going to degrade the performance. We are experiencing a huge time of insertions into this table.
Follow me on twitter.
Reply
Jared
June 20, 2009 12:10 am
In regards to “However, this loss of performance is less compared to the gain which was a result of normalization.”
Is there ever a case where this is not true? Maybe some fringe case where the indexes are as large as the data itself?
Something like a table with Col1 int, Col2 int, Col3 int, Data nvarchar(16) where the 3 (could be more) integers form a primary key. And where the data is hierarchical, i.e. each Col1 has many Col2’s, and each Col2 has many Col3’s, but with each triplet being unique.
SO you could normalize it into 3 tables Col1, Col2, and Col3 with relationship tables Col1_Col2 and Col2_Col3, but would that be faster and use less disk space?
Reply
Paul
September 3, 2009 1:21 am
I’ve been working on a new normalized database and believe I’ve encountered the issue MLanden was noting and was hoping you could comment on it (trying to figure out the best method to handle this)
TABLE1:
RA_NUMBER_ID (FK, int)
ENTRY_DATE (datetime)
TEST_RESULTS (int (1/0 for pass/fail)
SERIAL_NUMBER_ID (FK, int)
TABLE2: (RA numbers can have multiple tests done to them
RA_NUMBER_ID (PK, int)
RA_NUMBER (varchar) (indexed)
TABLE3: (each RA has 1 set of serials (a 2-pair set), but same serial could exist on multiple RAs)
SERIAL_NUMBER_ID (PK, int)
SERIAL_1 (varchar) (indexed)
SERIAL_2 (varchar) (indexed)
Using the select:
SELECT T2.RA_NUMBER, T3.SERIAL_1, T3.SERIAL_2, T1.ENTRY_DATE, T1.TEST_RESULTS
FROM
TABLE1 T1 INNER JOIN
TABLE2 T2 ON T2.RA_NUMBER_ID=T1.RA_NUMBER_ID INNER JOIN
TABLE3 T3 ON T3.SERIAL_NUMBER_ID=T1.SERIAL_NUMBER_ID
WHERE
T3.SERIAL_1=’xxx’
OR T3.SERIAL_2=’xxx’
OR T2.RA_NUMBER=’xxx’
Takes a long time to load the data – it appears the indexes get lost once I join the multiple tables together and attempt to query on different indexed elements.
1.6 million records in “table 1” took almost 18 seconds to return 5 rows I was expecting.
Using this query using ONLy 1 table as the lookup condition is much faster:
SELECT T2.RA_NUMBER, T3.SERIAL_1, T3.SERIAL_2, T1.ENTRY_DATE, T1.TEST_RESULTS
FROM
TABLE1 T1 INNER JOIN
TABLE2 T2 ON T2.RA_NUMBER_ID=T1.RA_NUMBER_ID INNER JOIN
TABLE3 T3 ON T3.SERIAL_NUMBER_ID=T1.SERIAL_NUMBER_ID
WHERE
T3.SERIAL_1=’xxx’ OR T3.SERIAL_2=’xxx’
–(removed the “T2.RA_NUMBER” condition)
However, I do need BOTH T3 and T2 conditions as part of my lookup.
I am now using a union query to get the results whcih returns that same details as the 1st query above, but does it in milli-seconds time.
(also utilizes a view to pre-link everything,b ut I cannot use the view to the conditional aspects as it also has a performance hit).
SELECT VW.RA_NUMBER, VW.SERIAL_1, VW.SERIAL_2, VW.ENTRY_DATE, VW.TEST_RESULTS
FROM TABLE_VIEW VW
WHERE VW.TD_RA_OBJID IN
(
SELECT T1.TD_RAW_OBJID
FROM
dbo.TABLE_1 AS T1 INNER JOIN
dbo.TABLE_2 AS T2 with(nolock) ON T2.RA_NUMBER_ID=T1.RA_NUMBER_ID
WHERE
T2.RA_NUMBER= ‘xxx’
UNION
SELECT T1.TD_RAW_OBJID
FROM
dbo.TABLE_1 AS T1 INNER JOIN
dbo.TABLE_3 AS T3 with(nolock) ON T3.SERIAL_NUMBER_ID=T1.SERIAL_NUMBER_ID
WHERE
T3.SERIAL_1= ‘xxx’ OR T3.SERIAL_2=’xxx’
)
I could put it all back together, but then the indexes on the flat table get out of hand and require frequent re-builds (we average around 1000 entries/minute and the indexes are just not keeping up)
Reply
Derek Asirvadem
November 8, 2009 7:01 am
Pinal
A very good precis (given the size, it cannot be expected to provide all the technical info to support the conclusions), and thankyou for busting a common myth propagated by the uneducated and those committed to staying that way. With 32 years of working as a database consultant (hundreds of P&T assignments), let me say that I agree with your article.
There is just one thing. That “joins are costly” is also a myth that begs for busting. Sybase, DBS and MS are over 20 years old, and this particular area has received much engineering attention. The cost of correct joins is insignificant in terms of the generated query tree. The real join cost is in (a) the size of the table being joined and (b) the qualified rows that are joined; the Normalised db produces smaller metrics in both cases. Therefore join cost is substantially lower in Normalised dbs.
I might add that that narrower rows in Normalised databases allow far more rows to reside in the same memory/cache space. Which leads directly to fewer cache misses and physical I/O. This point works hand-in-glove with the qualification of rows provided by Clustered indices (which you discuss), further improving overall performance.
Reply
Anand M
May 23, 2011 12:14 pm
Hi Pinal Dave Sir ;
I am Anand from chennai . i want how to increase performance in sqlserver query .
Reply