SQL SERVER – Effect of Normalization on Index and Performance

Of late, I have been using Twitter quite frequently, and I am gradually discovering its usefulness. I received a Direct Message (or DM in terms of twitter) asking if I can comment on the effect of normalization on Index and its performance in one twit! Now honestly speaking, this was new for me. I never expected to be quizzed like this. If you are using Twitter, then you must be aware that one twit contains only 140 characters. I was supposed to give answer on such a big subject in just 140 letters. An interesting fact is that normalization and Index are not really closely related. The right question should have been – what is the effect of normalization on performance?

In any case, let us go over my twit answer.

“Normalization increases count of table along with clustered index which makes every single entity with its independent index improving perf.”

Within seconds of my posting the above twit I received a reply from  Mladen Prajdic – “which you loose on joins :)”

This whole scenario is very intriguing. I will try to give a comprehensive explanation. Prior to normalization, usually data is in a large table. As it gets normalized, all the redundant information is removed and data is categorized and placed in their respective tables. It is a common practice to have PK in tables that leads to additional clustered indexes on the database. This means normalized database has more clustered indexes. As there is no redundancy, each table contains a unique entry for data. Because of this all the clustered indexes have fewer rows, which effectively increase the speed of database operations. Besides, as there is no redundancy, normalized database requires less storage and maintenance. Now when query is intricate and involves numerous tables, lots of joins are introduced. These joins effectively diminish performance. However, this loss of performance is less compared to the gain which was a result of normalization.

In case you have skipped the above paragraph and straightaway reading this sentence, I urge you to go back and pay attention to the previous paragraph. This paragraph provides a good explanation to help you understand the logical backstage theory of normalization and performance relations.

I requested Mladen Prajdic for his opinion in one twit and given below is his response. You can follow Mladen Prajdic on  Twitter here.

“@pinaldave Normalization is a data modularization concept. Indexes are data storage and retrieval concept. http://bit.ly/YMJJD” .

Let me know what you think about this subject. If possible, participate here by giving your explanation within 140 characters. Let us see what you all can come up with. Follow me on Twitter.

Reference : Pinal Dave (http://blog.sqlauthority.com)

About these ads

8 thoughts on “SQL SERVER – Effect of Normalization on Index and Performance

  1. Can you enlighten me upon the performance degradation, if any, in case there is a table with about 10 million records, which has some 30 columns, suppose.

    About 15 of the columns reference other tables in the same database. However, upon looking at the indexes, I found that there are about the same number of indexes as the columns.

    How is this going to degrade the performance. We are experiencing a huge time of insertions into this table.

    Follow me on twitter.

    Like

  2. In regards to “However, this loss of performance is less compared to the gain which was a result of normalization.”

    Is there ever a case where this is not true? Maybe some fringe case where the indexes are as large as the data itself?

    Something like a table with Col1 int, Col2 int, Col3 int, Data nvarchar(16) where the 3 (could be more) integers form a primary key. And where the data is hierarchical, i.e. each Col1 has many Col2’s, and each Col2 has many Col3’s, but with each triplet being unique.

    SO you could normalize it into 3 tables Col1, Col2, and Col3 with relationship tables Col1_Col2 and Col2_Col3, but would that be faster and use less disk space?

    Like

  3. I’ve been working on a new normalized database and believe I’ve encountered the issue MLanden was noting and was hoping you could comment on it (trying to figure out the best method to handle this)

    TABLE1:
    RA_NUMBER_ID (FK, int)
    ENTRY_DATE (datetime)
    TEST_RESULTS (int (1/0 for pass/fail)
    SERIAL_NUMBER_ID (FK, int)

    TABLE2: (RA numbers can have multiple tests done to them
    RA_NUMBER_ID (PK, int)
    RA_NUMBER (varchar) (indexed)

    TABLE3: (each RA has 1 set of serials (a 2-pair set), but same serial could exist on multiple RAs)
    SERIAL_NUMBER_ID (PK, int)
    SERIAL_1 (varchar) (indexed)
    SERIAL_2 (varchar) (indexed)

    Using the select:
    SELECT T2.RA_NUMBER, T3.SERIAL_1, T3.SERIAL_2, T1.ENTRY_DATE, T1.TEST_RESULTS
    FROM
    TABLE1 T1 INNER JOIN
    TABLE2 T2 ON T2.RA_NUMBER_ID=T1.RA_NUMBER_ID INNER JOIN
    TABLE3 T3 ON T3.SERIAL_NUMBER_ID=T1.SERIAL_NUMBER_ID
    WHERE
    T3.SERIAL_1=’xxx’
    OR T3.SERIAL_2=’xxx’
    OR T2.RA_NUMBER=’xxx’

    Takes a long time to load the data – it appears the indexes get lost once I join the multiple tables together and attempt to query on different indexed elements.
    1.6 million records in “table 1″ took almost 18 seconds to return 5 rows I was expecting.

    Using this query using ONLy 1 table as the lookup condition is much faster:
    SELECT T2.RA_NUMBER, T3.SERIAL_1, T3.SERIAL_2, T1.ENTRY_DATE, T1.TEST_RESULTS
    FROM
    TABLE1 T1 INNER JOIN
    TABLE2 T2 ON T2.RA_NUMBER_ID=T1.RA_NUMBER_ID INNER JOIN
    TABLE3 T3 ON T3.SERIAL_NUMBER_ID=T1.SERIAL_NUMBER_ID
    WHERE
    T3.SERIAL_1=’xxx’ OR T3.SERIAL_2=’xxx’
    –(removed the “T2.RA_NUMBER” condition)

    However, I do need BOTH T3 and T2 conditions as part of my lookup.
    I am now using a union query to get the results whcih returns that same details as the 1st query above, but does it in milli-seconds time.
    (also utilizes a view to pre-link everything,b ut I cannot use the view to the conditional aspects as it also has a performance hit).
    SELECT VW.RA_NUMBER, VW.SERIAL_1, VW.SERIAL_2, VW.ENTRY_DATE, VW.TEST_RESULTS
    FROM TABLE_VIEW VW
    WHERE VW.TD_RA_OBJID IN
    (
    SELECT T1.TD_RAW_OBJID
    FROM
    dbo.TABLE_1 AS T1 INNER JOIN
    dbo.TABLE_2 AS T2 with(nolock) ON T2.RA_NUMBER_ID=T1.RA_NUMBER_ID
    WHERE
    T2.RA_NUMBER= ‘xxx’
    UNION
    SELECT T1.TD_RAW_OBJID
    FROM
    dbo.TABLE_1 AS T1 INNER JOIN
    dbo.TABLE_3 AS T3 with(nolock) ON T3.SERIAL_NUMBER_ID=T1.SERIAL_NUMBER_ID
    WHERE
    T3.SERIAL_1= ‘xxx’ OR T3.SERIAL_2=’xxx’
    )

    I could put it all back together, but then the indexes on the flat table get out of hand and require frequent re-builds (we average around 1000 entries/minute and the indexes are just not keeping up)

    Like

  4. Pinal
    A very good precis (given the size, it cannot be expected to provide all the technical info to support the conclusions), and thankyou for busting a common myth propagated by the uneducated and those committed to staying that way. With 32 years of working as a database consultant (hundreds of P&T assignments), let me say that I agree with your article.

    There is just one thing. That “joins are costly” is also a myth that begs for busting. Sybase, DBS and MS are over 20 years old, and this particular area has received much engineering attention. The cost of correct joins is insignificant in terms of the generated query tree. The real join cost is in (a) the size of the table being joined and (b) the qualified rows that are joined; the Normalised db produces smaller metrics in both cases. Therefore join cost is substantially lower in Normalised dbs.

    I might add that that narrower rows in Normalised databases allow far more rows to reside in the same memory/cache space. Which leads directly to fewer cache misses and physical I/O. This point works hand-in-glove with the qualification of rows provided by Clustered indices (which you discuss), further improving overall performance.

    Like

  5. sir please help me in my assignment using

    o Identify the strengths and weaknesses of corrected 5NF definition for relational database design
    o Make a critical analysis and formulate a recommendation of each paper.

    Like

  6. Pingback: SQL SERVER – Weekly Series – Memory Lane – #033 | Journey to SQL Authority with Pinal Dave

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s