How to Find Cardinality or Uniqueness for any Column? – Interview Question of the Week #177

Question: How to Find Cardinality or Uniqueness for any Column?

How to Find Cardinality or Uniqueness for any Column? - Interview Question of the Week #177 cardinalityofcolumn1

Answer: The question was actually came up during my training SQL Server Performance Tuning Practical Workshop, while I was explaining various concepts related to cardinality estimation, compatibility level and its impact on SQL Server’s performance.

Though the question is very simple, it is very valid as well. It is not possible that every single person knows the meaning of cardinality.

Let us see a very simple script which gives us cardinality in percentage from the AdventureWorks database table SalesOrderDetail and column SalesOrderID.

SELECT COUNT(DISTINCT SalesOrderID)*100.0/Count(*) 'Distinct_SalesOrderID (in %)'
FROM [AdventureWorks2014].[Sales].[SalesOrderDetail]

If you see the resultsets, it shows how many percentages of unique values are there for the column SalesOrderID details in the table.

How to Find Cardinality or Uniqueness for any Column? - Interview Question of the Week #177 cardinalityofcolumn-800x125

It is very important to understand the uniqueness as well as the datatype of the column and how it is used in SQL Server. They play very important and crucial role in creating indexes as well as building execution plan, however, I have often noticed that many performance tuning experts either do not know about it or do not talk about it during their training.

If you sign up for my SQL Server Performance Tuning Practical Workshop, I make sure that I explain you in detail with a real-world example.

Reference: Pinal Dave (https://blog.sqlauthority.com)

, , ,
Previous Post
How to Enable Files & Folders Level Auditing in SQL Server? – Interview Question of the Week #176
Next Post
How Much Work Each Processor (CPU) is Doing in SQL Server? – Interview Question of the Week #178

Related Posts

1 Comment. Leave new

  • Tom Wickerath
    August 7, 2018 8:31 am

    Hi Pinal,

    I’m having trouble understanding the true meaning of the single cardinality percent number (25.936% in your example). I also don’t have a copy of AdventureWorks handy, so I created a simple example:

    Create database NutriciousFoods;
    go

    Use NutriciousFoods
    go

    create table Foods
    (
    product varchar(50) not null primary key,
    category varchar(50) not null
    );

    create index idxCategory on Foods (category);

    insert into Foods (product, category) values (‘Tomato’, ‘Vegetable’);
    insert into Foods (product, category) values (‘Broccoli’, ‘Vegetable’);
    insert into Foods (product, category) values (‘Celery’, ‘Vegetable’);
    insert into Foods (product, category) values (‘Lettuce’, ‘Vegetable’);
    insert into Foods (product, category) values (‘Spinach’, ‘Vegetable’);
    insert into Foods (product, category) values (‘Arugula’, ‘Vegetable’);
    insert into Foods (product, category) values (‘Apple’, ‘Fruit’);
    insert into Foods (product, category) values (‘Pear’, ‘Fruit’);
    insert into Foods (product, category) values (‘Banana’, ‘Fruit’);
    insert into Foods (product, category) values (‘Grape’, ‘Fruit’);
    go

    — Look at data
    select * from foods
    order by category, product;

    — Determine Cardinality or Uniqueness using Pinal’s formula:
    SELECT COUNT(DISTINCT category)*100.0/Count(*) ‘Distinct_Category (in %)’
    FROM dbo.Foods;

    –> Result = 20%

    Question # 1) When considering the above data, what does this 20% refer to?

    I created the above table in Oracle, insert the same data, and then use the ratio_to_report method, I get values that makes perfect sense:

    select category, (ratio_to_report(count(*)) over() * 100) as “PctTotal”
    from foods
    group by category
    order by category;

    –> Category PctTotal
    Fruit 40%
    Vegetable 60%

    This type of information seems more useful, where one gets a percentage for each unique item. I believe this type of information is useful to help troubleshoot parameter sniffing issues.

    Question # 2) What is the easiest way in SQL Server to get the same type of information that Oracle’s ratio_to_report displays? I was able to create a query that uses two Common Table Expressions, but I’m wondering if there might be an easier way to get the same result:

    with v
    as (
    select distinct category, Cast(count(*) as decimal) As CategoryCount
    FROM dbo.Foods
    group by category
    ),
    t
    as (
    select count(*) as TotalRows FROM dbo.Foods
    )
    select category, (CategoryCount*100 /TotalRows) as PctTotal
    from v, t
    order by PctTotal;

    Thank You,
    Tom

    Reply

Leave a Reply

Menu