Question: How to Find Cardinality or Uniqueness for any Column?
Answer: The question was actually came up during my training SQL Server Performance Tuning Practical Workshop, while I was explaining various concepts related to cardinality estimation, compatibility level and its impact on SQL Server’s performance.
Though the question is very simple, it is very valid as well. It is not possible that every single person knows the meaning of cardinality.
Let us see a very simple script which gives us cardinality in percentage from the AdventureWorks database table SalesOrderDetail and column SalesOrderID.
SELECT COUNT(DISTINCT SalesOrderID)*100.0/Count(*) 'Distinct_SalesOrderID (in %)' FROM [AdventureWorks2014].[Sales].[SalesOrderDetail]
If you see the resultsets, it shows how many percentages of unique values are there for the column SalesOrderID details in the table.
It is very important to understand the uniqueness as well as the datatype of the column and how it is used in SQL Server. They play very important and crucial role in creating indexes as well as building execution plan, however, I have often noticed that many performance tuning experts either do not know about it or do not talk about it during their training.
If you sign up for my SQL Server Performance Tuning Practical Workshop, I make sure that I explain you in detail with a real-world example.
Reference: Pinal Dave (https://blog.sqlauthority.com)
1 Comment. Leave new
Hi Pinal,
I’m having trouble understanding the true meaning of the single cardinality percent number (25.936% in your example). I also don’t have a copy of AdventureWorks handy, so I created a simple example:
Create database NutriciousFoods;
go
Use NutriciousFoods
go
create table Foods
(
product varchar(50) not null primary key,
category varchar(50) not null
);
create index idxCategory on Foods (category);
insert into Foods (product, category) values (‘Tomato’, ‘Vegetable’);
insert into Foods (product, category) values (‘Broccoli’, ‘Vegetable’);
insert into Foods (product, category) values (‘Celery’, ‘Vegetable’);
insert into Foods (product, category) values (‘Lettuce’, ‘Vegetable’);
insert into Foods (product, category) values (‘Spinach’, ‘Vegetable’);
insert into Foods (product, category) values (‘Arugula’, ‘Vegetable’);
insert into Foods (product, category) values (‘Apple’, ‘Fruit’);
insert into Foods (product, category) values (‘Pear’, ‘Fruit’);
insert into Foods (product, category) values (‘Banana’, ‘Fruit’);
insert into Foods (product, category) values (‘Grape’, ‘Fruit’);
go
— Look at data
select * from foods
order by category, product;
— Determine Cardinality or Uniqueness using Pinal’s formula:
SELECT COUNT(DISTINCT category)*100.0/Count(*) ‘Distinct_Category (in %)’
FROM dbo.Foods;
–> Result = 20%
Question # 1) When considering the above data, what does this 20% refer to?
I created the above table in Oracle, insert the same data, and then use the ratio_to_report method, I get values that makes perfect sense:
select category, (ratio_to_report(count(*)) over() * 100) as “PctTotal”
from foods
group by category
order by category;
–> Category PctTotal
Fruit 40%
Vegetable 60%
This type of information seems more useful, where one gets a percentage for each unique item. I believe this type of information is useful to help troubleshoot parameter sniffing issues.
Question # 2) What is the easiest way in SQL Server to get the same type of information that Oracle’s ratio_to_report displays? I was able to create a query that uses two Common Table Expressions, but I’m wondering if there might be an easier way to get the same result:
with v
as (
select distinct category, Cast(count(*) as decimal) As CategoryCount
FROM dbo.Foods
group by category
),
t
as (
select count(*) as TotalRows FROM dbo.Foods
)
select category, (CategoryCount*100 /TotalRows) as PctTotal
from v, t
order by PctTotal;
Thank You,
Tom