Question: How to Find DISTINCT COUNT of Column Without Using Distinct Count Keywords?
Answer: Last week, during an interview we asked this question to 10 interview candidate and only one got it correct. The worst part was that only 4 actually understood what question was. Honestly, if you read this question again, it is a very simple question.
In this blog post, we will see two different methods to find Distinct Count value from a table. The first method is a traditional method and the second method is what is actually answer the question asked to the candidate.
Traditionally Correct Answer
If the question was to find distinct count of column –
SELECTÂ COUNT(InvoiceID) [Total InvoiceID] FROM [WideWorldImporters].[Sales].[InvoiceLines]
Correct Answer to The Question
If the question was to find DISTINCT COUNT of column without using Distinct Count Keywords –
SELECT InvoiceID
FROM [WideWorldImporters].[Sales].[InvoiceLines]
GROUP BY InvoiceID
SELECT @@ROWCOUNT [Total InvoiceID]
GO
Well, here I have used function @@RowCount which temporarily holds the row affected in the previous statement and displays on the screen.
Here are few other relevant blog posts about @@RowCount:
- SQL SERVER – Effect of SET NOCOUNT on @@ROWCOUNT
- SQL SERVER – How to Find Row Count of Every Table in Database Efficiently?
- SQL SERVER – Find Row Count in Table – Find Largest Table in Database – Part 2
- SQL SERVER – Effect of SET NOCOUNT on @@ROWCOUNT
- SQL SERVER – Find Rows and Index Count – SQL in Sixty Seconds #029 – Video
If you have another creative way to answer this question, please post in the comment and I will publish that with the due credit.
Reference:Â Pinal Dave (https://blog.sqlauthority.com)
23 Comments. Leave new
You can also UNION the table with itself to eliminate duplicates.
And of course you could use a temporary table and a cursor.
Nice.
CTE with Row_Number() function was the first way that comes to mind as you are looking for the distinct count. Row_Num will increase if there are more than one invoices, so selecting where the count is 1 will return all distinct invoices.
;WITH test_distinct
AS (
SELECT invoiceid
,ROW_NUMBER() OVER (
PARTITION BY invoiceid ORDER BY id
) AS row_num
FROM [WideWorldImporters].[Sales].[InvoiceLines]
)
SELECT count(invoiceid)
FROM test_distinct
WHERE row_num = 1;
hmm… nice idea. Thanks for sharing.
select sum(x) from (select 1 as x from [WideWorldImporters].[Sales].[InvoiceLines] group by invoiceid) as A
That would seem to be the simpliest that use just standard SQL.
Sum (Select 1)… I am impressed.
Original I had it as “select sum(1) from (select null ….” but decided to go more conventional.
An interesting hypothetical, but why would one ever choose to avoid COUNT DISTINCT? Not for performance reasons, surely.
You never know interviewer’s mental state.
Remember, in a job interview, you are also interviewing the people and company you are going to work for…. (several times I’ve walked away).
Using DENSE_RANK() Function.
select max(InvoiceRowNum) FROM
( select DENSE_RANK()OVER(order by InvoiceID) as InvoiceRowNum from [WideWorldImporters].[Sales].[InvoiceLines] ) Tbl
Wow! Nice.
What about:
Select object_name(spart.object_id) TableName, rows
from sys.partitions spart
Count Distinct excludes NULL values while counting whereas the solution provided in the post using @@ROWCOUNT includes NULL value.
SELECT
TOP 1 [Total InvoiceID] = ROW_NUMBER() OVER (ORDER BY InvoiceID)
FROM [WideWorldImporters].[Sales].[InvoiceLines]
WHERE InvoiceID IS NOT NULL
GROUP BY InvoiceID
ORDER BY 1 DESC
I have a doubt in the “Traditional Answer”
‘SELECT COUNT(InvoiceID) [Total InvoiceID]
FROM [WideWorldImporters].[Sales].[InvoiceLines]’
This would still return the total number of rows in the table right? how will it remove the duplicates?
Adding a ‘distinct’ before the ‘Count’ also returns the total number of records in the table, doesnt distinct. Can someone please explain this to me?
You are correct. That won’t work.
I assume that was a typo, and he intended either:
SELECT COUNT(distinct InvoiceID) [Total InvoiceID]
FROM [WideWorldImporters].[Sales].[InvoiceLines]
or
SELECT COUNT(InvoiceID) [Total InvoiceID]
FROM [WideWorldImporters].[Sales].[InvoiceLines]
GROUP BY InvoiceID
I am a little confused. You said distinct count of columns? Shouldn’t it be rows?
You are correct. It is confusing. Let me fix it.
Also the correct answer is incorrect? That will just give you the row count and not the distinct number of rows. You don’t need to select the @@rowcount because it is already showing in the bottom right hand corner.
Nevermind. I didn’t see the groupBy
if the total records in the table is more than 100 M then above solution will have performance issue.
i would suggest the below solution:
SELECT COUNT(1) FROM
(
SELECT InvoiceID
FROM [WideWorldImporters].[Sales].[InvoiceLines]
GROUP BY InvoiceID
) f