How to Find DISTINCT COUNT of Column Rows Without Using Distinct Count Keywords? – Interview Question of the Week #157

January 21, 2018

Question: How to Find DISTINCT COUNT of Column Without Using Distinct Count Keywords?

Answer: Last week, during an interview we asked this question to 10 interview candidate and only one got it correct. The worst part was that only 4 actually understood what question was. Honestly, if you read this question again, it is a very simple question.

How to Find DISTINCT COUNT of Column Rows Without Using Distinct Count Keywords? - Interview Question of the Week #157 distinctcount

In this blog post, we will see two different methods to find Distinct Count value from a table. The first method is a traditional method and the second method is what is actually answer the question asked to the candidate.

Traditionally Correct Answer

If the question was to find distinct count of column –

SELECT COUNT(InvoiceID) [Total InvoiceID]
FROM [WideWorldImporters].[Sales].[InvoiceLines]

Correct Answer to The Question

If the question was to find DISTINCT COUNT of column without using Distinct Count Keywords –

SELECT InvoiceID
FROM [WideWorldImporters].[Sales].[InvoiceLines] GROUP BY InvoiceID
SELECT @@ROWCOUNT [Total InvoiceID] GO

Well, here I have used function @@RowCount which temporarily holds the row affected in the previous statement and displays on the screen.

Here are few other relevant blog posts about @@RowCount:

If you have another creative way to answer this question, please post in the comment and I will publish that with the due credit.

Reference: Pinal Dave (https://blog.sqlauthority.com)

SQL Function, SQL Scripts, SQL Server

How to Get Top N Records Per Group? – Ranking Function – Interview Question of the Week #156

How to Drop All the User Created Statistics by SQL Server? – Interview Question of the Week #158

23 Comments. Leave new

Rick
January 21, 2018 8:11 am
You can also UNION the table with itself to eliminate duplicates.
And of course you could use a temporary table and a cursor.
Reply
- Pinal Dave
  January 25, 2018 7:40 am
  Nice.
  Reply
John Ruddy
January 22, 2018 4:31 pm
CTE with Row_Number() function was the first way that comes to mind as you are looking for the distinct count. Row_Num will increase if there are more than one invoices, so selecting where the count is 1 will return all distinct invoices.
;WITH test_distinct
AS (
SELECT invoiceid
,ROW_NUMBER() OVER (
PARTITION BY invoiceid ORDER BY id
) AS row_num
FROM [WideWorldImporters].[Sales].[InvoiceLines]
)
SELECT count(invoiceid)
FROM test_distinct
WHERE row_num = 1;
Reply
- Pinal Dave
  January 25, 2018 7:40 am
  hmm… nice idea. Thanks for sharing.
  Reply
James Curran
January 23, 2018 2:10 am
select sum(x) from (select 1 as x from [WideWorldImporters].[Sales].[InvoiceLines] group by invoiceid) as A
That would seem to be the simpliest that use just standard SQL.
Reply
- Pinal Dave
  January 25, 2018 7:41 am
  Sum (Select 1)… I am impressed.
  Reply
  - James Curran
    January 25, 2018 9:47 pm
    Original I had it as “select sum(1) from (select null ….” but decided to go more conventional.
Tim Monfries
January 23, 2018 12:55 pm
An interesting hypothetical, but why would one ever choose to avoid COUNT DISTINCT? Not for performance reasons, surely.
Reply
- Pinal Dave
  January 25, 2018 7:39 am
  You never know interviewer’s mental state.
  Reply
  - J
    June 19, 2018 7:12 pm
    Remember, in a job interview, you are also interviewing the people and company you are going to work for…. (several times I’ve walked away).
Pratheek
January 25, 2018 7:29 pm
Using DENSE_RANK() Function.
select max(InvoiceRowNum) FROM
( select DENSE_RANK()OVER(order by InvoiceID) as InvoiceRowNum from [WideWorldImporters].[Sales].[InvoiceLines] ) Tbl
Reply
- Pinal Dave
  January 26, 2018 3:57 am
  Wow! Nice.
  Reply
J
January 26, 2018 2:59 am
What about:
Select object_name(spart.object_id) TableName, rows
from sys.partitions spart
Reply
Bhavana
January 31, 2018 5:38 pm
Count Distinct excludes NULL values while counting whereas the solution provided in the post using @@ROWCOUNT includes NULL value.
Reply
Meckinism
February 12, 2018 7:19 pm
SELECT
TOP 1 [Total InvoiceID] = ROW_NUMBER() OVER (ORDER BY InvoiceID)
FROM [WideWorldImporters].[Sales].[InvoiceLines]
WHERE InvoiceID IS NOT NULL
GROUP BY InvoiceID
ORDER BY 1 DESC
Reply
Zoomie
April 22, 2018 10:37 pm
I have a doubt in the “Traditional Answer”
‘SELECT COUNT(InvoiceID) [Total InvoiceID]
FROM [WideWorldImporters].[Sales].[InvoiceLines]’
This would still return the total number of rows in the table right? how will it remove the duplicates?
Adding a ‘distinct’ before the ‘Count’ also returns the total number of records in the table, doesnt distinct. Can someone please explain this to me?
Reply
- James Curran
  April 23, 2018 10:29 pm
  You are correct. That won’t work.
  I assume that was a typo, and he intended either:
  SELECT COUNT(distinct InvoiceID) [Total InvoiceID]
  FROM [WideWorldImporters].[Sales].[InvoiceLines]
  or
  SELECT COUNT(InvoiceID) [Total InvoiceID]
  FROM [WideWorldImporters].[Sales].[InvoiceLines]
  GROUP BY InvoiceID
  Reply
Kenny
June 19, 2018 5:14 pm
I am a little confused. You said distinct count of columns? Shouldn’t it be rows?
Reply
- Pinal Dave
  June 19, 2018 6:09 pm
  You are correct. It is confusing. Let me fix it.
  Reply
  - kenny
    June 19, 2018 6:41 pm
    Also the correct answer is incorrect? That will just give you the row count and not the distinct number of rows. You don’t need to select the @@rowcount because it is already showing in the bottom right hand corner.
kenny
June 19, 2018 6:55 pm
Nevermind. I didn’t see the groupBy
Reply
Ramkumar Sambandam
June 23, 2018 3:23 am
if the total records in the table is more than 100 M then above solution will have performance issue.
i would suggest the below solution:
SELECT COUNT(1) FROM
(
SELECT InvoiceID
FROM [WideWorldImporters].[Sales].[InvoiceLines]
GROUP BY InvoiceID
) f
Reply