How to Find DISTINCT COUNT of Column Rows Without Using Distinct Count Keywords? – Interview Question of the Week #157

Question: How to Find DISTINCT COUNT of Column Without Using Distinct Count Keywords?

Answer: Last week, during an interview we asked this question to 10 interview candidate and only one got it correct. The worst part was that only 4 actually understood what question was. Honestly, if you read this question again, it is a very simple question.

How to Find DISTINCT COUNT of Column Rows Without Using Distinct Count Keywords? - Interview Question of the Week #157 distinctcount

In this blog post, we will see two different methods to find Distinct Count value from a table. The first method is a traditional method and the second method is what is actually answer the question asked to the candidate.

Traditionally Correct Answer

If the question was to find distinct count of column –

SELECT COUNT(InvoiceID) [Total InvoiceID]
FROM [WideWorldImporters].[Sales].[InvoiceLines]

Correct Answer to The Question

If the question was to find DISTINCT COUNT of column without using Distinct Count Keywords –

SELECT InvoiceID
FROM [WideWorldImporters].[Sales].[InvoiceLines] GROUP BY InvoiceID
SELECT @@ROWCOUNT [Total InvoiceID] GO

Well, here I have used function @@RowCount which temporarily holds the row affected in the previous statement and displays on the screen.

Here are few other relevant blog posts about @@RowCount:

If you have another creative way to answer this question, please post in the comment and I will publish that with the due credit.

Reference: Pinal Dave (https://blog.sqlauthority.com)

SQL Function, SQL Scripts, SQL Server
Previous Post
How to Get Top N Records Per Group? – Ranking Function – Interview Question of the Week #156
Next Post
How to Drop All the User Created Statistics by SQL Server? – Interview Question of the Week #158

Related Posts

23 Comments. Leave new

  • You can also UNION the table with itself to eliminate duplicates.

    And of course you could use a temporary table and a cursor.

    Reply
  • CTE with Row_Number() function was the first way that comes to mind as you are looking for the distinct count. Row_Num will increase if there are more than one invoices, so selecting where the count is 1 will return all distinct invoices.

    ;WITH test_distinct
    AS (
    SELECT invoiceid
    ,ROW_NUMBER() OVER (
    PARTITION BY invoiceid ORDER BY id
    ) AS row_num
    FROM [WideWorldImporters].[Sales].[InvoiceLines]
    )
    SELECT count(invoiceid)
    FROM test_distinct
    WHERE row_num = 1;

    Reply
  • select sum(x) from (select 1 as x from [WideWorldImporters].[Sales].[InvoiceLines] group by invoiceid) as A

    That would seem to be the simpliest that use just standard SQL.

    Reply
  • An interesting hypothetical, but why would one ever choose to avoid COUNT DISTINCT? Not for performance reasons, surely.

    Reply
    • You never know interviewer’s mental state.

      Reply
      • Remember, in a job interview, you are also interviewing the people and company you are going to work for…. (several times I’ve walked away).

  • Using DENSE_RANK() Function.
    select max(InvoiceRowNum) FROM
    ( select DENSE_RANK()OVER(order by InvoiceID) as InvoiceRowNum from [WideWorldImporters].[Sales].[InvoiceLines] ) Tbl

    Reply
  • What about:

    Select object_name(spart.object_id) TableName, rows
    from sys.partitions spart

    Reply
  • Count Distinct excludes NULL values while counting whereas the solution provided in the post using @@ROWCOUNT includes NULL value.

    Reply
  • SELECT
    TOP 1 [Total InvoiceID] = ROW_NUMBER() OVER (ORDER BY InvoiceID)
    FROM [WideWorldImporters].[Sales].[InvoiceLines]
    WHERE InvoiceID IS NOT NULL
    GROUP BY InvoiceID
    ORDER BY 1 DESC

    Reply
  • I have a doubt in the “Traditional Answer”
    ‘SELECT COUNT(InvoiceID) [Total InvoiceID]
    FROM [WideWorldImporters].[Sales].[InvoiceLines]’
    This would still return the total number of rows in the table right? how will it remove the duplicates?
    Adding a ‘distinct’ before the ‘Count’ also returns the total number of records in the table, doesnt distinct. Can someone please explain this to me?

    Reply
    • You are correct. That won’t work.

      I assume that was a typo, and he intended either:

      SELECT COUNT(distinct InvoiceID) [Total InvoiceID]
      FROM [WideWorldImporters].[Sales].[InvoiceLines]

      or

      SELECT COUNT(InvoiceID) [Total InvoiceID]
      FROM [WideWorldImporters].[Sales].[InvoiceLines]
      GROUP BY InvoiceID

      Reply
  • I am a little confused. You said distinct count of columns? Shouldn’t it be rows?

    Reply
    • You are correct. It is confusing. Let me fix it.

      Reply
      • Also the correct answer is incorrect? That will just give you the row count and not the distinct number of rows. You don’t need to select the @@rowcount because it is already showing in the bottom right hand corner.

  • Nevermind. I didn’t see the groupBy

    Reply
  • Ramkumar Sambandam
    June 23, 2018 3:23 am

    if the total records in the table is more than 100 M then above solution will have performance issue.
    i would suggest the below solution:

    SELECT COUNT(1) FROM
    (

    SELECT InvoiceID
    FROM [WideWorldImporters].[Sales].[InvoiceLines]
    GROUP BY InvoiceID
    ) f

    Reply

Leave a Reply