How to Find DISTINCT COUNT of Column Rows Without Using Distinct Count Keywords? – Interview Question of the Week #157

Question: How to Find DISTINCT COUNT of Column Without Using Distinct Count Keywords?

Answer: Last week, during an interview we asked this question to 10 interview candidate and only one got it correct. The worst part was that only 4 actually understood what question was. Honestly, if you read this question again, it is a very simple question.

How to Find DISTINCT COUNT of Column Rows Without Using Distinct Count Keywords? - Interview Question of the Week #157 distinctcount

In this blog post, we will see two different methods to find Distinct Count value from a table. The first method is a traditional method and the second method is what is actually answer the question asked to the candidate.

Solarwinds

Traditionally Correct Answer

If the question was to find distinct count of column –

SELECT COUNT(InvoiceID) [Total InvoiceID]
FROM [WideWorldImporters].[Sales].[InvoiceLines]

Correct Answer to The Question

If the question was to find DISTINCT COUNT of column without using Distinct Count Keywords –

SELECT InvoiceID
FROM [WideWorldImporters].[Sales].[InvoiceLines] GROUP BY InvoiceID
SELECT @@ROWCOUNT [Total InvoiceID] GO

Well, here I have used function @@RowCount which temporarily holds the row affected in the previous statement and displays on the screen.

Here are few other relevant blog posts about @@RowCount:

If you have another creative way to answer this question, please post in the comment and I will publish that with the due credit.

Reference: Pinal Dave (https://blog.sqlauthority.com)

Solarwinds
, ,
Previous Post
How to Get Top N Records Per Group? – Ranking Function – Interview Question of the Week #156
Next Post
How to Drop All the User Created Statistics by SQL Server? – Interview Question of the Week #158

Related Posts

23 Comments. Leave new

  • You can also UNION the table with itself to eliminate duplicates.

    And of course you could use a temporary table and a cursor.

    Reply
  • CTE with Row_Number() function was the first way that comes to mind as you are looking for the distinct count. Row_Num will increase if there are more than one invoices, so selecting where the count is 1 will return all distinct invoices.

    ;WITH test_distinct
    AS (
    SELECT invoiceid
    ,ROW_NUMBER() OVER (
    PARTITION BY invoiceid ORDER BY id
    ) AS row_num
    FROM [WideWorldImporters].[Sales].[InvoiceLines]
    )
    SELECT count(invoiceid)
    FROM test_distinct
    WHERE row_num = 1;

    Reply
  • select sum(x) from (select 1 as x from [WideWorldImporters].[Sales].[InvoiceLines] group by invoiceid) as A

    That would seem to be the simpliest that use just standard SQL.

    Reply
  • An interesting hypothetical, but why would one ever choose to avoid COUNT DISTINCT? Not for performance reasons, surely.

    Reply
    • You never know interviewer’s mental state.

      Reply
      • Remember, in a job interview, you are also interviewing the people and company you are going to work for…. (several times I’ve walked away).

  • Using DENSE_RANK() Function.
    select max(InvoiceRowNum) FROM
    ( select DENSE_RANK()OVER(order by InvoiceID) as InvoiceRowNum from [WideWorldImporters].[Sales].[InvoiceLines] ) Tbl

    Reply
  • What about:

    Select object_name(spart.object_id) TableName, rows
    from sys.partitions spart

    Reply
  • Count Distinct excludes NULL values while counting whereas the solution provided in the post using @@ROWCOUNT includes NULL value.

    Reply
  • SELECT
    TOP 1 [Total InvoiceID] = ROW_NUMBER() OVER (ORDER BY InvoiceID)
    FROM [WideWorldImporters].[Sales].[InvoiceLines]
    WHERE InvoiceID IS NOT NULL
    GROUP BY InvoiceID
    ORDER BY 1 DESC

    Reply
  • I have a doubt in the “Traditional Answer”
    ‘SELECT COUNT(InvoiceID) [Total InvoiceID]
    FROM [WideWorldImporters].[Sales].[InvoiceLines]’
    This would still return the total number of rows in the table right? how will it remove the duplicates?
    Adding a ‘distinct’ before the ‘Count’ also returns the total number of records in the table, doesnt distinct. Can someone please explain this to me?

    Reply
    • You are correct. That won’t work.

      I assume that was a typo, and he intended either:

      SELECT COUNT(distinct InvoiceID) [Total InvoiceID]
      FROM [WideWorldImporters].[Sales].[InvoiceLines]

      or

      SELECT COUNT(InvoiceID) [Total InvoiceID]
      FROM [WideWorldImporters].[Sales].[InvoiceLines]
      GROUP BY InvoiceID

      Reply
    • Create Table TestTable(i int)
      go
      insert into TestTable Values (1)
      go
      insert into TestTable Values (2)
      go
      insert into TestTable Values (2)
      go
      Select count(distinct i) from TestTable
      go
      Select distinct count(i) from TestTable
      go

      Reply
  • I am a little confused. You said distinct count of columns? Shouldn’t it be rows?

    Reply
    • You are correct. It is confusing. Let me fix it.

      Reply
      • Also the correct answer is incorrect? That will just give you the row count and not the distinct number of rows. You don’t need to select the @@rowcount because it is already showing in the bottom right hand corner.

  • Nevermind. I didn’t see the groupBy

    Reply
  • Ramkumar Sambandam
    June 23, 2018 3:23 am

    if the total records in the table is more than 100 M then above solution will have performance issue.
    i would suggest the below solution:

    SELECT COUNT(1) FROM
    (

    SELECT InvoiceID
    FROM [WideWorldImporters].[Sales].[InvoiceLines]
    GROUP BY InvoiceID
    ) f

    Reply

Leave a Reply

Menu