SQL SERVER – APPROX_COUNT_DISTINCT – Not Always Efficient

One thing I keep on telling my clients that it is not that always recommended to use the latest feature without proper testing. I was recently hired by clients to help them with tuning a query for SQL Server 2019 Comprehensive Database Performance Health Check. While working with a complex query at one point I noticed that it was finding a distinct count of a column. The need for the query was not to find the exact number of a range to create a paging grid. I immediately remember a newly introduced function APPROX_COUNT_DISTINCT and put that into practice. However, after a lot of testing, I figured out that the newly introduced function is not always efficient.

SQL SERVER - APPROX_COUNT_DISTINCT - Not Always Efficient bananaonwall-800x362

APPROX_COUNT_DISTINCT function returns the approximate number of unique non-null values in a group with up to a 2% error rate within a 97% probability. This APPROX_COUNT_DISTINCT() function is designed to give you the approximate aggregated counts more quickly than using the COUNT (DISTINCT) method.

Let us see a quick example of how APPROX_COUNT_DISTINCT works. I will be using Stackoverflow sample database for this query as that is a large database.

SET STATISTICS IO, TIME ON
GO
-- Works a Bit Faster
SELECT COUNT(DISTINCT OwnerUserId)
FROM [StackOverflow2013].[dbo].[Posts]
GO
SELECT APPROX_COUNT_DISTINCT(OwnerUserId)
FROM [StackOverflow2013].[dbo].[Posts]
GO

First, we will see the result of the query:

SQL SERVER - APPROX_COUNT_DISTINCT - Not Always Efficient APPROX_COUNT_DISTINCT1

The next we will see the statistics IO and Time for the query. It is very clear that the overall improvement is very little when looking at the CPU time and Elapsed Time.

Let us not conclude our experiment with this one example, let us apply a WHERE condition and run the same query once again.

-- Not Always A great Idea
SELECT COUNT(DISTINCT AnswerCount)
FROM [StackOverflow2013].[dbo].[Posts]
WHERE CommentCount = 1
GO
SELECT APPROX_COUNT_DISTINCT(AnswerCount)
FROM [StackOverflow2013].[dbo].[Posts]
WHERE CommentCount = 1
GO

Now let us first see the result of the query.

SQL SERVER - APPROX_COUNT_DISTINCT - Not Always Efficient APPROX_COUNT_DISTINCT2

Next, let us evaluate statistics IO and Time for the query. It is very clear that the overall improvement is very little when looking at the CPU time and Elapsed Time.

I have done many different tests with this function but did not find this function improving performance by more than 10% at my client’s place. There have been instances where APPROX_COUNT_DISTINCT has provided a huge difference in number in answer with slower queries as well.

Unless in your own test you find the new function APPROX_COUNT_DISTINCT Always Efficient, you should use it otherwise, avoid it.

Just like Microsoft SQL Server, Oracle also supports this function.

Reference: Pinal Dave (https://blog.sqlauthority.com)

SQL Counter, SQL Function, SQL Scripts, SQL Server, SQL Server 2019
Previous Post
SQL SERVER – Row Mode and Memory Grant Feedback
Next Post
SQL SERVER – List All the Nullable Columns in Database

Related Posts

Leave a Reply