SQL Server Performance tuning is an amazing subject, it never stops to amaze me. Every single day, I take only one Comprehensive Database Performance Health Check and keep the rest of the time open for either learning new stuff or helping people with On Demand (50 Minutes). Recently during the on-demand consultation, I was asked a very interesting question about Stream Aggregate and Hash Aggregate. Let us discuss the same topic today.
Stream Aggregate and Hash Aggregate
Senior DBA recently asked me what is my preference between stream Aggregate and Hash Aggregate and how do I react when I see them in the execution plan.
My answer to this question is very simple and it is – SQL Server knows the BEST in most of the cases. Let us understand my answer a bit more in detail now. Before we continue our discussion, let us understand the definition of the two of the physical aggregate operators.
Stream Aggregate
The stream aggregate is a physical operator to aggregate the data and it expects input rowset must be sorted in order of the grouping columns. When this operator receives the input rowset in the sorted format, it usually performs very efficiently.
Hash Aggregate
The hash aggregate is a physical operator to aggregate the data and it works by creating a hash table and hash buckets like hash join to aggregate the data. This operator is very effective and efficient when the input data rowset is not in the sorted format.
Real World Demonstration
Now let us see a quick demonstration. In the following demonstration, we will see a query that has both the aggregate operators – Stream and Hash. We will try to force our query to use only one kind of aggregate operation and see how it performs.
Here is the query which we will use for our demonstration. Please do not forget to enable the execution plan and also include the command to see statistics IO for our queries. The query is running on the sample database WideWorldImporters.
SET STATISTICS IO ON GO USE WideWorldImporters GO SELECT [CustomerID], COUNT([OrderID]) totOID FROM [Sales].[Invoices] o GROUP BY [CustomerID] GO
Let us see the statistics IO for the query.
Table ‘Worktable’. Scan count 0, logical reads 0
Table ‘Workfile’. Scan count 0, logical reads 0
Table ‘Invoices’. Scan count 9, logical reads 11994
Let us see the execution plan. You will notice that there are both the operators’ Stream and Hash in this simple query.
Now let us for a query hint of Stream Aggregate.
SELECT [CustomerID], COUNT([OrderID]) totOID FROM [Sales].[Invoices] o GROUP BY [CustomerID] OPTION (ORDER GROUP) GO
Let us see the statistics IO for the query. You will notice the page IO reads are the same as the earlier query and there is no difference at all.
Table ‘Worktable’. Scan count 0, logical reads 0
Table ‘Invoices’. Scan count 9, logical reads 11994
Let us see the execution plan. You will notice that there are both the aggregation are using stream aggregate operators.
Now let us for a query hint of Hash Aggregate.
SELECT [CustomerID], COUNT([OrderID]) totOID FROM [Sales].[Invoices] o GROUP BY [CustomerID] OPTION (HASH GROUP) GO
Let us see the statistics IO for the query. You will notice the page IO reads are the same as the earlier query and there is no difference at all.
Table ‘Worktable’. Scan count 0, logical reads 0
Table ‘Workfile’. Scan count 0, logical reads 0
Table ‘Invoices’. Scan count 9, logical reads 11994
Let us see the execution plan. You will notice that there are both the aggregation are using hash aggregate operators.
So essentially, there is no difference in the Page IO read when you look at the queries. I have tested this query under extreme workload tests and the query most of the time is giving us almost the same performance as well.
This tells me that SQL Server had made already the best possible decision when it originally built the execution plan. It had taken the input rowset and output required and build the best possible execution plan.
Summary
While in this example, when I used the query hint, the performance was not much different in both the cases, it is very much possible that if the data changes, your performance start changing when you have used query hints. However, wherever SQL Server Engine has built the execution plan, based on the statistics, it can easily change the aggregate operators and tune the plan accordingly.
Instead of forcing query hints, I would leave the decision of using the aggregate operation to the SQL Server Optimization engine.Â
Sneak Peak
During the consultation with my client, I explained to him that he should not worry about the aggregate operations but rather focus on how we can work on reducing the Statistics IO. We will talk about this in future blog post post.
Reference: Pinal Dave (https://blog.sqlauthority.com)
1 Comment. Leave new
Table Partitioning in SQL Server pe post kren