Every consulting engagement is different and I enjoy interacting with different people while I am working with different experts. Earlier this week, here is what I heard during one of the Comprehensive Database Performance Health Check engagement about Parallelism for Heap Scan.
During a consultation, I have heard two of the DBAs talking about why clustered index is important. Let me write down our conversation here. I am just calling them Mark and John, which are just made up a name for readability.
Mark: Pinal, Do we need a clustered index on this table?
Pinal: Of course, we already have a primary key on the table and as per the workload analysis, I strongly believe we should create a clustered index on it.
Mark: John, didn’t we discussed yesterday that we need to have a clustered index on the table. Additionally, currently there is no clustered index on the table hence, the table is heap and when we retrieve data from Heap, SQL Server does not use the power of parallelism to retrieve data from a heap.
John: Yeah, I get that now. I never knew it before. I will for sure create on a clustered index on the table so we can take advantage of parallelism.
Well, so far everything was going fine, however, as soon as I heard about parallelism conversation, I realized that I must now jump into the conversation and clear up some doubt.
Pinal: Hold on, I need to give some clarification here. Let us for sure create a clustered index on the table as we have already identified a good reason for it based on the workload analysis. However, it has nothing to do with parallelism and heap scan. In SQL Server you can easily see that a heap table is also scanned with multiple processors. Let me show you a demo for a heap scan.
First, let us create a table and populate it with data.
CREATE TABLE TempTable (ID INT, Col1 CHAR(100), Col2 CHAR(100)) GO -- Insert into table INSERT INTO TempTable (ID, Col1, Col2) SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob','Brown' FROM sys.all_objects a CROSS JOIN sys.all_objects b GO
Now enable execution plan by pressing the shortcut CTRL + M or following this blog post. Next run the following query and check the execution plan.
-- SELECT Tests SELECT * FROM #tempTable WHERE ID = 1000 GO
Now go to an execution plan and you will notice that the query is using parallel threads to read the data from the table and indeed doing a table scan.
You can clean up the code by dropping your temporary test table by executing following drop script.
DROP TABLE #TempTable GO
As there is no clustered index on the table it is called heap and it proves that SQL Server can use parallel operations to read the data on the heap as well.
Reference: Pinal Dave (https://blog.sqlauthority.com)