Recently I received a very interesting question from a client who had just hired me for a Comprehensive Database Performance Health Check. The question was simple but interesting, and I decided to answer it with proof and a working demonstration. Here is the question: What is more optimal for a data load, a temp table or a regular table? Why?
I am very sure that you have encountered a similar question in the past. Lots of people who load data in ETL processes wonder whether they should use temporary tables or regular tables. The confusion arises because the temporary table has to be created every single time and destroyed at the end, whereas the regular table always stays there and does not have to be re-created or dropped (it just needs to be truncated).
Here is my answer to this confusion: if you have to load the data and do some transformation, I suggest you use a temp table and not a regular table, purely for performance reasons.
Before I go further in this blog post, let us run two different tests and measure the execution time with SET STATISTICS TIME.
Test 1: Data Load in Regular Table
Let us run the following script and measure the time of total execution.
SET STATISTICS TIME ON
GO
USE SQLAuthority
GO
-- Create Table
CREATE TABLE FirstIndex (ID INT,
FirstName VARCHAR(100),
LastName VARCHAR(100),
City VARCHAR(100))
GO
-- Insert One Million Records
-- INSERT 1
INSERT INTO FirstIndex (ID,FirstName,LastName,City)
SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY a.name) RowID,
'Bob',
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith'
ELSE 'Brown' END,
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles'
ELSE 'Houston' END
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
GO
DROP TABLE FirstIndex
GO
Here is what we can see in the message window.
SQL Server Execution Times:
CPU time = 1454 ms, elapsed time = 1585 ms.
Test 2: Data Load in Temp Table
Let us run the following script and measure the time of total execution.
SET STATISTICS TIME ON
GO
USE SQLAuthority
GO
-- Create Table
CREATE TABLE #FirstIndex (ID INT,
FirstName VARCHAR(100),
LastName VARCHAR(100),
City VARCHAR(100))
GO
-- INSERT 1
INSERT INTO #FirstIndex (ID,FirstName,LastName,City)
SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY a.name) RowID,
'Bob',
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith'
ELSE 'Brown' END,
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles'
ELSE 'Houston' END
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
GO
DROP TABLE #FirstIndex
GO
Here is what we can see in the message window.
SQL Server Execution Times:
CPU time = 532 ms, elapsed time = 530 ms.
Comparison
It is very clear from the comparison that loading the regular table took around 1.5 seconds, whereas the temporary table took only around 0.5 seconds. If you do not need your data to be persistent and only need it for a temporary transformation, I strongly suggest that you use temp tables over regular tables.
The reason temp tables are faster at loading data is that they are created in tempdb, where logging works very differently. Data modifications are not logged in the transaction log the way they are for a regular table: tempdb needs only enough log information to roll a transaction back, never to recover after a restart, hence operations with temp tables are faster. In the near future, I will write a detailed blog post explaining this behavior.
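If you want to observe this difference yourself, here is a minimal sketch (the table name LogDemo is just for illustration) that reports how many bytes the current insert transaction has written to the transaction log, using the sys.dm_tran_database_transactions DMV:

SET NOCOUNT ON
USE SQLAuthority
GO
-- Regular table: measure log bytes used by the insert transaction
CREATE TABLE LogDemo (ID INT, Payload VARCHAR(100))
GO
BEGIN TRANSACTION
INSERT INTO LogDemo (ID, Payload)
SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name), 'x'
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
-- Report log bytes written by this session's open transaction
SELECT dt.database_id, dt.database_transaction_log_bytes_used
FROM sys.dm_tran_session_transactions st
INNER JOIN sys.dm_tran_database_transactions dt
ON st.transaction_id = dt.transaction_id
WHERE st.session_id = @@SPID
COMMIT
GO
DROP TABLE LogDemo
GO

Run the same script again with #LogDemo instead of LogDemo: the row for tempdb (database_id = 2) reports noticeably fewer log bytes, because tempdb is rebuilt on every restart and therefore never needs redo information in its log.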
Reference: Pinal Dave (https://blog.sqlauthority.com)
7 Comments
My lesson: don’t tell this to inexperienced SQL users, otherwise your tempdb could be easily filled up or blocked. :)
It doesn’t depend on experience, it depends on the person. I have seen experienced developers write dirty code and inexperienced ones write clean code.
What about using a table variable? In this case, it takes less time compared to the # table.
SET STATISTICS TIME ON
GO
-- Create Table
DECLARE @FirstIndex TABLE (ID INT,
FirstName VARCHAR(100),
LastName VARCHAR(100),
City VARCHAR(100))
-- INSERT 1
INSERT INTO @FirstIndex (ID,FirstName,LastName,City)
SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY a.name) RowID,
'Bob',
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith'
ELSE 'Brown' END,
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles'
ELSE 'Houston' END
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
At the same time, I have noticed that # tables are much faster compared to table variables.
How about comparing to an inline view? Either table-based method seems crude coming from an Oracle Exadata platform.
Hi Pinal,
This is true considering the query you wrote. However, if you use the WITH (TABLOCK) hint, you will get a considerable performance boost on the insert into the regular table. How much of a boost depends on your recovery model. For example, with the Full recovery model you can get the regular table load down to about 1 second, while with the Simple recovery model you can get it to the same speed as the tempdb script, because minimal logging is then applied to the regular table insert.
Kind regards,
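For reference, here is a minimal sketch of the WITH (TABLOCK) variation described in the comment above, applied to the Test 1 insert (minimal logging additionally requires the target to be a heap and, as the commenter notes, depends on the recovery model):

-- Same insert as Test 1, with a table lock hint so it can be minimally logged
INSERT INTO FirstIndex WITH (TABLOCK) (ID,FirstName,LastName,City)
SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY a.name) RowID,
'Bob',
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith'
ELSE 'Brown' END,
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles'
ELSE 'Houston' END
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
GO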
ONE INTERVIEW QUESITION
TABLE B (INPUT)
ID NAME
1  'A'
2  'B'
1  'C'
2  'D'

TABLE B (OUTPUT)
ID NAME
1  'AC'
2  'BD'
THE QUESTION IS: HOW CAN WE WRITE THIS WITHOUT A CASE STATEMENT?
SELECT id, STRING_AGG(name, '') FROM TableB GROUP BY id
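For completeness, here is a runnable version of that answer (STRING_AGG requires SQL Server 2017 or later; the WITHIN GROUP clause pins the concatenation order so that ID 1 returns 'AC' rather than 'CA'):

-- Build the sample table from the question
CREATE TABLE TableB (ID INT, Name VARCHAR(10))
INSERT INTO TableB (ID, Name) VALUES (1, 'A'), (2, 'B'), (1, 'C'), (2, 'D')
GO
-- Concatenate names per ID without a CASE statement
SELECT ID, STRING_AGG(Name, '') WITHIN GROUP (ORDER BY Name) AS Name
FROM TableB
GROUP BY ID
-- Returns (1, 'AC') and (2, 'BD')
GO
DROP TABLE TableB
GO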