Which is Faster – INSERT INTO SELECT or SELECT INTO? – Interview Question of the Week #187

Question: Which is Faster – INSERT INTO SELECT or SELECT INTO?

Answer: During the recent Comprehensive Database Performance Health Check, I ended up in a very interesting situation with my client. I realized that my customer had a large data which they wanted to move from one table to another table. However, this was taking a lot of time for them. They keep on asking me if I know any trick which they can use to reduce the insert time. As it was their critical for them we decided to work together and find out the root cause of the issue and resolve it for them.

Which is Faster - INSERT INTO SELECT or SELECT INTO? - Interview Question of the Week #187 balance

After carefully looking into it we realized that to load the large data, my customer was using the syntax of INSERT INTO SELECT. instead of SELECT INTO Though both of the syntaxes looks same. The performance difference between them is very huge. Let us see a quick demonstration for the same.

Set Up for Test

First, let us set up a table with 10 Million rows which we will use for loading into two methods.

USE [WideWorldImporters]
GO
-- SampleTable Build
CREATE TABLE [10MillionRows](
[StockItemTransactionID] [int] NOT NULL,
[StockItemID] [int] NOT NULL,
[TransactionTypeID] [int] NOT NULL,
[CustomerID] [int] NULL,
[InvoiceID] [int] NULL,
[SupplierID] [int] NULL,
[PurchaseOrderID] [int] NULL,
[TransactionOccurredWhen] [datetime2](7) NOT NULL,
[Quantity] [decimal](18, 3) NOT NULL,
[LastEditedBy] [int] NOT NULL,
[LastEditedWhen] [datetime2](7) NOT NULL)
GO
INSERT INTO [10MillionRows]
SELECT TOP 10000000 SI.*
FROM [Warehouse].[StockItemTransactions] SI
CROSS JOIN [Warehouse].[StockItemTransactions] SI1
GO

Now we have a table which has 10 million rows we will start our experiment.

Test 1: INSERT INTO SELECT

In this test, we will test the syntax of INSERT INTO SELECT with the Statistics Time on.

-- Creating Table
CREATE TABLE [InsertIntoTest](
[StockItemTransactionID] [int] NOT NULL,
[StockItemID] [int] NOT NULL,
[TransactionTypeID] [int] NOT NULL,
[CustomerID] [int] NULL,
[InvoiceID] [int] NULL,
[SupplierID] [int] NULL,
[PurchaseOrderID] [int] NULL,
[TransactionOccurredWhen] [datetime2](7) NOT NULL,
[Quantity] [decimal](18, 3) NOT NULL,
[LastEditedBy] [int] NOT NULL,
[LastEditedWhen] [datetime2](7) NOT NULL)
GO
-- Actual Test
SET STATISTICS TIME ON
INSERT INTO [InsertIntoTest]
SELECT *
FROM [10MillionRows]
GO
SET STATISTICS TIME OFF
-- Clean up
DROP TABLE [InsertIntoTest]
GO

Let us see the statistics details of above query.

SQL Server Execution Times:
CPU time = 18937 ms, elapsed time = 19046 ms.

Please note that in the above details, we have not included the time of creating table as that would be negligible compared to the entire insert statement.

Test 2: SELECT INTO

In this test, we will test the syntax of SELECT INTO with the Statistics Time on.

SET STATISTICS TIME ON
SELECT *
INTO [dbo].[SelectIntoTest]
FROM [10MillionRows]
OPTION (MAXDOP 1)
GO
SET STATISTICS TIME OFF

Though the syntax looks very simple, we should not come to conclusion by looking at the syntax, we should only decide this based on the actual time taken by the query. Let us see the statistics details of above query.

SQL Server Execution Times:
CPU time = 6625 ms, elapsed time = 6861 ms.

You can see that the query took very little time compared to Test 2. If you noticed I have used the OPTION (MAXDOP 1) in my query. This is because when I ran the first query it was doing a single CPU query and the second query was doing a parallel execution of the query. To compare apples to apples, I have added MAXDOP into my query.

However, when I ran the same query without MAXDOP 1, I got the following results.

SQL Server Execution Times:
CPU time = 14611 ms, elapsed time = 2517 ms.

You can clearly see that when a query runs in parallel, it just uses a bit more CPU but the query execution time (elapsed time is still quicker than test 1).

Conclusion

If you compare the time of this query with the previous query which we ran in Test 1, you can clearly see that absolutely hands down the winner is Test 2: SELECT INTO. I have run this test multiple times with different datasets and scenarios and I noticed that every time SELECT INTO is faster than INSERT INTO SELECT.

Reference: Pinal Dave (https://blog.SQLAuthority.com)

SQL Performance, SQL Scripts, SQL Server
Previous Post
What is Lock Pages In Memory? – Interview Question of the Week #186
Next Post
What Part of the SQLDump Suggest Database Corruption? – Interview Question of the Week #188

Related Posts

Leave a Reply