Which is Faster – INSERT INTO SELECT or SELECT INTO? – Interview Question of the Week #187

Question: Which is Faster – INSERT INTO SELECT or SELECT INTO?

Answer: During the recent Comprehensive Database Performance Health Check, I ended up in a very interesting situation with my client. I realized that my customer had a large data which they wanted to move from one table to another table. However, this was taking a lot of time for them. They keep on asking me if I know any trick which they can use to reduce the insert time. As it was their critical for them we decided to work together and find out the root cause of the issue and resolve it for them.

Which is Faster - INSERT INTO SELECT or SELECT INTO? - Interview Question of the Week #187 balance

After carefully looking into it we realized that to load the large data, my customer was using the syntax of INSERT INTO SELECT. instead of SELECT INTO Though both of the syntaxes looks same. The performance difference between them is very huge. Let us see a quick demonstration for the same.

Set Up for Test

First, let us set up a table with 10 Million rows which we will use for loading into two methods.

USE [WideWorldImporters]
GO
-- SampleTable Build
CREATE TABLE [10MillionRows](
[StockItemTransactionID] [int] NOT NULL,
[StockItemID] [int] NOT NULL,
[TransactionTypeID] [int] NOT NULL,
[CustomerID] [int] NULL,
[InvoiceID] [int] NULL,
[SupplierID] [int] NULL,
[PurchaseOrderID] [int] NULL,
[TransactionOccurredWhen] [datetime2](7) NOT NULL,
[Quantity] [decimal](18, 3) NOT NULL,
[LastEditedBy] [int] NOT NULL,
[LastEditedWhen] [datetime2](7) NOT NULL)
GO
INSERT INTO [10MillionRows]
SELECT TOP 10000000 SI.*
FROM [Warehouse].[StockItemTransactions] SI
CROSS JOIN [Warehouse].[StockItemTransactions] SI1
GO

Now we have a table which has 10 million rows we will start our experiment.

Test 1: INSERT INTO SELECT

In this test, we will test the syntax of INSERT INTO SELECT with the Statistics Time on.

-- Creating Table
CREATE TABLE [InsertIntoTest](
[StockItemTransactionID] [int] NOT NULL,
[StockItemID] [int] NOT NULL,
[TransactionTypeID] [int] NOT NULL,
[CustomerID] [int] NULL,
[InvoiceID] [int] NULL,
[SupplierID] [int] NULL,
[PurchaseOrderID] [int] NULL,
[TransactionOccurredWhen] [datetime2](7) NOT NULL,
[Quantity] [decimal](18, 3) NOT NULL,
[LastEditedBy] [int] NOT NULL,
[LastEditedWhen] [datetime2](7) NOT NULL)
GO
-- Actual Test
SET STATISTICS TIME ON
INSERT INTO [InsertIntoTest]
SELECT *
FROM [10MillionRows]
GO
SET STATISTICS TIME OFF
-- Clean up
DROP TABLE [InsertIntoTest]
GO

Let us see the statistics details of above query.

SQL Server Execution Times:
CPU time = 18937 ms, elapsed time = 19046 ms.

Please note that in the above details, we have not included the time of creating table as that would be negligible compared to the entire insert statement.

Test 2: SELECT INTO

In this test, we will test the syntax of SELECT INTO with the Statistics Time on.

SET STATISTICS TIME ON
SELECT *
INTO [dbo].[SelectIntoTest]
FROM [10MillionRows]
OPTION (MAXDOP 1)
GO
SET STATISTICS TIME OFF

Though the syntax looks very simple, we should not come to conclusion by looking at the syntax, we should only decide this based on the actual time taken by the query. Let us see the statistics details of above query.

SQL Server Execution Times:
CPU time = 6625 ms, elapsed time = 6861 ms.

You can see that the query took very little time compared to Test 2. If you noticed I have used the OPTION (MAXDOP 1) in my query. This is because when I ran the first query it was doing a single CPU query and the second query was doing a parallel execution of the query. To compare apples to apples, I have added MAXDOP into my query.

However, when I ran the same query without MAXDOP 1, I got the following results.

SQL Server Execution Times:
CPU time = 14611 ms, elapsed time = 2517 ms.

You can clearly see that when a query runs in parallel, it just uses a bit more CPU but the query execution time (elapsed time is still quicker than test 1).

Conclusion

If you compare the time of this query with the previous query which we ran in Test 1, you can clearly see that absolutely hands down the winner is Test 2: SELECT INTO. I have run this test multiple times with different datasets and scenarios and I noticed that every time SELECT INTO is faster than INSERT INTO SELECT.

Reference: Pinal Dave (https://blog.SQLAuthority.com)

, ,
Previous Post
What is Lock Pages In Memory? – Interview Question of the Week #186
Next Post
What Part of the SQLDump Suggest Database Corruption? – Interview Question of the Week #188

Related Posts

15 Comments. Leave new

  • Great information. Since last week, I am gathering details about the SQL experience.
    There are some amazing details on your blog which I didn’t know. Thanks.

    Reply
  • This is really interesting as I almost never use select into. Do you have any idea WHY the different syntax does the job so much quicker?
    Nils

    Reply
    • I would hazard that INSERT INTO has overhead because you can specify which fields to use whereas SELECT INTO is optimised to loop over all fields.

      Reply
  • Sir,
    No doubt, SELECT INTO is Faster.
    But it does not copy Table or Column Level Constraints.
    Sohrab

    Reply
  • Like Nils, I always assumed that there would be an overhead creating the table and therefore it would be more efficient to use an existing table. So I dont often use SELECT INTO.

    Even with no Keys or constraints my testing shows it is faster – but why?

    Reply
  • Worked it out I think :) Creating a new table, SQL can assume that no one else is using it so places a table lock.

    The following seems (with my limited testing) to have the same performance as the INSERT INTO.

    INSERT INTO
    {DestinationTable} WITH (TABLOCKX)
    SELECT
    {Field1},{Fields2}….
    FROM
    {SourceTable}

    Can you confirm Pinal?

    Reply
  • Try INSERT into … with (tablock) SELECT …

    SET STATISTICS TIME ON
    INSERT INTO [InsertIntoTest] with (tablock)
    SELECT *
    FROM [10MillionRows]
    GO
    SET STATISTICS TIME OFF

    Reply
  • Select into uses minimal logging.
    Please see the article on msdn site.

    Reply
  • I created a SPROC that inserts into table variables and that method performs faster than select into for me. DECLARE @TEMP_TABLE TABLE performed faster than INSERT INTO #Temp_Table vs SELECT INTO #Temp_Table

    What could that possibly mean?

    Reply
  • SELECT INTO is not supported in MySQL

    Reply
  • Md. Shamshuzzoha
    August 27, 2020 3:40 pm

    Hi Pinal,

    Thanks for sharing, i got same question in my interview yesterday , i have given same answer SELECT INTO but stuck on reason for faster.

    I wanted to know reason behind faster Test2 query.

    Thanks

    Reply

Leave a Reply

Menu