Some of the best learning comes from people's questions; the more questions are asked, the more everyone learns. In a recent session with a customer who had implemented SQL Server 2014 In-Memory OLTP in their systems, they reported poor performance even though they were using In-Memory OLTP. I was surprised to hear that and asked for more details. What I noticed was that the customer had set a very low hash bucket count in their in-memory table definition.
If you read the article – SQL SERVER – Introduction to SQL Server 2014 In-Memory OLTP – you can see how to specify a hash bucket count when defining the table. I have always recommended that customers set the bucket count to roughly double the number of unique key values, both to avoid hash collisions and to leave room for future growth. Though this is a rule-of-thumb recommendation, I have always pointed out that hash collisions (two or more values landing in the same hash bucket) are a performance deterrent as well.
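As an aside, SQL Server rounds the requested BUCKET_COUNT up to the next power of two, so doubling an expected 500,000 unique keys to 1,000,000 effectively gives you 1,048,576 buckets. Here is a quick back-of-the-envelope sketch of that arithmetic (the key count is just an illustrative assumption):

DECLARE @expected_unique_keys BIGINT = 500000;                     -- illustrative assumption
DECLARE @requested_buckets    BIGINT = @expected_unique_keys * 2;  -- rule of thumb: double it

-- SQL Server rounds the requested BUCKET_COUNT up to the next power of two
SELECT @requested_buckets AS requested_bucket_count,
       POWER(CAST(2 AS BIGINT), CEILING(LOG(@requested_buckets, 2))) AS effective_bucket_count;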
Until now, no one had challenged me on exactly how much of a performance damper collisions can be, or asked whether there was a script to demonstrate it. This time, I decided to build a demonstration for the customer to drive the concept home, and I would like to share that script with you. The steps are:
- Create a database for testing
- Create table1 with a large hash bucket count
- Insert 1M rows into table1 and track the time taken
- Create table2 with a much smaller hash bucket count, causing hash collisions
- Insert 1M rows into table2 and track the time taken
- Clean up
Let us first create the database.
USE master;
GO

-- Create a new database with a memory-optimized filegroup
CREATE DATABASE InMem_HashCollision
ON PRIMARY
(
    NAME = N'HashCollisions',
    FILENAME = N'd:\Data\HashCollisions.mdf',
    SIZE = 64MB,
    MAXSIZE = UNLIMITED,
    FILEGROWTH = 16MB
),
FILEGROUP [XTPFileGroup] CONTAINS MEMORY_OPTIMIZED_DATA
(
    NAME = N'HashCollisionInMemContainer',
    FILENAME = N'd:\Data\HashCollisionInMemContainer',
    MAXSIZE = UNLIMITED
)
LOG ON
(
    NAME = N'HashCollisions_Log',
    FILENAME = N'd:\Data\HashCollisions.ldf',
    SIZE = 16MB,
    MAXSIZE = 2048MB,
    FILEGROWTH = 8MB
)
COLLATE Latin1_General_100_BIN2;
GO
Let us now create our in-memory table and a natively compiled stored procedure that will be used to insert values into it.
USE InMem_HashCollision;
GO

-- Create a simple memory-optimized (XTP) table with a bucket count of 1048576
CREATE TABLE dbo.InMem_Data1
(
    Col1 INT NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1048576),
    Col2 INT NOT NULL,
    Col3 INT NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO

-- Create a natively compiled stored procedure to insert data
CREATE PROCEDURE dbo.InsertInMem_Data1 (@Rows INT)
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = 'us_english')
    DECLARE @counter INT = 0;
    WHILE @counter < @Rows
    BEGIN
        INSERT INTO dbo.InMem_Data1 (Col1, Col2, Col3)
        VALUES (@counter, @counter, @counter);
        SET @counter += 1;
    END
END;
GO
Let us insert 1M rows by calling the stored procedure and check how long it takes.
-- Execute the stored procedure to insert 1,000,000 rows and time it
DECLARE @starttime DATETIME = CURRENT_TIMESTAMP,
        @endtime   DATETIME;

EXECUTE dbo.InsertInMem_Data1 1000000;

SET @endtime = CURRENT_TIMESTAMP;
PRINT CONCAT('The query executed in ', DATEDIFF(millisecond, @starttime, @endtime), ' milliseconds.');
Here is the resultset from the query.
As you can see, the operation completed in about 1 second. Next, we will repeat the process with a table that has a much smaller bucket count and insert the same 1M rows.
-- Create the same table with a much smaller bucket count of 1024
CREATE TABLE dbo.InMem_Data2
(
    Col1 INT NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1024),
    Col2 INT NOT NULL,
    Col3 INT NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO

-- Create a natively compiled stored procedure to insert data
CREATE PROCEDURE dbo.InsertInMem_Data2 (@Rows INT)
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = 'us_english')
    DECLARE @counter INT = 0;
    WHILE @counter < @Rows
    BEGIN
        INSERT INTO dbo.InMem_Data2 (Col1, Col2, Col3)
        VALUES (@counter, @counter, @counter);
        SET @counter += 1;
    END
END;
GO

-- Execute the stored procedure to insert 1,000,000 rows and time it
DECLARE @starttime DATETIME = CURRENT_TIMESTAMP,
        @endtime   DATETIME;

EXECUTE dbo.InsertInMem_Data2 1000000;

SET @endtime = CURRENT_TIMESTAMP;
PRINT CONCAT('The query executed in ', DATEDIFF(millisecond, @starttime, @endtime), ' milliseconds.');
Here is the resultset from the query.
As you can see, the second run took close to 47 seconds, far longer than the first. That is the effect of hash collisions on INSERT, and a similar overhead applies when retrieving data too, because every lookup has to walk the long chain of rows sharing a bucket.
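Rather than inferring the collisions from timings alone, we can observe them directly. Here is a small sketch using sys.dm_db_xtp_hash_index_stats (run it in the InMem_HashCollision database, since the DMV is scoped to the current database); a high avg_chain_length combined with few empty buckets is the classic signature of an undersized bucket count:

-- Compare bucket usage and chain lengths for the two demo tables
SELECT OBJECT_NAME(hs.object_id) AS table_name,
       hs.total_bucket_count,
       hs.empty_bucket_count,
       hs.avg_chain_length,
       hs.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats AS hs
WHERE hs.object_id IN (OBJECT_ID('dbo.InMem_Data1'), OBJECT_ID('dbo.InMem_Data2'));

With 1M rows squeezed into 1,024 buckets, InMem_Data2 should report an average chain length of roughly 1,000,000 / 1,024 ≈ 977 rows per bucket, while InMem_Data1 stays close to 1.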
The takeaway is that settings like BUCKET_COUNT can have a serious adverse effect on performance when sized incorrectly. I would highly recommend keeping an eye on these settings before implementing a feature simply for the sake of using something new inside SQL Server.
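If you discover an undersized bucket count after the fact, keep in mind that on SQL Server 2014 the only remedy is to drop and recreate the table. From SQL Server 2016 onwards, the bucket count can be changed in place; here is a sketch of that syntax, where PK_InMem_Data2 is an assumed index name (the demo's primary key index is system-named, so look up the actual name in sys.indexes first):

-- SQL Server 2016 and later only; not available on SQL Server 2014
-- PK_InMem_Data2 is an assumed name: check sys.indexes for the real
-- system-generated primary key index name before running this
ALTER TABLE dbo.InMem_Data2
    ALTER INDEX PK_InMem_Data2
    REBUILD WITH (BUCKET_COUNT = 2097152);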
Let us now run the cleanup script:
-- Cleanup
USE master;
GO

-- DATABASEPROPERTYEX returns the internal database version (a positive
-- number) when the database exists, and NULL otherwise, so test for > 0
IF DATABASEPROPERTYEX('InMem_HashCollision', 'Version') > 0
BEGIN
    ALTER DATABASE InMem_HashCollision SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE InMem_HashCollision;
END
Reference: Pinal Dave (https://blog.sqlauthority.com)
1 Comment
It appears as if the cleanup stage requires a greater than zero comparison:
IF DATABASEPROPERTYEX('InMem_HashCollision', 'Version') > 0
In SQL Server 2017 (RTM-CU15), this test returns 869 for every database tested:
select DATABASEPROPERTYEX('DatabaseName', 'Version');
In SQL Server 2016 (SP1-CU10), this command returns 852 for every database I tested, and in SQL Server 2012 (SP4), this command returns 706 for every database I tested.
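Since 'Version' is an internal number that varies from build to build, an arguably clearer existence test (a sketch, equivalent in effect) is to check DB_ID() for NULL:

-- Alternative cleanup guard that avoids the internal version number
IF DB_ID('InMem_HashCollision') IS NOT NULL
BEGIN
    ALTER DATABASE InMem_HashCollision SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE InMem_HashCollision;
END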