SQL SERVER – GUID vs INT – Your Opinion

April 28, 2010

I think the title is clear what I am going to write in your post.

This is age old problem and I want to compile the list stating advantages and disadvantages of using GUID and INT as a Primary Key or Clustered Index or Both (the usual case).

Let me start a list by suggesting one advantage and one disadvantage in each case.

INT

Advantage:

Numeric values (and specifically integers) are better for performance when used in joins, indexes and conditions.
Numeric values are easier to understand for application users if they are displayed.

Disadvantage:

If your table is large, it is quite possible it will run out of it and after some numeric value there will be no additional identity to use.

GUID

Advantage:

Unique across the server.

Disadvantage:

String values are not as optimal as integer values for performance when used in joins, indexes and conditions.
More storage space is required than INT.

Please note that I am looking to create list of all the generic comparisons. There can be special cases where the stated information is incorrect, feel free to comment on the same.

Please leave your opinion and advice in comment section. I will combine a final list and update this blog after a week. By listing your name in post, I will also give due credit.

Reference: Pinal Dave (https://blog.sqlauthority.com)

SQL Constraint and Keys, SQL Data Storage

SQLAuthority News – Public Training Classes In Hyderabad 12-14 May – SQL and 10-11 May SharePoint

SQL SERVER – Disable Clustered Index and Data Insert

106 Comments. Leave new

James Hancock
May 4, 2010 11:24 pm
Jeff:
From your own link: A GUID is a 128-bit integer (16 bytes) that can be used across all computers and networks wherever a unique identifier is required. Such an identifier has a very low probability of being duplicated.
notice across all computers and networks.
Very low probability means that you have a better chance of being hit by lightening 1000 feet under water than having a duplicate.
As for needing a check. You don’t need to check. You need to handle a primary key constraint error and regen a new one when you have that error fired at you, and then save it again.
Since this will happen about once in the next 1000 years this isn’t a high cost and you don’t have to round trip.
Same is true for synchronization. Which is why MS uses them in their own replication. And by making them the primary key, it makes it even easier to fail fast instead of worrying about dups.
Reply
Jeff Moden
May 5, 2010 6:54 am
James:
I also said that the probablity was remote. The problem is, it’s still a possibility and with that, the TYPE 4 GUIDs are not what they say they are. They are NOT globally unique and regardless of how remote it may be, duplicates can occur.
Yes, I agree that if everyone in your enterprise dumps their data to the same table, then a PK or UNIQUE index will catch it. However, there are a lot of folks who use GUIDs as the PK for many or all of their tables and duplication could occur with the dupes in separate tables where there’s no chance of a PK or UNIQUE index checking on it. If your systems are built under the assumption that every GUID is guaranteed to be unique across all tables and all servers for your enterprise across the world and you “hit the lottery” with one of those extremely rare dupes and you don’t know that GUIDs can be duplicated, I’m thinking that it’ll take you more than a couple of minutes trying to figure out what the heck happened.
The listed advantage of GUIDs (as they apply to SQL Server) being unique across the world is false.
Reply
Randy Pitkin
May 6, 2010 3:32 am
GUIDS are not strings and JOIN at the exact same performance level as INTS on anything better than an 1970 8088 chipset.
Jeff Moden declared that GUIDS are not unique, Well until someone has identified a collision where the GUIDS where system derived and not manually set, I would disagree. There are 6 x 10 ^ 28th Stars in the sky.
There 1 x 10 ^ 128th possbile GUIDS
Identifying a Column as a Primary Key adds the unique Constraint. I have collapsed 10 Terabytes of disc space and failed to create a duplicate GUID.
Reply
Eric Russell
May 20, 2010 6:00 am
Maybe a GUID non-clustered primary key in a distributed environment, but I’d need a convincing case made. Really only a natural key should be the PK.
As for clustering on a GUID, that is totally wrong. The disadventages are: page splits, bloated rowid, fragmented indexes, and poorly performing bookmark lookups.
The advantages would be what?
Reply
Jeff Moden
May 26, 2010 2:55 am
Randy Pitkin wrote:
>>Jeff Moden declared that GUIDS are not unique, Well until someone has identified a collision where the GUIDS where system derived and not manually set, I would disagree. There are 6 x 10 ^ 28th Stars in the sky.
There 1 x 10 ^ 128th possbile GUIDS
True enough. To make it even more awsome sounding, if a mile were just 1 GUID, then every possible GUID lined up in a row would be wider than 17 Quadrillion Milky Way Galaxies. Further, since the GUIDs on a single machine are pseudo random (sequence implied), that makes them unlikely to ever repeat until they’ve all been used.
However, the fact that they are pseudo random also means that, in theory, two machines could generate the same GUID. It’s good that you use UNIQUE indexes on your GUIDs ;-) especially if you use sequential GUIDs.
The bottom line is that Microsoft has stated that the GUIDs aren’t unique and that they could repeat and you shouldn’t assume a “guarantee” of uniqueness. I’ll see if I can find the URL where I saw that later on tonight.
Shifting gears, I agree with what Eric Russell stated above. Unless you happen to have an affinity for page splits, you should never use a GUID as a clustered index.
Reply
Amit Banerjee
May 29, 2010 1:38 am
I would say it depends on the solution that works best for your environment. I have seen both INTs and GUIDs working and not working in production environments. So it would be unfair for either camp to say which is the best of the lot.
I just posted my thoughts on this topic here:
Reply
Riju
May 29, 2010 1:26 pm
I have created a table with 2 columns (i.e col1 ,col2)
and inserted 2 rows with null values in second column (i.e Col2)
after that i fired below qry
select disinct col2 from
as a result above query returned only single row instead of returning 2 rows,how can it be possible as the two nulls will not be same.
—–Sql Scripts ——————
create table #tmd(col1 int,col2 int)
insert into #tmd(col1)values(1)
insert into #tmd(col1)values(2)
select distinct col2 from #tmd
Reply
Erichero
June 4, 2010 12:32 pm
I’m interested to know if DB’s like SQL Server have been optimised for indexing and joins on GUID’s as opposed to other alphanumeric ID’s? In other words, does GUID processing behave like any other alphanumerics, or are they a special case? And if so, is this only the case for SQL Server?
Reply
- Adam
  June 8, 2010 3:23 pm
  GUIDs aren’t stored as alphanumeric values, they’re treated internally as 128-bit integers.
  Reply
Lucero
June 8, 2010 5:37 pm
@Randy Pitkin: your numbers are wrong. Being 128 bits in size, it’s a maximum of 1*2^128, not 1*10^128. Then, since the GUID version requires some bits (6 I think) to be static, this reduces the range to 2^122, which is still a fair number of possibilities with over 5*10^36 different GUIDs.
Reply
Natasha Abbadi
June 30, 2010 2:21 pm
I would say, the best candidate for a clustered index is the INT and the worst is the GUID;
First because of sorting, it is much easier to sort and INT than a string
Second indexes are stored in a B Tree structure, which makes the size of the index not only affect the storage, but also affect the length of the B Tree and accordingly affect the performance (the shorter your tree is the faster you reach your node).
It is extremely dangerous to have a bad performance clustered index, a bad index is worst than no index , because the cost of maintaining a bad index , and the cost of its “updates” and “inserts” will certainly overlook the little or no performance gain on the “select” and because all the non clustered indexes will be build using the clustered index that means they will have bad performance as well.
However if I needed to have the GUID as the primary key, I would do that but I would create another unique key for the clustered index (the default in MSSQL is to have the clustered index as the primary key but it is not necessary) , why ? because eventually most of the search on my table will not be by ID but rather by another logical (business) Key , meaning I will depend on non clustered indexes to gain high search performance , and my non clustered indexes will be attached to the clustered index , which needs to be small, unique and incremental to gain the best performance.
Reply
- Marko Parkkola
  June 30, 2010 5:06 pm
  GUID is not a string…
  Reply
- oakcool
  June 30, 2010 8:56 pm
  @Natasha, I agree that a B tree will be slower as it get bigger, but if you 10 item in it, it does not matter if the time of the item is a GUID, or an int, it still 10 items. so the fact that a int is smaller the a GUID does not help the case when it comes to B Tree.
  Reply
Baxter Basics
December 15, 2010 5:04 pm
It’s worth thinking about the sort of algorithm that creats Guids.
A baseline of static machine specific data is assembled at machine start-up: machine name, BIOS manufacturer and version, MAC address, CPUID, processor string, mainboard manufacturer, windows version…. you name it.
At OS logon, append some more data such as user name, domain membership, account creation and last login dates.
At the moment of creating a Guid, append some extra time-specific information: The current date and time. The number of milliseconds elapsed since the last reboot. The number of keystrokes appended since the last reboot. That sort of thing.
Now take this entire data set and perform something like a 128-bit MD5 hashing algorithm upon it. Voila! you have created a Guid. The computational expense of all this is why it takes an order of magnitude longer to create a new Guid than a new integer (but we’re still microseconds on a modern PC).
The likelihood of duplicate Guid creation is far more dependent upon the hashing algorithm used. Different hashes have different probabilities of collision. MD5 is pretty good, but it can happen in theory. In the parlance, it is “computationally infeasible”.
Consequently: Guids are fine as a surrogate primary key ***provided you then remove the clustered index created by default on this key***. Being a primary key means that the DBMS will enforce uniqueness, so don’t lose sleep over the astronomically minute chance of duplicate key generation.
Of course, you should only be doing this in the situation where no suitable natural key exists, or maybe if a natural key is so wide (e.g. a big compound key) that it would occupy more than ~900 bytes and is thus unusable on the Sybase architecture. And if an int or a bigint is inappropriate (replication issues being one possibility).
Reply
Chuck Clifford
January 26, 2011 10:08 am
All relational database engines are optimized to work with indexes on columns whose value increments after insertions (such as with in integer) – this is knows as a ‘natural progression.’
Relation database engines are NOT optimized to work with indexes on columns whose value is assigned a random value upon insertion (such as with a GUID, or any other alphanumeric string) – this is known as an ‘unnatural progression’.
If you are implementing a solution that uses a relational database engine to store your data, whenever you are choosing a data type for either an index column or for a table’s primary key ALWAYS choose an integer (of the appropriate size).
Choosing a GUID or some other alphanumeric data type for an index column or for a primary key is now and has always been a rookie’s mistake.
Reply
- pdohara
  August 10, 2011 2:18 am
  a GUID is not alphanumeric. Your rookie is showing. :-)
  Reply
John Galt
January 26, 2011 7:48 pm
Chuck:
If you’d bothered to read the rest of the thread what you just wrote was already covered.
Yes, random GUIDs are bad. However sequential ones are not, and are only double the side of int64. Thus there is very little negative effect in using them. Either use newsequentialid or create your own function that does the same and you’re golden.
Reply
Swanand Pol
March 15, 2011 6:36 pm
Hi,
Recently, I was working on GUID. And GUID seems a big horror for me. That’s becuase I wasnt able to convert string value to GUID. Everytime stucked with the same error Conversion failed while converting GUID to string. So what’s the solution now?
Reply
pdohara
August 10, 2011 2:30 am
A couple of comments. First, a googling of the performance hit of sequencial GUID in a DB turns up a typical answer in the 5-10% range. In other works operations that take 1 unit of type with INT take approximately 1.05 to 1.1 unit of time with GUIDs. This is obvisouly significant, but not necessarily a show stopper. You should construct your own benchmark and find out what the effect is for your own data. The same is true for the additional storage. Yes consuming four time the storage for the idenitfier is worth noting. On a table with several (I think the number was 8) keys this is even more significant. On the other hand if the table has a few string columns (i.e. nvarchar(150)), then the storage of the index has become a fairly small percentage of the overall table size.
There is another advantage of GUIDs that has not been mentioned here. They solve the cross table refence bug. In this bug a peice of code gets an ID from table A and then erroneously uses it as a foriegn key in table B. The bug is that the foriegn key column in Table B references Table C. So let’s say that there is an invoice and an invoice detail table. There is also a customer table. It is possible to write code that incorrectly uses the customer id value in the invoice details table as a reference to the invoice that detail row should be associated with. Using sequenctial integers for ids this will almost always work. Using GUID this will almost always fail.
Reply
pdohara
August 10, 2011 2:32 am
BTW, Please comment here when you create the new Post with all these thoughts consolidated. I found this quite informative, though somewhat laborious to read.
Reply
pdohara
August 10, 2011 4:14 am
I am using this article to create my benchmarks:
You can view the English translation here:
I changed the GUID table to use NEWSEQUENCIALID rather than NEWID.
I am seeing similar difference in the insertions (note that Sprecher is a little faster than the machine they used. )
INT 12ms
GUID 23ms
Still the ratio is about 2 to 1 on many sequential inserts.
The clustering is less problematic by far:
table_name index_id index_type_desc index_depth avg_fragmentation_in_percent page_count
TestGuid 1 CLUSTERED INDEX 2 14.8148148148148 27
TestGuid 2 NONCLUSTERED INDEX 2 33.3333333333333 9
TestInt 1 CLUSTERED INDEX 2 9.52380952380952 21
TestInt 2 NONCLUSTERED INDEX 2 33.3333333333333 3
The 300% increase is troubling, but not nearly so troubling as the 1000% increase of the original article.
Similarly the space usage is not as dire:
index_name type_desc space_used_in_kb
PK__TestGuid__3213E83F59063A47 CLUSTERED 216.0
PK__TestInt__3213E83F5DCAEF64 CLUSTERED 168.0
uix__TestInt__fk NONCLUSTERED 24.0
uix__TestGuid__fk NONCLUSTERED 72.0
While he was getting 2 to 1 cluster and 4 to 1 not clustered, my numbers are more like plus 30% and 3 to 1. So space is an issue, but performance is likely not.
In addition I created a set of single row selects from both tables. The set included every row value (I created 2048 rows in both tables). The results are:
INT 29ms
GUID 52ms
As we see the performance is not quite 2 to one, but remember that this represents 2048 individual selects. The performance difference per select would be difficult to measure.
The result is that though there is a difference it is not particularly dire for my use case. Perhaps for yours it is. As with most things it depends on what you plan to do with it.
Pat O
Reply
Farhang Amary
October 11, 2011 7:56 pm
convert GUID to int:
Sure you can, but it’s better with long to get the most value because a guid uses 16 bytes:
byte[] gb = Guid.NewGuid().ToByteArray();
int i = BitConverter.ToInt32(gb,0);
long l = BitConverter.ToInt64(gb,0);
.
Reply
ZackyJ
February 9, 2012 4:54 pm
My Thoughts – GUID has its place. I prefer it used for Authentication of User only. Outside of this I like to have the DB provide INT / BIGINT sequential ID’s for each new entry. I work with web apps that theoretically will have lots of users and lots of look ups over time. Indexing on the transactions is very important. As for the users, they need to be authenticated to do anything and then again authorized at each level. GUID gives flexibility here for he user, but I fear using it outside of the “User Authentication” table. The nice thing about Integer based types in the database is that as they are created the numbers can increase by one. This means that if I have one million rows of transactions I do not need to return them all. Chances are the lower numbers are old and outdated. This also helps with faster indexing and Joins.
Authentication – Consider GUID
Other Tables – Lean toward Integer (TINYINT / SMALLINT / INT / BIGINT as needed).
Reply
nobody
May 22, 2012 3:20 am
Is re-indexing a problem anymore? It would seem to me that INT would have a disadvantage should an event ever occur that caused re-indexing to be necessary.
Reply