I think the title is clear what I am going to write in your post.
This is age old problem and I want to compile the list stating advantages and disadvantages of using GUID and INT as a Primary Key or Clustered Index or Both (the usual case).
Let me start a list by suggesting one advantage and one disadvantage in each case.
INT
Advantage:
- Numeric values (and specifically integers) are better for performance when used in joins, indexes and conditions.
- Numeric values are easier to understand for application users if they are displayed.
Disadvantage:
- If your table is large, it is quite possible it will run out of it and after some numeric value there will be no additional identity to use.
GUID
Advantage:
- Unique across the server.
Disadvantage:
- String values are not as optimal as integer values for performance when used in joins, indexes and conditions.
- More storage space is required than INT.
Please note that I am looking to create list of all the generic comparisons. There can be special cases where the stated information is incorrect, feel free to comment on the same.
Please leave your opinion and advice in comment section. I will combine a final list and update this blog after a week. By listing your name in post, I will also give due credit.
Reference: Pinal Dave (https://blog.sqlauthority.com)
106 Comments. Leave new
I like GUIDs, the fact that they can be shared across diferent instances makes it easy to move things back and forth with simple scripts.
Also I like that I can put just “details.asp?id=(guid)” without the fear of the user just iterating or trying different ids to get records they should not access… not a big security, but enought for most simple apps, more if WS are used from the JS, I feel a little safer with just no effort
I have a problem with this sentence about the GUID: “String values are not as optimal as integer values for performance when used in joins, indexes and conditions.”
My problem is: GUID’s are not stored or prosessed as string. The GUID or UUID is usually just a 128-bit Integer that is displayed (for human readability) as a Hexadecimal string. It’s identical to how IPv4 adresses are actually just 32 bit that are shown a decimal values with points (so please don’t store them as Varchar(15), use 1 INT or 4 tinyint’s) or IPv6 uses a 128-bit Integer. Or a single char/nchar 8 and 16 bit integers repsectively.
The only real difference between GUID and a normal 128-bit Integer, is how the next number is generated – randomly, rather than sequentially.
Also, regarding the Integers disadvantage:
If a 32 bit singed Integer is to small, you can easily use a 64-bit integer instead. If you have a 64-bit system, there won’t be any performance issues. It will still be faster than guid/uuid, as they only need 8 bit in on this disk (and thus when beign read or written).but it’s much harder to run out of numbers than with 32-bit (9,223,372,036,854,775,807 – when it is singed).
GUID
Disadvantages:
1) Clustered primary key with a sort will drag your table and database to its knees as more rows get entered. Since this is the default PK added by Enterprise Manager, this is dangerous.
2) Not easily translated to enums in programming languages. For types, and type tables.
Advantage: Easier integration with replication.
Personally, I hate the use of Guids, performance is a big one but by far it is storage. One of our applications has a table that has 14,000 rows but 8 columns are guids because of joins, so you can imagine it eats up a lot of space because of the relationships.
Guid’s take up a lot of space so I would avoid them where ever we can. If Int is too small, use bigint, still less space that Guid.
Harry Yeh
CEO / CTO
[2 links removed by admin to pass the comment test]
How much space does that table and its indexes take up? The table would take less than 2MB and each index less than 100KB.
Ooops, I meant to type less than 300 KB, not 100 KB.
Just to add one more point to the ‘advantages’ of GUID –
When using a GUID column a new id can be generated at the client side.
With INT, a client application typically will sand a ‘save’ request to the server, the server will save the record and send back the identity key generated by the insert. So the client application will know the new ID only after the record is saved.
With GUID, the client application can generate a new value and can send it to the server. It does not need to wait till the SAVE function returns to know what is the ID.
@Jacob I love that flexibility but the DBA hate it, because it breaks every single indexing, paging thing on the DB. The solution is to use Sequencial Guids, but then you loose the flexibility you describe.
Or just do two ids – a regular PK (int or guid) and a client guid.
My primary involvement with database programming is desktop application development, and in that regard, I vastly prefer int. It’s much easier to interact with programmatically. I typically write code on the frontend to determine IDs on INSERT, or use a stored procedure which determines and selects a usable identifier, so that’s not an issue for me personally.
@George, do you have a concurrent enviroment? If you gen the Id(INT) on the client you have to garantee that no one else will get to the database first, if someone does, you loose the number.
I’ve run out of int values several times. The next step up is a bigint, which at 8 bytes is still half the size of a guid.
I would almost never use a guid as the clustered index key for a few reasons. First, 16 bytes is a big key. All nonclustered indexes will be based on the clustered index key which translates to more store and less efficiency. The second reason is fragmentation. Because guids are random, they will get inserted throughout the table increasing page splits and fragmentation.
As Jacob said, guids are truly useful when you need to use them across systems, or generate them outside of the database. Even then, I was use the guid as a nonclustered key, and still cluster on an identity, or a datetime value.
@Rob, you can solve fragmentation by using sequencial GUIDs.
I think GUId as Primary Key will have performance issues. I feel it is not advisable to have GUId to be used as primary key because it will have overhead of both maintenance and performance while retrieving data.
In order to address the issue of running out of values when using INT or BIGINT data types, we can think of various options based on the nature of data that we are storing in the table. I am sure even if we have a composite Primary key, it will have less overhead than GUId.
Because guid’s are not consecutive, they can appear anywhere in the results. They are much harder to debug with, because a guid is much harder to enter into a query than an int.
I like the idea of starting with an int and then moving on to a bigint myself (if it runs out of numbers, that is, which rarely happens). There’s no real benefit of generating a guid on the client, as you need to insert the data at some point and you can return the ID at that point.
A guid primary key also gets added to every index that you create on that table, so every index becomes unnecessarily larger.
If you really need a guid in a row, make it a secondary key.
Actually, I hate to use the guid to be the primary key either. but as the scenario as you mentioned that the client generate the guide which can be used immediately in you client side. you don’t need to wait for the response from server what the ID is.
Is GUID really handled as string in joins and internally by the DB? That doesn’t make sense, at least to me.
GUID is 128 bit wide value. It is usually presented as string like “{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}” which looks like it’s a biiiiiig value but you could just as well store it with four INTs or with two BIGINTs if you like. You could even stored it as BINARY(16).
I am really amazed if Sql Server uses and stores them as strings.
But still, if Sql Server stores them as integer there comes some performance hit because the value is larger than what the CPU understands so the value can’t be compared on a single pass. Probably, I don’t know how memory comparison works on that level.
But anyway. I have two situations where GUIDs are used in a clever, but maybe not in so efficient way.
In one software GUID was used as user ID and it was split into pieces where first 4 bytes told in what country person worked, I don’t remember what the next 4 was, but last 8 bytes was a running number and each country/the-thing-I-don’t-recall domain.
Another software had on offline mode where user could create and modify records on the field without connection to main computer and sync changed back to central DB when in office. Naturally when you create new record you need a new ID (PK) for it. So they came up with an idea that every time the client was connected to the server it got a range of new keys which it could use offline. Keys were GUIDs where first 8 bytes told the user’s ID and last 8 bytes were running number reserved for the user alone. This way they never had to deal with duplicate keys.
@Marko Awesome man, I am actually gonna check about how it is stored in the DB, because it makes sense as you said. Take a look a the NonSQL with MongoDB, they use something like you are saying to distinguish the origin of information, if I am not mistaken
Check this out
https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2008-r2/ms190215(v=sql.105)
Its 16-byte binary, if I understood it correctly
Thanks for the link!
So it’s 16 byte binary value (16 consecutive bytes of memory). In machine language everything is memory so there’s no difference if it’s BINARY(16) or two BIGINTs or even four INTs.
C-style pointer magic follows to demostrate that comparing 16 bytes of memory is fast. The following takes 3 to 5 CPU cycles if I calculated correctly.
// Here are the GUIDs
byte *guid1 = byte[16];
byte *guid2 = byte[16];
// The actual comparison operation
((*(int64*)guid1 == *(int64*)guid2) && (*(int64*)(guid1 + 8) == *(int64*)(guid2 + 8)));
I love pointers and pointer arithmetic even though I’m happy I don’t have to use them in C# anymore :)
Hi Pinal,
GUID will create more fragmentation and it also occupies more disk space than Int.
However it is useful in case it can be used domain wide, and newsequentialid() is used to generate guid.
So, personally I would not suggest using GUID as PK unless it
GUIDs performance is an issue when your db starts getting big.
INT has a few flavours to pick from so you can more selectively choose your data type based on it’s application. If I have a table which will hold reference data, let’s say AccountStatus – you might discover during the requirements gathering that there are only 5 or 6 statuses available, TINYINT would be sufficient for this ID field.
If you have a system liable to hold huge volumes of data you would probably assign BIGINT in the design phase, not INT.
Given the performance advantages in joining etc I would go with a variation on INT dependant on exactly what the requirements are for the table in question.
Many have already mentioned GUID will have space and performance issues when joined on the Key.
Another interesting point is that the clustered index on GUID will also affect all the Non Clustered Indexes on the table by unnecessarily making the length of the Key more than normal.
GUIDs are not strings, or not necesarly
Not all TABLEs need an Id. For the ones that do, decide if it needs uniqueness on the server or not. In most cases it isn;t needed.
Don’t let possible performance benefits change design before you know it actually is an issue.
Are you excluding BIGINT from this discussion? The range of an integer identity column can be doubled by seeding it at the maximum negative range. This gives BIGINT a range of 18,446,744,073,709,500,000 rows, large enough for most applications. A BIGINT takes 8 bytes of storage while a GUID takes 16 bytes. This provides you a 50% increase in memory devoted to indexes which in turn reduces IO necessary to support index lookups.
Negative values negate your second advantage because large negative numbers are quite impossible for us humans to deal with.
A corresponting range for INT is 4,294,967,294 rows stored in 4 bytes giving 75% more index memory space.
And for SMALLIT the range is 65,534 rows stored in 2 bytes and a whole lot more index values in memory.
We have never used GUID in our environment, used Int if needed. There were some table which had GUID’s it caused performance issues for us.
GUID
Advantages
– Consolidation – You have Customer Table in 5 different Databases and you want to make a Datawarehouse – no problem – the records can keep their keys. With INTs, you are going to need new keys which is going to get messy with your FKs
– Easier to Generate Multiples in Multi-User environment – If you need to insert multiples items at once allong with Child Records of those items, very easy. With INTs you may need to reserve the keys or need to lock do prevent 2 users attempting to use the same PK
Disadvantages
– Along with being harder for Users to understand if they are shown – Harder for DBAs to work with even when they are not shown to Users
– No gurantee of uniqueness (this one may be contraversial) – Although the chances of duplication are very very low, I wouldn’t feel comfortable if a Hospital or Bank was using them, would you???
Guids are guaranteed unique because they are generated from server specific characteristics such as MAC address, server time, disk serial number, randomization etc
As the MAC address is unique, no 2 servers will create the same GUID and as no 2 GUIDS can be created in the same clock tick, No 2 guids can be created on the same server. Even if you set the internal clock back, the added randomization, coupled with the clock tick, a disk id etc, the chances of creating the exact same id in the same clock tick is effectively impossible.
As for their format, they are nothing more than a collection of 8 & 16 byte ints, as if it were a 16 byte BigInt with groups of its bits populated from different sources.
INT:
Advantages:
– You can remember the numbers, are faster to type, faster to understad, faster for joins, clean code.
– Them give an idea of order of insertion that is important to get a fast idea if a record is new or old
– For a customer you can use the INT key as the customer code, with GUI mmmm not a good idea to tell a custer you are the customer {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx} :P
– Like others said you can use BIGINT that is big enough for all purpouse I can’t imagine an scenario that it dont work.
We hate long numbers like GUID and we never used them.
Che
* Rememeber
1.879.658.256
Ok can you remember this
*Order
1.879.658.256
1.879.558.257
Can you quickly access the order of this numbers? By the way should you use a time stamp/Date Time for that purpose
*Exposing to customers
you are customer 1.879.658.256, is that good for him? how about you customer Marcos(marcos@hotmail.com) isn’t that better and personal.
So does big int solve all your problems, guess not.
INT primary keys (especially identity seeds) being human readable are also a disadvantage as queries that should be date based, often use the identity seed based on the assumption that a lower ID was created earlier, which may not be true. Also, presenting a primary key as a reference code to a user, may have negative implications when master table references need to be reordered, which would require changing the primary key of the master and updating all tables where the key was a foreign key. It’s much better to have a key where a developer can not make this mistake and is required to add a reference column and a unique index to the table, that would only require the master table be updated, should such a scenario occur.
Joins are no easier using ints
As for insertion order, you can create sequential GUIDS
See above about why using the primary key as a Customer code is a terrible idea.
Here’s one real question you should ask when deciding on GUIDS: Will I want to transfer the data in this record, back and forth to other systems? If so, GUIDS are great because you can insert the row into the other system without changing the key or worrying about sending a BigInt to a system with INT as the PK. You don’t even need to see if the other system has a duplicate key when inserting because unlike identity seeds, GUIDs are unique across all tables and systems.
The time I would not consider a GUID is when I have local lookup tables that are never intended to be exported.
I believe you should use both:
Use a BIGINT as the primary key – clustered etc, for join’s, fast retrievals etc. i.e. this is optimised for retrieval and user convenience.
The have a GUID as your unique marker for the record, this is what you use when porting data across servers/applications etc. This would never be shown to users, but is stamped to all records.
This gives you the best of both worlds, you avoid the fragmentation and performance hit of using GUIDs as a primary key, but retain its benefit when your moving data / sharing it between systems since you can independently identify the record.
Some may argue that the extra storage cost is excessive, but I’d say spending a few dollars more on your SAN is cheaper in the long run than trying to do two different things with the same data column.
I’ve had several apps migrate/merge from local/regional usage to our worldwide enterprise level, GUIDs saved us.
Sure, if you have a few tables that are migrating, INT or BIGINT aren’t much to handle in the migration routines you have to write. But start merging huge customer and law enforcement databases – it’s best not to have to manage that and just use GUIDs from the outset.
Yes, thats the real reason you should use GUIDs, you may be able to handle with begint, by portioning the numbers, so branch 1 takes number from 1 to 2 billion, branch 2 goes from 2 billion and 1 to 4 billion, and so on. But then branch 1 run out of numbers, then what you do? Take the next available series, and now branch 1 has 1 to 2 billion and 6 to 8 billion, that works, but can get quite confusing and hard to manage.
If you that not real case, just remember the very way you got to this post… An IP address, have you ever though about it. 000.000.000.000 to 255.255.255.255 that a very large amount of number to pick from, but we are running out of them, and guess what IPv6 is a GUID, why?, one would wonder. Why?