SQL SERVER – GUID vs INT – Your Opinion

I think the title is clear what I am going to write in your post.

This is age old problem and I want to compile the list stating advantages and disadvantages of using GUID and INT as a Primary Key or Clustered Index or Both (the usual case).

Let me start a list by suggesting one advantage and one disadvantage in each case.

INT

Advantage:

  1. Numeric values (and specifically integers) are better for performance when used in joins, indexes and conditions.
  2. Numeric values are easier to understand for application users if they are displayed.

Disadvantage:

  1. If your table is large, it is quite possible it will run out of it and after some numeric value there will be no additional identity to use.

GUID

Advantage:

  1. Unique across the server.

Disadvantage:

  1. String values are not as optimal as integer values for performance when used in joins, indexes and conditions.
  2. More storage space is required than INT.

Please note that I am looking to create list of all the generic comparisons. There can be special cases where the stated information is incorrect, feel free to comment on the same.

Please leave your opinion and advice in comment section. I will combine a final list and update this blog after a week. By listing your name in post, I will also give due credit.

Reference: Pinal Dave (https://blog.sqlauthority.com)

SQL Constraint and Keys, SQL Data Storage
Previous Post
SQLAuthority News – Public Training Classes In Hyderabad 12-14 May – SQL and 10-11 May SharePoint
Next Post
SQL SERVER – Disable Clustered Index and Data Insert

Related Posts

106 Comments. Leave new

  • GUIDs have a sordid past, and I think every DBA approaches GUIDs with trepidation. I have only seen one situation where a GUID was appropriate – a large data warehouse where data had to be stored on multiple tables and where ‘absolute’ uniqueness was a requirement. Almost every time a developer or data modeler approaches me with a table using GUIDs a few minutes of conversation results in use of INT or BIGINT.

    The uniquness of GUID’s seems to be a popular topic. It seems very difficult (and nearly statistically impossible) to duplicate a GUID unless you are trying very hard to do so. Since there are no large scale studies proving that GUIDs are bullet-proof, I agree with Denis the Theif would never rely on them in a Bank-like setting with ridiculously high transaction rates.

    One serious disadvantages of GUIDs comes from privacy issues from V1 GUIDs (where the MAC address was used as a base in creating the GUID). Another is that current-generation GUIDs can be somewhat predicted which makes them unsuitable for security uses and the like.

    Reply
  • As far as I remeber CPUs can utilize paired registers, so now we have 64-bit CPUs which can handle 128bit values easily, without performance hits. RAM is cheap as well as disk space.
    We are going to interconnect many databases and analyze records from different sources, replication and dispersed databases are more and more popular.
    So, why should we reflect if the PK contains 64 or 128 bits? How many extra columns your tables have, which occupy more unused space than those extra 64 bits?

    From my point of view, the only important factor is the data processing speed. This is something we must consider as our databases grow, grow and grow and will never stop to do so.

    Have anyone compared GUID:INT indexes & joins efficiency?
    Does DB engines checks for duplicates in GUID PK rows?

    Reply
    • Comparing values that are larger than a CPU’s register is always tricky and has to be performed inside full fences as it is a non-atomic operation.
      Assume a 64bit variable on a 32bit system. Comparing it is a 2 step process that can be suspended in-between during regular thread preemption. So thread A is comparing 64bit variable to some other value and is suspended in the middle ater the first half (as only 32 bits fit into CPU’s register at once) while some other thread that becomes alive after the first one gets asleep updates the same variable’s value resulting in inconsistent comparison. This must not happen so a full fence is deployed which introduces performance hit somewhere around 10 – 40ns on 2010-era desktop. The same happens with GUIDs unless 128bit computers and servers become standard.

      Reply
  • You don’t have to use one or the other exclusively as some people seem to think.

    It is ok to use your PK as an INT and then put a GUID for accountGuid to use in data warehouses. The important thing is to optimize for the use you have in your local database. If you need to get that data across a warehouse then you use the accountGuid field. And you don’t fragment your PK index by using a GUID.

    Of course there is also the sequential GUID that SQL Server uses for Sync Framework if you want to ensure they are always in order, but that just seems like an even worse idea to me.

    Reply
  • James Hancock
    April 29, 2010 7:37 am

    GUIDs are invaluable in several instances:

    1. Replication. You have a choice. You can either just use it as your PK or you can have PK AND a GUID and waste even more space. Might as well just get it over with.
    2. Multi-linking. I.e. ItemID which is referencing multiple tables such as in accounting. Yes, you can do it with ints, but good luck with that one. I’ve seen the results of ints in these cases, and man what a disaster. And Replication is even worse.
    3. Datawarehousing as stated before. Really, is anyone not concerned about the day that they need sharding?
    4. Sync from multiple sources: Imagine if Outlook stored a GUID that we could use and then query for all changes since the last record? Or IMAP? Or ActiveSync? Or any other source where you need to get last changes on records by ID. GUID is the only way to go, otherwise you end up with the mess that is Outlook’s communication with Exchange server.

    The downsides are four:

    1. Take up more room. But as Harry said, if you’re worried about space, then bigint is your way to go. That’s 64 bits, instead of 128 for a GUID. Double the space for an guid, who cares when drives are so large? None of our clients are even over the 20 gig mark and that’s with email attachments and documents as filestreams included in a backup. IT’S NOT A BIG DEAL. (i.e. I just did a filestream conversion for a client with SQL Server 2008 R2 and he had over 50 million emails in his database for his company. No one ever deletes any email. They have it set in security not to allow it. That’s 10 years of emails in our system, and it still flies for every one of them on the network, with replication and automatic backup happening. The indexes total < 1 gig total in their database, compared to 19 gigs of real data. I'll take that any day.
    2. Join speed: I don't see this one at all. We tried it just to see, and we're talking ms on hundreds of thousands of records… at least with SQL Server 2008.
    3. Clustered Keys killing speed: This has a solution. Use a sequential ID. SQL Server 2008 supports it natively if badly, and you can write your own routine that does even better.
    4. Possible collision: This is made worse when using a sequential guid because you're taking away part of the randomness. But we're still talking, when using a robust routine based on date/time of having to insert more than 100,000 million rows in a second to have a statistically plausible chance of a collision. And really, if your database engine doesn't have the ability to handle these errors, then you're doing something wrong. We wrote a wrapper for all of our calls to handle for this possibility. It's never been called, because it would tell us about it, but it's there, and no database worth its salt will allow you to enter duplicate PKs so you don't have to worry about corruption, just error which is easy to handle.

    Being hard to read… come on! If you're that bad with SQL that this is an issue, you need to be in another line of work.

    twk: No it won't let you insert dupes if you have your PK set correct. And as I said, ms on hundreds of thousands of rows, and if you're querying that much at once, you're probably doing it wrong anyhow.

    Wil: GUIDs in SQL Server do not use the network mac only. You cannot predict them. MS paid a lot of money to test if there was any possibility of it, and no one has ever been able to create a prediction algorithm.

    There really is no excuse not to use GUIDs and in today's world of data interop, you'd be a really bad citizen not to expose a GUID for data sharing. So the question is do you want to use 192 bytes or 128? (Bigint + Guid or just Guid). I prefer 128, and I'll put the speed of our database design against anyone's.

    Reply
  • Marko Parkkola
    April 29, 2010 10:38 am

    Someone mentioned fragmentation. This brings a question: Is it possible to GUID to get fragmented in multiple pages?

    If it is then yes, there could be problem with fragmentation if the pages are spread all over the hard drive. Otherwise I don’t see it as a problem.

    MSDN “The SQL Server database page size is 8 KB.”

    Event with header fields, 16 bytes of GUID fits quite nicely there. And it’s not a big deal for hard drive and I/O to read 16 consecutive bytes from disk. It might even read those bytes to the disk buffer anyway.

    twk: Thank you for the info about CPU. It’s been a while since I’ve dealth with such low level stuff that I don’t recall it anymore.

    But to be pedantic, which I very much like to be … ;)

    Even if CPU can compare two registers at once, you have to load two values to registers instead of (with INT or BIGINT) just one. But it’s just one CPU cycle more and as James Hancock said, it’s not a big deal.

    Reply
  • Muhammad Abbas
    April 29, 2010 11:14 am

    We have used both together. Means table have int id as primary key but unique is for GUID. Now the question must be why both at the same time. So ans is that we use the GUID for integration between three systems. Means we publish GUID trough webservice while communicating with other system. As this is unique so we can use.

    I hope i have clear.

    Reply
  • James Hancock
    April 29, 2010 5:58 pm

    Fragmentation: Yes, unless you use sequentialid or create your own incremental implimetnation of GUIDs you will get fragmentation in your indexes badly because it’s random. Also clustering won’t work because its’ random so it can’t do an incremental lookup.

    But this is easy to fix, just not obvious because MS doesn’t really provide guidance on this.

    Reply
  • So far the biggest concerns i see for Guid are storage and performance. Does any one actually have the stats that can prove that Guid is performance intensive?

    I am an app developer and i find the Guid invaluable for my apps which keep changing continuously(including the DB) and keep getting updated/altered.

    Reply
  • James Hancock
    April 29, 2010 7:53 pm

    jitesh:

    1. Double the data of an int64 for the field. (and index)
    2. Performance as noted is within ms on even 100,000 records of an int64 from our testing.

    Reply
  • Guids:

    Pro:
    * allows asynchronous architectures more easily

    Reply
  • Guid has one great advantage. I can create a unique Id without making a call to the database. This comes in handy when I am inserting one parent with several children.

    Reply
  • James Hancock
    April 30, 2010 7:11 am

    “Guid has one great advantage. I can create a unique Id without making a call to the database. This comes in handy when I am inserting one parent with several children.”

    …And as long as you use the same function in your language to generate the sequential GUID, you are more than making up for any speed difference in the process…

    Reply
  • int/guid choice depend from data in database. For small/medium very distributed database GUID will be best choice. For large tables int/bigint is preferred.

    Unfortunately, big clustered PK included in all other indexes and 16-bytes record pointer will included in all our indexes on table if we will use GUID type as PK.

    Now, I use bigint type in distributed database with large ranges and don’t have any troubles with PK generation.

    Reply
  • Vic Berggren
    May 2, 2010 3:25 am

    I’ve got a product in my datacenter that makes use of guids… Sometimes multiple per row and they’re indexed which leaves us with unbearable performance issues it seems when loading new data.

    Reply
  • Wow! What a horrible misconception a lot of people have about GUID’s and, from what I’ve seen on this thread so far, only the poster known as “Dennis the Thief” picked up on it.

    For all of those relying on GUID’s to be UNIQUE… especially you poor buggers with extremely massive data warehouses… surprise, surprise, surprise…

    The GUID’s in SQL Server are “Type 4” GUIDs and you can verify this by looking at the 15th character of any SQL Server GUID you can put your hands on. You should certainly GOOGLE for what being a “Type 4” GUID means but let me save you a little trouble and repeat what Dennis stated in his post….

    Although it is highly unlikely that any given GUID will ever be repeated on a single server, GUIDS ARE **NOT**, I repeat, **NOT** GUARANTEED TO BE UNIQUE!!!

    GUID’s are nothing more than random numbers with a huge domain.

    If something in one of your tables has to be guaranteed to be absolutely unique, THEN DON’T USE GUIDs BECAUSE THEY ARE NOT UNIQUE.

    –Jeff Moden

    Reply
  • @Jeff of course you would use a unique constraint on the column. That is the only way guarantee uniqueness in SQL Server.

    Reply
  • James Hancock
    May 3, 2010 6:31 pm

    And they’re globally unique, not just to the computer. Sequential ids are only unique on that computer, but again, you shouldn’t be using this routine because you have to do a get to get the last inserted record as well on every call. You should be creating a function that calculates them the same way both server side (as a default) and as a function in your programming language so that you never need to do a second round trip.

    Reply
  • @James Hancock

    No James… they are not “globally unique”. The term “Globally Unique” is a misnomer and it is an error for Books Online to say “A GUID is a unique binary number”. Type 4 GUIDs are nothing more than random numbers with a huge domain and Microsoft currently uses TYPE 4 GUIDs.

    Please read the MSDN article at the following URL…

    https://docs.microsoft.com/en-us/dotnet/api/system.guid?redirectedfrom=MSDN&view=netframework-4.7.2

    … where it states in the REMARKS section, and I quote…

    “Such an identifier has a very low probability of being duplicated.”

    Microsoft uses “TYPE 4” Guids which no longer contain the MAC address of the computer like “TYPE 1” Guids did. To understand the differences between TYPE 1 and TYPE 4 Guids, please see the following URL…

    https://en.wikipedia.org/wiki/Universally_Unique_Identifier

    The bottom line is that GUID’s currently being used in SQL Server are NOT guaranteed to be globally unique. Although the likelyhood of duplication is very low because of the very large domain, Type 4 GUIDs (which is what Microsoft currently uses) aren’t even guaranteed to be unique on a single machine.

    Reply
    • “Such an identifier has a very low probability of being duplicated.”

      Do you have any idea how low the probability is?

      “equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. The probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs.”

      https://en.wikipedia.org/wiki/Universally_Unique_Identifier

      You have a much greater chance of a stray sub-atomic particle flipping bits in your RAM. If you’re not programming around that potential problem, then for all intents and purposes GUIDs (even type 4) are unique.

      Reply
  • @Rob Boek,

    True enough but because TYPE 4 GUIDs are NOT guaranteed to be unique and DO stand a chance, no matter how remote it may be, of being duplicated, you run into the same problem as if you used an IDENTITY column… you need some central repository that all global entities in your enterprise have to check and see if the number has ever been duplicated in your system. They cannot operate asynchronously with a **guarantee** of unique TYPE 4 GUIDS… and Microsoft currently uses TYPE 4 GUIDs (at least in 2k5, they do).

    Reply
    • The chance is so mind-bogglingly tiny that there’s no point in being worried about it, you can treat them as unique. You would need to generate a billion GUIDs per second for 100 years to have a 50% chance of creating a duplicate.

      That’s no worse than the chance of a neutrino from space hitting your hard drive platter and causing an IDENTITY column to create a duplicate.

      Reply
  • @Brian Tkatch,

    Please see the links I posted two posts above. Books Online is incorrect.

    Reply

Leave a Reply