SQL SERVER – GUID vs INT – Your Opinion

I think the title is clear what I am going to write in your post.

This is age old problem and I want to compile the list stating advantages and disadvantages of using GUID and INT as a Primary Key or Clustered Index or Both (the usual case).

Let me start a list by suggesting one advantage and one disadvantage in each case.

INT

Advantage:

  1. Numeric values (and specifically integers) are better for performance when used in joins, indexes and conditions.
  2. Numeric values are easier to understand for application users if they are displayed.

Disadvantage:

  1. If your table is large, it is quite possible it will run out of it and after some numeric value there will be no additional identity to use.

GUID

Advantage:

  1. Unique across the server.

Disadvantage:

  1. String values are not as optimal as integer values for performance when used in joins, indexes and conditions.
  2. More storage space is required than INT.

Please note that I am looking to create list of all the generic comparisons. There can be special cases where the stated information is incorrect, feel free to comment on the same.

Please leave your opinion and advice in comment section. I will combine a final list and update this blog after a week. By listing your name in post, I will also give due credit.

Reference: Pinal Dave (https://blog.sqlauthority.com)

SQL Constraint and Keys, SQL Data Storage
Previous Post
SQLAuthority News – Public Training Classes In Hyderabad 12-14 May – SQL and 10-11 May SharePoint
Next Post
SQL SERVER – Disable Clustered Index and Data Insert

Related Posts

106 Comments. Leave new

  • convert GUID to int:

    Sure you can, but it’s better with long to get the most value because a guid uses 16 bytes:

    byte[] gb = Guid.NewGuid().ToByteArray();

    int i = BitConverter.ToInt32(gb,0);

    long l = BitConverter.ToInt64(gb,0);

    .

    Reply
  • My Thoughts – GUID has its place. I prefer it used for Authentication of User only. Outside of this I like to have the DB provide INT / BIGINT sequential ID’s for each new entry. I work with web apps that theoretically will have lots of users and lots of look ups over time. Indexing on the transactions is very important. As for the users, they need to be authenticated to do anything and then again authorized at each level. GUID gives flexibility here for he user, but I fear using it outside of the “User Authentication” table. The nice thing about Integer based types in the database is that as they are created the numbers can increase by one. This means that if I have one million rows of transactions I do not need to return them all. Chances are the lower numbers are old and outdated. This also helps with faster indexing and Joins.

    Authentication – Consider GUID
    Other Tables – Lean toward Integer (TINYINT / SMALLINT / INT / BIGINT as needed).

    Reply
  • Is re-indexing a problem anymore? It would seem to me that INT would have a disadvantage should an event ever occur that caused re-indexing to be necessary.

    Reply
  • Which is the best option between INT and UniqueID?

    Reply
  • Using integer ranges instead of GUIDs:

    Note this only works if you have control over every SQL Server to install ranges!!

    An alternative to GUIDs is to use int/bigint with ranges on each server in a distrobuted environment. A word of warning here though, identities do not work, they always take max(id)+1, you have to write a sequence solution or use sequences in SQL Server 2012. I tried using identities and the first time you sych the data between 2 servers the identies start to clash, or you range constraints raise errors.

    The solution I used on SQL Server 2008 was as follows:

    1. create a table to hold your sequences {name, currentValue, MaxValue, Increment}

    For each PK configure a named sequence (note current value should be initialised to the min of the range or if data exists the table max +1 within the range) The name of the sequence is just the pk name, I found that the easiest way and most meaningful.

    I then create a stored procedure with the following signature

    GetNextSequenceValue @sequenceName varchar(128), @value int OUT

    In the SP I simply do a lock for update select to get the currect value of the named sequence + 1, check if it is in range, update the sequnce table and set the out value.

    I also created a scalar function

    GetCurrentSequenceValue( @sequenceName varchar(128))

    This simply gives you the current value of a named sequence

    I then had to set the default value of the int PK columns in the table to 0 as we will never use a value of 0, it is just a flag to the trigger to get get the next sequence value.

    Then I added an instead of insert trigger to the table, where if the PK value is 0 I get the next sequence value using the SP and then do the real insert. This means that if the PK is not defined in the initial insert, the sequence value is used, if it is defined, the defined value is used, this allows sync to write the PK ids.

    Note: no range constraints can be used on the tables, this will cause sync from other ranges to fail, the SP must control the range using the definition from the sequence table.

    This solves all issues with guids and integers, except when you have no control over the sql servers are in your system, and configuring the ranges is not posible.

    I suggest ranges as follows:

    Central server: -2147483647 to -1 (use all negative values gives you the same range as a positive integer)

    Client servers:
    First Client: 1 to 99999999
    Second Client 100000001 to 199999999

    Last client 9900000001 to 2147483647

    You can always use negatives on clients if the central server makes more sense to have positive IDs.

    Reply
  • Well a GUID can give me a strong unique id in the world/cosmos for many many years to come. Statistically, it would take trillions of years to get a duplicate if you would generate 1000s per second.

    I should be able to use this uniqueness in many apps.

    Reply
  • How can I guarantee uniqueness across the server without using uniqueIdentifier?

    Reply
  • Sowmya Murali
    May 6, 2013 12:05 pm

    Kindly let me know what would be the recommended option for primary key, given the following scenario.
    1. Table contains a UUID column and a ID column which is a decimal
    (28,0)
    2. UUID generated is stored in 8 bytes using binary columns
    3. Not a distributed environment
    4. No of rows will be maximum tens of thousands

    Is it true that modern recommendations for table design suggest usage of UUID instead of sequences for security reasons?

    Reply
  • ashish nasre
    March 7, 2014 7:19 pm

    HI friends,

    as we learned we can use unique identifier column same as identity column in sql server table and in that case the values will be globally unique i,e across tables,databases,servers,computers.
    but here i am little bit confused.
    i am agree the values will be globally unique across tables,databases since both belong to same machine but how it will be unique across servers and computers.how it interprets the unique values across different servers and different computers if both are not know each other i mean not connected each other.

    Please suggest me on the same..

    Thank you.

    Reply
    • Ashish,

      The algorithm for generating GUIDs usually takes details of the computer in order to make GUIDs which won’t clash with other computers, such as the ethernet id which is supposed to be unique to each ethernet card. There is still a tiny chance that two machines could make the same GUID but the odds are so small it’s hard to comprehend. You’d need to generate 1 billion GUIDs per second for 100 years before you’d have a 50% chance of a single collision.

      Reply
    • Actually there is no guarantee that two GUIDs generated on different computers will not collide (be the same).
      It’s just so goddamn unlikely, that it will propably not happen in your lifetime:
      “GUIDs generated from random numbers sometimes contain 6 fixed bits saying they are random and 122 random bits; the total number of unique such GUIDs is 2122 or 5.3×1036. This number is so large that the probability of the same number being generated randomly twice is negligible;[…]Assuming uniform probability for simplicity, the probability of one duplicate would be about 50% if every person on earth owned 600 million GUIDs.”
      128-bit integers have a really big amount of numbers (over 4 billions squared).

      Reply
  • HI Friends,

    Many Thanks for valuable time as well as suggestion.

    But I would like to say something on the same,

    As Adam Said about Ethernet ID of machine, can we say it physical address of machine?
    As we all know every machine have its own physical address and it is unique for every machine so whenever we will be trying to create GUID’S on machine it will be different or unique across tables, machines, servers……………………..

    Because for creating GUID’s values, SQL server Engine will be taking reference of physical address of machine.

    We can see physical address by CMD prompt by command

    C:\ipconfig /all

    Also one more thing I would like to say.
    Every computer have MAC(media access control) address ,it is a number of 12 characters(present in 6 pair of two characters each) used to identify NIC(network interface card) installed on machine.
    I would like to say the GUIDs would be unique across machines or servers only when , NIC card and there installation is on source machines. otherwise we can’t say the GUIDs will be unique.
    And the MAC address identify as physical address of machine.

    Reply
    • The MAC adress, aka physical adress, aka Ethernet Adress should be unique between different cards. Note that I said should, not is.
      Every manufacturer get’s thier own “Mac Ranges” and should ask for new ranges when it exhausted one. That did not prevent some very stupid executive from saying “Gee, why not use the same MAC adress for cards sold in the Asia, US and Europ markets. They’ll never mix anyway.”
      Plus with virtual machines and virtual network devices you can actually choose a MAC adress to be used. Or rather you have to generate a different, random one for each device.

      MAC adress is not a reliable way of identifiying hardware or a computer and never was supposed too. At tops it can reliably identify a Network Device in any physical/virtual network.

      Reply
      • Hi Friends,

        Thanks for the clarification but I am little bit confuse here.
        How we can say that the production company can produce 2 NIC(network interface card) with same MAC number.
        if they can then machine will refuse the connection because of collision whenever we will be trying to connect both in same network.

        It should be unique across the world and it is.

        So I don’t think we have 2 machine with same MAC number.

        Please correct me if I am wrong.

      • “How we can say that the production company can produce 2 NIC(network interface card) with same MAC number.”
        The same way I can write the same number twice. 5. 5.

        Yes, they should not have. Indeed everybody with half a brain can figure that out. And afaik they stopped that by now from complaints.
        And they were at least smart enough to send NIC with same MAC adresses into different markets.
        Furthermore, most NIC are not even conencted to the same network. The original plan was that every computer has a public IP adress (no routers) and IPv6 goes into that direction again. But right now most NIC are connected to a router. The space in wich MAC adresses can collide ends with a router. Wich MAC adresses computer in the internet have is irrelevant for the local network and vice versa. (Same goes for local wired network, local wireless network and internet – all three distinct layer 2 Networks).

        But the MAC adress can still be changed, overridden, and could be duplicated. It is not a reliable way to identify a computer. At least not past a local network.

  • Numeric sequence based solutions like identity or SQL Sequences are my choice for finite problems that can be solved by finite numbers of machines. Even when you need multiple application or database severs you can work around the complexity pretty easily by assigning each server a unique identifier which becomes a part of the PK (Primary Key).

    For non-finite problems such as those that use HADOOP, I recommend using a GUID from the time the data is captured until it reaches a central repository where a sequence can be used for further processing. While the data is being handled by an unpredictable number of application and database servers the GUID solution works beautifully. Once the data reaches a common resting place it becomes practical to assign the sequence to make additional processing more efficient .

    Reply
  • Divyesh Chapaneri
    February 27, 2019 3:54 am

    Hi,

    Can you show an example of converting guid into bigint and reverse bigint into original guid in SQL ?
    I tried with below SQL but not working for me. The first half of the reversed guid is correct but it has all 0 for the remaining ones.

    DECLARE @originalGUID uniqueidentifier = newid()

    DECLARE @originalBINARY varbinary(8) = cast(@originalGUID as varbinary(8))

    DECLARE @originalBIGINT bigint = convert(bigint, @originalBINARY, 1)

    DECLARE @convertedBINARY varbinary(8) = convert(varbinary(8), @originalBIGINT, 1)

    DECLARE @convertedGUID uniqueidentifier = cast(@convertedBINARY as uniqueidentifier)

    SELECT ORIGINAL = @originalGUID, IntoBinary = @originalBINARY, IntoBigInteger = @originalBIGINT, ReverseIntoBinary = @convertedBINARY, ReverseIntoGuid = @convertedGUID

    Thanks
    Divyesh

    Reply
  • Just to add some new information that is actually very old information… The fragmentation that almost everyone in the world associates with the use of Random GUIDs is mostly a myth caused by misinformation, bad testing, and a decades old “Best Practice” this was never meant to be a best practice and is, in fact, actually a WORST PRACTICE.

    Please see the following ‘tube for proof of that.

    Also, to summarize some of the other things I’ve seen on this thread including the original post….

    1. Done correctly, GUIDs are not a string. They’re just displayed that way…

    2. GUIDs are 16 bytes. The UNIQUEIDENTIFIER datatype is how they’re normally stored in SQL Server/T-SQL and many other relational databases.

    3. Considering that GUIDs take 36 bytes to store as a CHAR datatype and 72 bytes to store as an NCHAR (and add 2 bytes for VARCHAR and NVARCHAR for the “length” indication), it is seriously unwise to store them in their “display” form.

    4. At 16 bytes when correctly stored as UINIQUEIDENTIFIER datatype, GUIDs are 2 times larger than BIGINT and and 4 times larger than INT. You do have to consider that but, if you need the features of a GUID, there’s not much else you can use.

    5. NEWSEQUENTIALID() is similar to the old Type 1 GUIDs in that they contain one of the MAC addresses associated with the machine they were generated from. One of the big reasons MS moved from Type 1 GUIDs to the totally random Type 4 GUIDs is because a lot of people correctly insisted that the MAC address component created a possible attack vector concerning security. That means that NEWSEQUENTIALID() poses the same issue although the presence of the MAX address does make it even more “Globally Unique”. Multiple VMs on the same box can reduce that uniqueness quite a bit.

    6. NEWSEQUENTIALID() is NOT the way to fix the supposed fragmentation issue for Random GUIDs. In fact, except for being even less likely to produce a dupe, it brings all of the problems (yes, I said “problems”) that ever-increasing index keys bring to the table including the infamous “Hot Spot”, which can cause serious slow downs and statistics problems.

    *** AND NOW FOR ONE OF THE MOST IMPORTANT POINTS CONCERNING RANDOM GUIDS. ***

    7. The Random nature of Random GUIDs is NOT the primary reason for the myth of fragmentation that has blamed on the randomness for decades. The real problem is in how people have been maintaining Random GUID indexes. Random GUIDs can actually be used to PREVENT FRAGMENTATION AND TOTALLY ELIMINATE ANY AND ALL “HOT-SPOTS”. Except for their size, RANDOM GUIDS BEHAVE AS THE EPITOME OF HOW MOST PEOPLE EXPECT AN INDEX TO BEHAVE. It would take way to much to prove here so, instead, please see the following ‘tube. There are a couple of “sudden” advertisements that have appeared in ‘tube that may seem a bit annoying but, without the sponsors, such events simply would not be possible.

    Here’s the link…

    Reply

Leave a Reply