SQL SERVER – Enumerations in Relational Database – Best Practice

SQL SERVER - Enumerations in Relational Database - Best Practice marko
Marko Parkkola

This article has been submitted by Marko Parkkola, Data systems designer at Saarionen Oy, Finland. Marko is excellent developer and always thinking at next level. You can read his earlier comment which created very interesting discussion here: SQL SERVER- IF EXISTS(Select null from table) vs IF EXISTS(Select 1 from table). I must express my special thanks to Marko for sending this best practice for Enumerations in Relational Database. He has really wrote excellent piece here and welcome comments here.

Enumerations in Relational Database

This is a subject which is very basic thing in relational databases but often not very well understood and sometimes badly implemented. There are of course many ways to do this but I concentrate only two cases, one which is “the right way” and one which is definitely wrong way.

The concept

Let’s say we have table Person in our database. Person has properties/fields like Firstname, Lastname, Birthday and so on. Then there’s a field that tells person’s marital status and let’s name it the same way; MaritalStatus.

Now MaritalStatus is an enumeration. In C# I would definitely make it an enumeration with values likes Single, InRelationship, Married, Divorced. Now here comes the problem, SQL doesn’t have enumerations.

The wrong way

This is, in my opinion, absolutely the wrong way to do this. It has one upside though; you’ll see the enumeration’s description instantly when you do simple SELECT query and you don’t have to deal with mysterious values. There’s plenty of downsides too and one would be database fragmentation.

Consider this (I’ve left all indexes and constraints out of the query on purpose).

CREATE TABLE [dbo].[Person] (
[Firstname] NVARCHAR(100),
[Lastname] NVARCHAR(100),
[Birthday] datetime,
[MaritalStatus] NVARCHAR(10)
)

You have nvarchar(20) field in the table that tells the marital status. Obvious problem with this is that what if you create a new value which doesn’t fit into 20 characters? You’ll have to come and alter the table. There are other problems also but I’ll leave those for the reader to think about.

The correct way

Here’s how I’ve done this in many projects. This model still has one problem but it can be alleviated in the application layer or with CHECK constraints if you like.

First I will create a namespace table which tells the name of the enumeration. I will add one row to it too. I’ll write all the indexes and constraints here too.

CREATE TABLE [CodeNamespace] (
[Id] INT IDENTITY(1, 1),
[Name] NVARCHAR(100) NOT NULL,
CONSTRAINT [PK_CodeNamespace] PRIMARY KEY ([Id]),
CONSTRAINT [IXQ_CodeNamespace_Name] UNIQUE NONCLUSTERED ([Name])
)
GO

INSERT INTO [CodeNamespace] SELECT 'MaritalStatus'
GO

Then I create a table that holds the actual values and which reference to namespace table in order to group the values under different namespaces. I’ll add couple of rows here too.

CREATE TABLE [CodeValue] (
[CodeNamespaceId] INT NOT NULL,
[Value] INT NOT NULL,
[Description] NVARCHAR(100) NOT NULL,
[OrderBy] INT,
CONSTRAINT [PK_CodeValue] PRIMARY KEY CLUSTERED ([CodeNamespaceId], [Value]),
CONSTRAINT [FK_CodeValue_CodeNamespace] FOREIGN KEY ([CodeNamespaceId]) REFERENCES [CodeNamespace] ([Id])
)
GO
-- 1 is the 'MaritalStatus' namespace
INSERT INTO [CodeValue] SELECT 1, 1, 'Single', 1
INSERT INTO [CodeValue] SELECT 1, 2, 'In relationship', 2
INSERT INTO [CodeValue] SELECT 1, 3, 'Married', 3
INSERT INTO [CodeValue] SELECT 1, 4, 'Divorced', 4
GO

Now there’s four columns in CodeValue table. CodeNamespaceId tells under which namespace values belongs to. Value tells the enumeration value which is used in Person table (I’ll show how this is done below). Description tells what the value means. You can use this, for example, column in UI’s combo box. OrderBy tells if the values needs to be ordered in some way when displayed in the UI.

And here’s the Person table again now with correct columns. I’ll add one row here to show how enumerations are to be used.

CREATE TABLE [dbo].[Person] (
[Firstname] NVARCHAR(100),
[Lastname] NVARCHAR(100),
[Birthday] datetime,
[MaritalStatus] INT
)
GO
INSERT INTO [Person] SELECT 'Marko', 'Parkkola', '1977-03-04', 3
GO

Now I said earlier that there is one problem with this. MaritalStatus column doesn’t have any database enforced relationship to the CodeValue table so you can enter any value you like into this field. I’ve solved this problem in the application layer by selecting all the values from the CodeValue table and put them into a combobox / dropdownlist (with Value field as value and Description as text) so the end user can’t enter any illegal values; and of course I’ll check the entered value in data access layer also.

I said in the “The wrong way” section that there is one benefit to it. In fact, you can have the same benefit here by using a simple view, which I schema bound so you can even index it if you like.

CREATE VIEW [dbo].[Person_v] WITH SCHEMABINDING
AS
SELECT
p.[Firstname], p.[Lastname], p.[BirthDay], c.[Description] MaritalStatus
FROM [dbo].[Person] p
JOIN [dbo].[CodeValue] c ON p.[MaritalStatus] = c.[Value] JOIN [dbo].[CodeNamespace] n ON n.[Id] = c.[CodeNamespaceId] AND n.[Name] = 'MaritalStatus'
GO
-- Select from View
SELECT *
FROM [dbo].[Person_v] GO

SQL SERVER - Enumerations in Relational Database - Best Practice enum1

This is excellent write up byMarko Parkkola. Do you have this kind of design setup at your organization? Let us know your opinion.

Reference: Pinal Dave (https://blog.sqlauthority.com)

Best Practices, Database, DBA, Software Development, SQL Documentation
Previous Post
SQL SERVER – Fix : Error : 3117 : The log or differential backup cannot be restored because no files are ready to rollforward
Next Post
SQLAuthority News – We’re sorry… … but your computer or network may be sending automated queries. To protect our users, we can’t process your request right now.

Related Posts

44 Comments. Leave new

  • very nice, I hope I came up with idea

    I went for the per table for each namespace :(

    Anyway Thanks

    Reply
  • Beautiful!!. Thanks for sharing.

    Reply
  • Chris McCowan
    April 8, 2016 8:33 am

    If you implement it that way, then just make the value the primary key, not the namespaceid/value. If you require the namespaceid/value pairing for efficient queries on a specific namespace then this isn’t the right implementation.

    This type of table is great when you expect to have universal searches over multiple attributes and can also be a good way of keeping text searchable items segregated in a specialized table. If you don’t have specific requirements that make this useful then I don’t think its the right call though. It requires an extra query at insert time. Every join is to a larger table. It’s very easy to implement inefficiently. Your code will be harder to read and understand.

    Reply
  • OK, good point Wes and good reply Marko.

    However, I have a need to relate ‘Monday’ to 1, ‘Tuesday’ to 2, … ‘Friday’ to 5, so that when I do an ORDER BY, the rows come out exactly from Monday to Friday.

    It seems wasteful to create a lookup table just for this, and at the same time week-days are not variable lists can need to be editable.

    What can I do for this case? At the moment, I am doing the following, and I find it clunky:
    SELECT … FROM … WHERE …
    ORDER BY
    CASE [column_name]
    WHEN ‘Monday’ THEN 1
    WHEN ‘Tuesday’ THEN 2
    WHEN ‘Wednesday’ THEN 3
    WHEN ‘Thursday’ THEN 4
    WHEN ‘Friday’ THEN 5
    ELSE 10 END

    Is there a better method than this?

    Reply
    • David Hogeling
      June 30, 2017 2:05 pm

      Hi, I have an alternative method but it’s limited to 1024 values in the enum. It is still a workaround.

      1. Concept is this:
      –declaration and definition of table-valued function
      CREATE FUNCTION [any_db].AnyEnum ()
      RETURN TABLE
      IS
      RETURN (SELECT null “No Item”
      , 0 “First Item”
      , 1 “Second Item”
      , 1024 “Max Item”
      );

      –usage example 1
      EXEC SomeProcedure @Param1=(SELECT “Second Item” FROM [any_db].AnyEnum())

      –usage example 2
      SELECT * FROM [any_db].SomeItems a JOIN [any_db].AnyEnum() i on a.ItemId = i.”Max Item”

      2. Optional Build this pattern with a sql-script(“template”) from any lookup to come to the full enum-list.

      SELECT script
      FROM
      (
      SELECT ‘ALTER FUNCTION [any_db].AnyEnum ()’ script, 1 linenr
      UNION
      –etc. line by line template for table-valued function declaration, 2 linenr, 3 linenr etc.

      SELECT ‘ , ‘+convert(varchar(10), ItemId)+’ “‘+replace(ItemName,'”‘,'””‘) script, 20 linenr
      FROM [any_db].SomeItems
      –WHERE–logical limitation of more than 1024 values.

      UNION
      SELECT ‘);’ script, 9999 linenr
      ) scr
      ORDER BY scr.linenr

      Build it, run it, copy-paste the script and debug, compare, run the result script. Whatever you decide to do.

      3. Reason to do this: the dependent procedures can be found in SQl Server Management Studio by RMB-click on the [any_db].MyEnum table-valued function and selecting View Dependencies from the RMB-Menu. Gives insight of values that are in use in script code. Removes the magical evil of untraced values: a lookup would still have a char-parameter (single quotes, constant), where a column-name (double quotes) is something called structured.

      4. Easy compare and maintenance with the template.

      Just some creative thought.

      Reply
  • If you are verifying the data in application layer then you use nvarchar(100) directly and there will be no issue. it will be searchable , you can populate select manu from it (select distinct), there will be 0 joins, lesser tables/columns/data in general.

    We need to learn stuff from NoSQL too. people generally use enums for this (in nosql) but it is stored as string nonetheless and it creates no issues, and it is generally used where values are fixed. with your approach you will have to update the code and allow the new value (or an extra select query, which can be a lot of overhead for heavy workload – you potentially double the number of queries).

    the best, smartest way is to use string, and put the allowed values in environment variables/redis.

    Reply
    • using a string is certainly not the best, smartest way. But I guess it is all in the eye of the beholder. To me, in MS SQL, storing an ‘enumerated’ value in the database as a string is pretty much a nono, especially because you don’t know if you add new values if the current field length is still sufficient. Also let’s take multilanguage into account, that’s a big failure if you are storing you enum as a string. Or what if someone wrote a typo in the enum, you now have 2 different values for what should have the same meaning (or you’ll have to update all the existing values in the database). BUT, ofcourse when using enums which are stored in the database, make sure you have the values explicitly declared in your code like (fictious codelanguage):

      Enum MaritalStatus
      Single = 1
      InRelationship = 2
      Married = 3
      Divorced = 4
      End Enum

      And not like:

      Enum MaritalStatus
      Single
      InRelationship
      Married
      Divorced
      End Enum

      Reply
  • That’s not an enumeration, it’s a reference table. You can’t use it to write anything as simple as like: UPDATE Person SET MaritalStatus = CodeValue.Single WHERE LastName = ‘Spinster’. Enumerated values are meant for improving code readability. A reference table is for data integrity, which is different.

    Reply
  • Cevat Alanson
    May 3, 2020 3:22 pm

    Options regarding Marital Status are very limited. If so, why not define it in View instead of a table? The list of options will not change in the future anyway. Instead of a table, for example, a view like this makes more sense. Since this method takes place in memory, it will perform better.

    CREATE VIEW dbo.Status_Marital
    AS
    SELECT 1 as ID, ‘Single’ as Tip UNION
    SELECT 2, ‘In relationship’ UNION
    SELECT 3, ‘Married’ UNION
    SELECT 4, ‘Divorced’
    GO

    Reply
  • This is not proper relational database theory and perpetuates a common bad design pattern. For this example, Marital Status Codes are actually standardized in the HL7 Standards. There are 8 standard definitions. If all the codes are needed, and one wants to abide by the HL7 standard and be prepared if that standard changes, then the 8 values could be stored in their own table. You could also just provide a CHECK constraint to validate the values instead and not use a relation at all. Regardless, the string value for Martial Status is the natural key. There is no need to introduce another arbitrary value to simulate an enumerator or pointer in an RDBMS platform. This design would eliminate the need for a join, which makes queries easier to read and to write.

    Normalization isn’t about moving data into different tables, it’s about eliminating data anomalies like duplication, etc. The fact that a unique constraint is put on the ambiguously labeled “name” column is evidence enough that the constraint is redundant for that table. Let the software engineers have fun with ENUM and pointer chains and leave those concepts out of a relational engine (anti-pattern).

    Reply
    • I guess everybody has a different view on what is proper relational database theory or what constitutes a a bad design pattern. To me the bad design pattern would be to store the value as string in the main record. Especially when you are considering multilanguage representations, like english and spanish. The value in code is still Single or Married, but the value which must be shown to the end user is ‘Single’/’Married’ or ‘Soltero’/’Casado’ (disclaimer, I used google translate for spanish).
      I agree on using single tables for each ‘enum’, that would be the best way to go about it, but it might turn into A LOT of tables. BUT in the case of MaritalStatus I think it would be better to do it like that, as you can have extra fields for visibility (like a value is only/not valid after a specific date so your frontend can take that into account).

      Yes, people say memory/storage/cpu are cheap, but my CC bill at the end of the month says otherwise. So getting as much, easy, optimisations from the start saves me that bill, especially with the rising costs of everything.

      Reply
  • radcapitalist
    April 27, 2021 8:18 pm

    Hey – Could you not enforce the legal values of MaritalStatus using a check constraint on the column?

    Reply

Leave a Reply