This article has been submitted by Marko Parkkola, Data systems designer at Saarionen Oy, Finland. Marko is excellent developer and always thinking at next level. You can read his earlier comment which created very interesting discussion here: SQL SERVER- IF EXISTS(Select null from table) vs IF EXISTS(Select 1 from table). I must express my special thanks to Marko for sending this best practice for Enumerations in Relational Database. He has really wrote excellent piece here and welcome comments here.
Enumerations in Relational Database
This is a subject which is very basic thing in relational databases but often not very well understood and sometimes badly implemented. There are of course many ways to do this but I concentrate only two cases, one which is “the right way” and one which is definitely wrong way.
Let’s say we have table Person in our database. Person has properties/fields like Firstname, Lastname, Birthday and so on. Then there’s a field that tells person’s marital status and let’s name it the same way; MaritalStatus.
Now MaritalStatus is an enumeration. In C# I would definitely make it an enumeration with values likes Single, InRelationship, Married, Divorced. Now here comes the problem, SQL doesn’t have enumerations.
The wrong way
This is, in my opinion, absolutely the wrong way to do this. It has one upside though; you’ll see the enumeration’s description instantly when you do simple SELECT query and you don’t have to deal with mysterious values. There’s plenty of downsides too and one would be database fragmentation.
Consider this (I’ve left all indexes and constraints out of the query on purpose).
CREATE TABLE [dbo].[Person] (
You have nvarchar(20) field in the table that tells the marital status. Obvious problem with this is that what if you create a new value which doesn’t fit into 20 characters? You’ll have to come and alter the table. There are other problems also but I’ll leave those for the reader to think about.
The correct way
Here’s how I’ve done this in many projects. This model still has one problem but it can be alleviated in the application layer or with CHECK constraints if you like.
First I will create a namespace table which tells the name of the enumeration. I will add one row to it too. I’ll write all the indexes and constraints here too.
CREATE TABLE [CodeNamespace] (
[Id] INT IDENTITY(1, 1),
[Name] NVARCHAR(100) NOT NULL,
CONSTRAINT [PK_CodeNamespace] PRIMARY KEY ([Id]),
CONSTRAINT [IXQ_CodeNamespace_Name] UNIQUE NONCLUSTERED ([Name])
INSERT INTO [CodeNamespace] SELECT 'MaritalStatus'
Then I create a table that holds the actual values and which reference to namespace table in order to group the values under different namespaces. I’ll add couple of rows here too.
CREATE TABLE [CodeValue] (
[CodeNamespaceId] INT NOT NULL,
[Value] INT NOT NULL,
[Description] NVARCHAR(100) NOT NULL,
CONSTRAINT [PK_CodeValue] PRIMARY KEY CLUSTERED ([CodeNamespaceId], [Value]),
CONSTRAINT [FK_CodeValue_CodeNamespace] FOREIGN KEY ([CodeNamespaceId]) REFERENCES [CodeNamespace] ([Id])
-- 1 is the 'MaritalStatus' namespace
INSERT INTO [CodeValue] SELECT 1, 1, 'Single', 1
INSERT INTO [CodeValue] SELECT 1, 2, 'In relationship', 2
INSERT INTO [CodeValue] SELECT 1, 3, 'Married', 3
INSERT INTO [CodeValue] SELECT 1, 4, 'Divorced', 4
Now there’s four columns in CodeValue table. CodeNamespaceId tells under which namespace values belongs to. Value tells the enumeration value which is used in Person table (I’ll show how this is done below). Description tells what the value means. You can use this, for example, column in UI’s combo box. OrderBy tells if the values needs to be ordered in some way when displayed in the UI.
And here’s the Person table again now with correct columns. I’ll add one row here to show how enumerations are to be used.
CREATE TABLE [dbo].[Person] (
INSERT INTO [Person] SELECT 'Marko', 'Parkkola', '1977-03-04', 3
Now I said earlier that there is one problem with this. MaritalStatus column doesn’t have any database enforced relationship to the CodeValue table so you can enter any value you like into this field. I’ve solved this problem in the application layer by selecting all the values from the CodeValue table and put them into a combobox / dropdownlist (with Value field as value and Description as text) so the end user can’t enter any illegal values; and of course I’ll check the entered value in data access layer also.
I said in the “The wrong way” section that there is one benefit to it. In fact, you can have the same benefit here by using a simple view, which I schema bound so you can even index it if you like.
CREATE VIEW [dbo].[Person_v] WITH SCHEMABINDING
SELECT p.[Firstname], p.[Lastname], p.[BirthDay], c.[Description] MaritalStatus
FROM [dbo].[Person] p
JOIN [dbo].[CodeValue] c ON p.[MaritalStatus] = c.[Value] JOIN [dbo].[CodeNamespace] n ON n.[Id] = c.[CodeNamespaceId] AND n.[Name] = 'MaritalStatus'
-- Select from View
FROM [dbo].[Person_v] GO
This is excellent write up byMarko Parkkola. Do you have this kind of design setup at your organization? Let us know your opinion.
Reference: Pinal Dave (https://blog.sqlauthority.com)
very nice, I hope I came up with idea
I went for the per table for each namespace :(
Beautiful!!. Thanks for sharing.
If you implement it that way, then just make the value the primary key, not the namespaceid/value. If you require the namespaceid/value pairing for efficient queries on a specific namespace then this isn’t the right implementation.
This type of table is great when you expect to have universal searches over multiple attributes and can also be a good way of keeping text searchable items segregated in a specialized table. If you don’t have specific requirements that make this useful then I don’t think its the right call though. It requires an extra query at insert time. Every join is to a larger table. It’s very easy to implement inefficiently. Your code will be harder to read and understand.
OK, good point Wes and good reply Marko.
However, I have a need to relate ‘Monday’ to 1, ‘Tuesday’ to 2, … ‘Friday’ to 5, so that when I do an ORDER BY, the rows come out exactly from Monday to Friday.
It seems wasteful to create a lookup table just for this, and at the same time week-days are not variable lists can need to be editable.
What can I do for this case? At the moment, I am doing the following, and I find it clunky:
SELECT … FROM … WHERE …
WHEN ‘Monday’ THEN 1
WHEN ‘Tuesday’ THEN 2
WHEN ‘Wednesday’ THEN 3
WHEN ‘Thursday’ THEN 4
WHEN ‘Friday’ THEN 5
ELSE 10 END
Is there a better method than this?
Hi, I have an alternative method but it’s limited to 1024 values in the enum. It is still a workaround.
1. Concept is this:
–declaration and definition of table-valued function
CREATE FUNCTION [any_db].AnyEnum ()
RETURN (SELECT null “No Item”
, 0 “First Item”
, 1 “Second Item”
, 1024 “Max Item”
–usage example 1
EXEC SomeProcedure @Param1=(SELECT “Second Item” FROM [any_db].AnyEnum())
–usage example 2
SELECT * FROM [any_db].SomeItems a JOIN [any_db].AnyEnum() i on a.ItemId = i.”Max Item”
2. Optional Build this pattern with a sql-script(“template”) from any lookup to come to the full enum-list.
SELECT ‘ALTER FUNCTION [any_db].AnyEnum ()’ script, 1 linenr
–etc. line by line template for table-valued function declaration, 2 linenr, 3 linenr etc.
SELECT ‘ , ‘+convert(varchar(10), ItemId)+’ “‘+replace(ItemName,'”‘,'””‘) script, 20 linenr
–WHERE–logical limitation of more than 1024 values.
SELECT ‘);’ script, 9999 linenr
ORDER BY scr.linenr
Build it, run it, copy-paste the script and debug, compare, run the result script. Whatever you decide to do.
3. Reason to do this: the dependent procedures can be found in SQl Server Management Studio by RMB-click on the [any_db].MyEnum table-valued function and selecting View Dependencies from the RMB-Menu. Gives insight of values that are in use in script code. Removes the magical evil of untraced values: a lookup would still have a char-parameter (single quotes, constant), where a column-name (double quotes) is something called structured.
4. Easy compare and maintenance with the template.
Just some creative thought.
“returnS table” sorry forgot the S.
If you are verifying the data in application layer then you use nvarchar(100) directly and there will be no issue. it will be searchable , you can populate select manu from it (select distinct), there will be 0 joins, lesser tables/columns/data in general.
We need to learn stuff from NoSQL too. people generally use enums for this (in nosql) but it is stored as string nonetheless and it creates no issues, and it is generally used where values are fixed. with your approach you will have to update the code and allow the new value (or an extra select query, which can be a lot of overhead for heavy workload – you potentially double the number of queries).
the best, smartest way is to use string, and put the allowed values in environment variables/redis.
That’s not an enumeration, it’s a reference table. You can’t use it to write anything as simple as like: UPDATE Person SET MaritalStatus = CodeValue.Single WHERE LastName = ‘Spinster’. Enumerated values are meant for improving code readability. A reference table is for data integrity, which is different.
Options regarding Marital Status are very limited. If so, why not define it in View instead of a table? The list of options will not change in the future anyway. Instead of a table, for example, a view like this makes more sense. Since this method takes place in memory, it will perform better.
CREATE VIEW dbo.Status_Marital
SELECT 1 as ID, ‘Single’ as Tip UNION
SELECT 2, ‘In relationship’ UNION
SELECT 3, ‘Married’ UNION
SELECT 4, ‘Divorced’
This is not proper relational database theory and perpetuates a common bad design pattern. For this example, Marital Status Codes are actually standardized in the HL7 Standards. There are 8 standard definitions. If all the codes are needed, and one wants to abide by the HL7 standard and be prepared if that standard changes, then the 8 values could be stored in their own table. You could also just provide a CHECK constraint to validate the values instead and not use a relation at all. Regardless, the string value for Martial Status is the natural key. There is no need to introduce another arbitrary value to simulate an enumerator or pointer in an RDBMS platform. This design would eliminate the need for a join, which makes queries easier to read and to write.
Normalization isn’t about moving data into different tables, it’s about eliminating data anomalies like duplication, etc. The fact that a unique constraint is put on the ambiguously labeled “name” column is evidence enough that the constraint is redundant for that table. Let the software engineers have fun with ENUM and pointer chains and leave those concepts out of a relational engine (anti-pattern).
Hey – Could you not enforce the legal values of MaritalStatus using a check constraint on the column?
Oops, you mentioned the use of check constraints right there in the article!