SQL SERVER – Solution to Puzzle – Shortest Code to Perform SSN Validation

One of my friends – a SQL Server MVP- Jacob Sebastian has a knack for coming up with interesting ideas and stuffs, the latest example being SQL Server Puzzles on his blog. Jacob is a regular blogger and a talented writer. I enjoy reading his blogs and books. He has recently published his new book – The Art of XSD – SQL Server XML Schema Collections.

I have based my present article on his most recent brainteaser – Write the shortest T-SQL Code that removes invalid SSN values and returns a result set with only valid SSN values. There are few additional validation rules that have been listed here.

I’m sure after reading the puzzle you all are bitten by the bug to solve it, and I’m certainly not an exception! I have attempted to solve the puzzle and have so far come up with the following shortest code.

SELECT DISTINCT ssn FROM @t JOIN(SELECT LEFT(SSN,3)A,SUBSTRING(SSN,5,2)B,RIGHT(SSN,4)C FROM @t)D ON A+'-'+B+'-'+C=SSN AND(A>'000'AND A<'734'OR A>'749'AND A<'773')AND A<>'666'AND B>'00'AND B<='99'AND C>'0000'AND C<='9999'

The above code has been adjusted below for better readability:

SELECT DISTINCT ssn
FROM @t
JOIN
(SELECT LEFT(SSN,3) A, SUBSTRING(SSN,5,2) B, RIGHT(SSN,4) C
FROM @t) D
ON A+'-'+B+'-'+C = SSN AND (A > '000' AND A < '734' OR A > '749' AND A < '773')
AND
A <> '666' AND B > '00' AND B <= '99' AND C > '0000' AND C <= '9999'

I think my proposed solution is very simple to understand; however, I would still like to give some explanation for your better understanding.

I took the original SSN Number that should be in the format of ‘xxx-xx-xxxx’. I separated all the three parts using CTE (common table expression). After they were separated, I once again attempted to build the original SSN appending ‘-’ at its appropriate place (A+'-'+B+'-'+C) i.e. 4th and 7th place. Once I recreated SSN, I compared it with the original SSN (A+'-'+B+'-'+C = SSN); this comparison validates the format of the SSN. I must say, this was the most difficult part of solving the puzzle.

The next part is comparatively simple. I  made sure that each section of SSN was falling within a range of integer and does not accept any other character. Now, instead of comparting all the digits to INT, I casted them as VARCHAR with single quotes around them. I used these VARCHAR casted variables with operator BETWEEN and compared INT range. This was the shortest way to validate INT range.

I would like my readers to participate in this brainstorming session and come up the shortest code.

The following is sample code to generate test table.

DECLARE @t TABLE (SSN VARCHAR(15))
INSERT INTO @t (SSN) SELECT '123-45-6789'
INSERT INTO @t (SSN) SELECT '123-45-67.89'
INSERT INTO @t (SSN) SELECT 'ABC-12-3455'
INSERT INTO @t (SSN) SELECT '123-45-67890'
INSERT INTO @t (SSN) SELECT '123-456789'
INSERT INTO @t (SSN) SELECT ' 123-45-6789'
INSERT INTO @t (SSN) SELECT ' 23-45-6789'
INSERT INTO @t (SSN) SELECT '12345-6789'
INSERT INTO @t (SSN) SELECT '123456789'
INSERT INTO @t (SSN) SELECT '123-12-1234'
INSERT INTO @t (SSN) SELECT '666-12-1234'
INSERT INTO @t (SSN) SELECT '123-12-0000'
INSERT INTO @t (SSN) SELECT '000-12-1234'
INSERT INTO @t (SSN) SELECT '735-12-1234'
INSERT INTO @t (SSN) SELECT '987-12-1234'
INSERT INTO @t (SSN) SELECT '123-65-1234'
INSERT INTO @t (SSN) SELECT '987-65-4321'
INSERT INTO @t (SSN) SELECT '987-65-4351'

Reference : Pinal Dave (http://blog.SQLAuthority.com)

About these ads

7 thoughts on “SQL SERVER – Solution to Puzzle – Shortest Code to Perform SSN Validation

  1. AND A ’666′ AND B > ’00′ AND C > ’0000′
    AND ssn NOT LIKE ‘%[^0-9-]%’

    There is no need for the upper bound check as we have already verified B and C are exactly 2 and 4 digits respectively. We also need to verify there are no non-digits (other than dashes) in there.

    Also, is “DISTINCT” necessary? I don’t see anywhere in the rules where it has to be a unique list, and that does add characters to the final SQL.

  2. For those who plan to use this in an actual system, it is important to research the issue in more depth. Specifically know that:
    1: The “Area Number” list is only one part of the validation, the two digit Group Number is also assigned in a particular order (using a peculiar odd/even group number assignment sequence, see http://www.socialsecurity.gov/history/ssn/geocard.html). The last group number in use is published monthly (http://www.socialsecurity.gov/employer/ssnvhighgroup.htm).
    2: On June 25, 2011, all this will change as SSN randomization will be introduced (or at least that is the plan at this time – http://www.socialsecurity.gov/employer/randomizationfaqs.html). The Area Number/Group Number still applies to SSNs issued prior to that time, but new SSNs will not follow that pattern.
    3: For serious SSN verification, one should also consider the Social Security Administration’s Death Master File, containing over 65 million thus “expired” SSNs.

  3. Pinal,

    Try something like this in sql

    SELECT ISNUMERIC(’33D3′);
    SELECT ISNUMERIC(’33E3′)

    See what you get in output. ISNUMERIC function fail.

  4. Pingback: SQL SERVER – Weekly Series – Memory Lane – #027 | SQL Server Journey with SQL Authority

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s