One of my friends – a SQL Server MVP- Jacob Sebastian has a knack for coming up with interesting ideas and stuffs, the latest example being SQL Server Puzzles on his blog. Jacob is a regular blogger and a talented writer. I enjoy reading his blogs and books. He has recently published his new book – The Art of XSD – SQL Server XML Schema Collections.
I have based my present article on his most recent brainteaser – Write the shortest T-SQL Code that removes invalid SSN values and returns a result set with only valid SSN values. There are few additional validation rules that have been listed here.
I’m sure after reading the puzzle you all are bitten by the bug to solve it, and I’m certainly not an exception! I have attempted to solve the puzzle and have so far come up with the following shortest code.
SELECT DISTINCT ssn FROM @t JOIN(SELECT LEFT(SSN,3)A,SUBSTRING(SSN,5,2)B,RIGHT(SSN,4)C FROM @t)D ON A+'-'+B+'-'+C=SSN AND(A>'000'AND A<'734'OR A>'749'AND A<'773')AND A<>'666'AND B>'00'AND B<='99'AND C>'0000'AND C<='9999'
The above code has been adjusted below for better readability:
SELECT DISTINCT ssn
(SELECT LEFT(SSN,3) A, SUBSTRING(SSN,5,2) B, RIGHT(SSN,4) C
FROM @t) D
ON A+'-'+B+'-'+C = SSN AND (A > '000' AND A < '734' OR A > '749' AND A < '773')
AND A <> '666' AND B > '00' AND B <= '99' AND C > '0000' AND C <= '9999'
I think my proposed solution is very simple to understand; however, I would still like to give some explanation for your better understanding.
I took the original SSN Number that should be in the format of ‘xxx-xx-xxxx’. I separated all the three parts using CTE (common table expression). After they were separated, I once again attempted to build the original SSN appending ‘-‘ at its appropriate place (
A+'-'+B+'-'+C) i.e. 4th and 7th place. Once I recreated SSN, I compared it with the original SSN (
A+'-'+B+'-'+C = SSN); this comparison validates the format of the SSN. I must say, this was the most difficult part of solving the puzzle.
The next part is comparatively simple. I made sure that each section of SSN was falling within a range of integer and does not accept any other character. Now, instead of comparting all the digits to INT, I casted them as VARCHAR with single quotes around them. I used these VARCHAR casted variables with operator BETWEEN and compared INT range. This was the shortest way to validate INT range.
I would like my readers to participate in this brainstorming session and come up the shortest code.
The following is sample code to generate test table.
DECLARE @t TABLE (SSN VARCHAR(15))
INSERT INTO @t (SSN) SELECT '123-45-6789'
INSERT INTO @t (SSN) SELECT '123-45-67.89'
INSERT INTO @t (SSN) SELECT 'ABC-12-3455'
INSERT INTO @t (SSN) SELECT '123-45-67890'
INSERT INTO @t (SSN) SELECT '123-456789'
INSERT INTO @t (SSN) SELECT ' 123-45-6789'
INSERT INTO @t (SSN) SELECT ' 23-45-6789'
INSERT INTO @t (SSN) SELECT '12345-6789'
INSERT INTO @t (SSN) SELECT '123456789'
INSERT INTO @t (SSN) SELECT '123-12-1234'
INSERT INTO @t (SSN) SELECT '666-12-1234'
INSERT INTO @t (SSN) SELECT '123-12-0000'
INSERT INTO @t (SSN) SELECT '000-12-1234'
INSERT INTO @t (SSN) SELECT '735-12-1234'
INSERT INTO @t (SSN) SELECT '987-12-1234'
INSERT INTO @t (SSN) SELECT '123-65-1234'
INSERT INTO @t (SSN) SELECT '987-65-4321'
INSERT INTO @t (SSN) SELECT '987-65-4351'
Reference : Pinal Dave (https://blog.sqlauthority.com)
I’m reading this from my phone but how’s that code work for:
AND A ‘666’ AND B > ’00’ AND C > ‘0000’
AND ssn NOT LIKE ‘%[^0-9-]%’
There is no need for the upper bound check as we have already verified B and C are exactly 2 and 4 digits respectively. We also need to verify there are no non-digits (other than dashes) in there.
Also, is “DISTINCT” necessary? I don’t see anywhere in the rules where it has to be a unique list, and that does add characters to the final SQL.
Good observation. I like the way you think.
For those who plan to use this in an actual system, it is important to research the issue in more depth. Specifically know that:
1: The “Area Number” list is only one part of the validation, the two digit Group Number is also assigned in a particular order (using a peculiar odd/even group number assignment sequence, see . The last group number in use is published monthly ).
2: On June 25, 2011, all this will change as SSN randomization will be introduced (or at least that is the plan at this time – . The Area Number/Group Number still applies to SSNs issued prior to that time, but new SSNs will not follow that pattern.
3: For serious SSN verification, one should also consider the Social Security Administration’s Death Master File, containing over 65 million thus “expired” SSNs.
Try something like this in sql
See what you get in output. ISNUMERIC function fail.
Its realy helpful to me for working in SSN issue