SQL SERVER – Puzzle – DISTINCT Values in NVARCHAR

The technique used in this blog is explained in earlier blog SQL SERVER – Creating Dataset Using VALUES Clause Without Creating Table. To read more about Collation and Unicode refer to MSDN. This topic in general is very deep and complicated sometimes that you get to learn something every single time.

While playing with SSIS package, I was getting unique constraint error while data were getting inserted in SQL Server. Later I did further digging and found that it was due to German characters. Here is a simplified version of the problem. Will you be able to crack the puzzle?

You can use the SQL Server Management studio and run below three queries:

SET NOCOUNT ON
SELECT DISTINCT
columny COLLATE SQL_Latin1_General_CP1_CI_AS AS 'First'
FROM (VALUES ('ß'),('SS') ) tablex(columny) ;
SELECT DISTINCT columny COLLATE Latin1_General_CI_AS AS 'Second'
FROM (VALUES ('ß'),('SS') ) tablex(columny) ;
SELECT DISTINCT CAST(columny AS NVARCHAR) COLLATE SQL_Latin1_General_CP1_CI_AS AS 'Third'
FROM (VALUES (N'ß'),(N'SS') ) tablex(columny) ;

If you are not having access to SQL Server handy – here is a sample output for them:

SQL SERVER - Puzzle - DISTINCT Values in NVARCHAR ss-puzzle-01

Do you know the reason why ‘ß’ and ‘SS’ are same in nvarchar? Please comment and let me know. I will share surprise learning voucher with you.

Hope we will learn something together.

Reference: Pinal Dave (https://blog.sqlauthority.com)

Previous Post
SQL SERVER – Enhancing Reference Data in Master Data Services – Notes from the Field #089
Next Post
PowerShell: How to Find Logical and Physical Processors on SQL Server?

Related Posts

No results found.

15 Comments. Leave new

  • Benjamin Steinfeld
    July 17, 2015 8:45 am

    ß is a Scharfe S, it’s interchangeable with SS according to Unicode 7.0 Case Folding Rules

    Reply
  • character expansion is used for some independent letters on certain collations like
    ß can be expended to SS
    œ can be expended to OE
    æ can be expended to AE

    thats why we r getting our result like this

    Reply
  • krmanish1986
    July 17, 2015 9:21 am

    can be expended to

    Reply
  • krmanish1986
    July 17, 2015 9:25 am

    there are some independent letters that can be expended while matching using character expansion.. For ex
    ß can be expended to SS
    æ can be expanded to AE
    thats y we r getting our result like this

    Reply
  • Cool Post!

    In Non-Unicode “Latin1_General_CI_AS” expansion of ‘ß’ to ‘SS’ take place for Unicode and Non-Unicode strings.

    In Unicode “SQL_Latin1_General_CP1_CI_AS” expansion of ‘ß’ to ‘SS’ take place only for Unicode strings (N’ß’ and N’SS’).
    For Non-Unicode strings direct comparison of string literals take place (without expansion).
    So for this one it is matter, whether you’re comparing N’ß’ and N’SS’ or ‘ß’ and ‘SS’.

    Cheers.

    Reply
  • Sanjay Monpara
    July 17, 2015 3:43 pm

    Similar issue:
    DECLARE @A VARCHAR(23)=’KW’
    DECLARE @B VARCHAR(23)=’KV’

    IF @A COLLATE Danish_Norwegian_CI_AI =@B COLLATE Danish_Norwegian_CI_AI
    SELECT 1 AS Danish
    ELSE
    SELECT 0 AS Danish

    IF @A COLLATE Finnish_Swedish_100_CI_AI =@B COLLATE Finnish_Swedish_100_CI_AI
    SELECT 1 AS Finnish
    ELSE
    SELECT 0 AS FInnish

    –>–Ourput

    Danish
    ——–
    0

    Finnish
    ———
    1

    Reply
  • select (null+2+3+5) =null
    select sum(salary) FROM TBL_EMPLOYEE =90000
    WHERE SALARY IS 30000,30000,30000 , null FOR 4 EMPLOYEES

    wHY ?

    Reply
    • It is because SUM function ignore NULL values by default

      Reply
      • select SUM(null+2+3+5) =null
        but sir this also results to NULL..

        is this the wrong way of writing query

        or something else is happening here

      • You have used + operator directly on NULL that’s why it is NULL whereas SUM function takes values from each row and ignores NULL

      • Also see what this returns

        select sum(col) as col from
        (
        select null as col
        union
        select 2
        union
        select 3
        union
        select 5
        ) as t

      • Got it sir
        Thanx

  • Hi Due to this it will happen:-

    You should see little difference if the collation is SQL_Latin1_General_CP1_CI_AS or Latin1_General_CI_AS, but both have instances where they are faster or slower than the other.

    Latin1_General_CI_AS :- Latin1-General, case-insensitive, accent- sensitive, kanatype-insensitive, width-insensitive

    SQL_Latin1_General_CP1_CI_AS:- Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive for Unicode Data, SQL Server Sort Order 52 on Code Page 1252 for non-Unicode Data

    Reply
  • SELECT * FROM ::fn_helpcollations()
    WHERE name IN (
    ‘SQL_Latin1_General_CP1_CI_AS’,
    ‘Latin1_General_CI_AS’
    )

    Reply

Leave a Reply