It has been a while since I posted a puzzle on this blog and lots of people have been sending me an email requesting to have a brain teaser. Today, let us see a very simple puzzle where we are getting strange results when using NOT IN and IS NULL in a T-SQL query.
Here look at two queries if you think about it both look pretty much the same. Both the queries are run for sample database WideWorldImporters. You can get this sample database downloaded from here.
Query 1: IN and IS NOT NULL
Run the following query for the sample database WideWorldImporters.
SELECT * FROM [WideWorldImporters].[Sales].[Orders] WHERE PickedByPersonID IN ( SELECT PickedByPersonID FROM [WideWorldImporters].[Sales].[Orders] WHERE PickedByPersonID IS NOT NULL) GO
The query above will return 62,972 rows.
Query 2: NOT IN and IS NULL
Run the following query for the sample database WideWorldImporters.
SELECT * FROM [WideWorldImporters].[Sales].[Orders] WHERE PickedByPersonID NOT IN ( SELECT PickedByPersonID FROM [WideWorldImporters].[Sales].[Orders] WHERE PickedByPersonID IS NULL) GO
The query above will return 0 rows.
Why Strange Results?
Now the question to you is why just moving NOT from a subquery to outer query is making such a huge difference in the result. If you think about it, logically both the queries are the same. If you put any number for outer query or inner subquery, the result will come the same. However, there is one reason, why the result is different.
Here is the question one time – Why such a huge difference in the Resultset by moving NOT keyword?
Leave your answer in the comments section. All the answers will be kept unpublished for one week and after that, they will be all published. I will select three of the individuals randomly who has left a comment on this blog post and will send them free one month’s access to SQL Server Performance Tuning Practical Workshop – Recorded Classes.
If you have any such puzzle, please do send it to my way and I will be happy to publish with due credit to you.
Update (July 9, 2020): The winner of the contest is Mike Mohanty.Â
Reference:Â Pinal Dave (https://blog.sqlauthority.com)
117 Comments. Leave new
In the second query, the subquery returns NULL values. By default, any logical operations done against NULL will return false in SQL SERVER. Thus, the NOT IN logical operator returned false because the operands returned by the subquery are all NULL.
In the first query you are using PickedByPersonID to compare to all returned PickedByPersonID.
In the second query you are comparing a null value. Nulls have no value to compare.
A comparison to NULL can only be done with IS NULL or IS NOT NULL.
When using = NULL or IN (NULL), the comparison will not return anything.
In the first query, the subquery will give you a list of definite values for ‘PickedByPersonID’ which are not null. The outer query will then be able to make a match between all those records that have those same values and produce a list of 62,972 records.
In the second query however, the subquery will just produce a list of nulls, which the outer query is unable to relate to because NULL is not considered to be a value, so the ‘PickedByPersonID’ field is unable to make a match and so produces an empty list with no records.
because ansi_nulls is on in this case 1 != null is unknown and if we set ansi_null off 1 != null is will give res. Thats why the behaviour
An IN statement will be parsed identically to field=val1 or field=val2 or field=val3. Putting a null in there will boil down to field=null which won’t work. So, PickedByPersonID IS NOT IN(NULL), is equal to PickedByPersonID NULL which will not.
But in the first query, you first filtered the NULLs, so then you have PickedByPersonID IN (1,2,3), which is equal to PickedByPersonID=1 OR PickedByPersonID =2 OR PickedByPersonID=3 which is correct and will brings you the desired result.
Neither IN, nor NOT IN can return anything from nothing. Not in is checking against a subquery that’s resolving to nothing.
In the 1st query you are comparing Integer to Integer, in the 2nd query you are comparing INT to Null values which is not comparable thus 0 results.
Null is Null – The first query is performing an explicit NULL check where the second is doing a compare. A compare will always return false regardless when one of the values is a NULL.
IN (and NOT IN) do not work with null. They actually are translated to OR equalities (or inequalities). Equalities do not work with null.
A NOT IN (B,C) is similar to: A B OR AC (B,C cannot be null)
“not in” tries to make an equivalent, but you can’t make an equivalent on a null value. the second query returns null as the id.
It is a simple point that IN OR NOT IN dose not compares NULL value. 2nd statement giving PickedByPersonID as null value as per condition which there reason to not get any record while 1st statement are giving not null PickedByPersonID .
First one is not null it means the coloum which don’t have null value
And is null which have null value
In the first case, you’re bringing back a list of PickedByPersonID’s from the sub-select, so the end result is seeing all rows that have already been picked.
In the second case, you’re only returning NULL from the sub-select. A NOT IN clause with NULL always evaluates to an unknown or undefined status. It doesn’t know what to do, so it does nothing.
Reason In is a Set operator and it does not support the Null values comparison. that’s the reason when your subquery returning Null values its not getting compared with In.
Hello, as said in ms docs : “Any null values returned by subquery or expression that are compared to test_expression using IN or NOT IN return UNKNOWN. Using null values in together with IN or NOT IN can produce unexpected results.”
Thanks for this puzzle.
Hi Pinal,
Good Evening…
According to my understanding … in second query which is returns 0 rows having inner query with where condition PickedbypersonId IS NULL this will return 0 rows i.e. PickedbyPersonId having null values could be 0 rows in that table.. So next the outer query is looking for PickdebyPersonId NOT IN conditions which is looking for records with NOT NULL ideally hence returns 0 rows.
Ans :Because of SQL Operator Left Anti semi joins which gives only those rows in the left rowset that have no matching row in the right rowset (Row by row). The second rowset outputs 0 records because the nested loop (Left antisemi join) operation. Where as the 1st query uses a hash join (rowset)
The second query returns no rows regardless if the first IN of that query is predicated by a NOT or not. Because it compares PickedByPersonId to a subquery where PickedByPersonID must always be NULL, this is the equivalent of saying “give me all rows where the value does or does not match unknown.” The answer is unknown so SQL returns nothing. This is why the only way to compare a null is to us IS NULL or IS NOT NULL.
The Subquery in #2 is only returning Null values. And Null has to be handled differently because NULL is nonexistent so there’s not any data to process with.
NULL = NULL, NULL NULL, NULL NULL, etc. NUL can’t compared in this fashion