It has been a while since I posted a puzzle on this blog and lots of people have been sending me an email requesting to have a brain teaser. Today, let us see a very simple puzzle where we are getting strange results when using NOT IN and IS NULL in a T-SQL query.
Here look at two queries if you think about it both look pretty much the same. Both the queries are run for sample database WideWorldImporters. You can get this sample database downloaded from here.
Query 1: IN and IS NOT NULL
Run the following query for the sample database WideWorldImporters.
SELECT * FROM [WideWorldImporters].[Sales].[Orders] WHERE PickedByPersonID IN ( SELECT PickedByPersonID FROM [WideWorldImporters].[Sales].[Orders] WHERE PickedByPersonID IS NOT NULL) GO
The query above will return 62,972 rows.
Query 2: NOT IN and IS NULL
Run the following query for the sample database WideWorldImporters.
SELECT * FROM [WideWorldImporters].[Sales].[Orders] WHERE PickedByPersonID NOT IN ( SELECT PickedByPersonID FROM [WideWorldImporters].[Sales].[Orders] WHERE PickedByPersonID IS NULL) GO
The query above will return 0 rows.
Why Strange Results?
Now the question to you is why just moving NOT from a subquery to outer query is making such a huge difference in the result. If you think about it, logically both the queries are the same. If you put any number for outer query or inner subquery, the result will come the same. However, there is one reason, why the result is different.
Here is the question one time – Why such a huge difference in the Resultset by moving NOT keyword?
Leave your answer in the comments section. All the answers will be kept unpublished for one week and after that, they will be all published. I will select three of the individuals randomly who has left a comment on this blog post and will send them free one month’s access to SQL Server Performance Tuning Practical Workshop – Recorded Classes.
If you have any such puzzle, please do send it to my way and I will be happy to publish with due credit to you.
Update (July 9, 2020): The winner of the contest is Mike Mohanty.Â
Reference:Â Pinal Dave (https://blog.sqlauthority.com)
117 Comments. Leave new
For 1st query where we are using NOT NULL for inner sub query since the inner sub query is returning some rows so there is no problem of using IN operator.
Since IN is a logical operator even though if the sub query will pass NULL value also it will give a reasonable response. Since there are 62972 rows which
are not null so there is no problem here.
But NOT IN behave very differently. If a sub query can return a NULL then NOT IN returns no rows at all.Here NULL really means unknown rather than nothing
That’s why when an expression that compare to NULL value will return NULL or unknown.
When SQL server execute the 2nd query SQL server evaluates the sub-query first, replace it with the list of values it returns & then evaluates the NOT IN
condition. Since for 2nd query the sub-query returns the NULL value & when the expression compares with NULL value it will always return NULL.That’s why the
outer query produces 0 rows.
In-order to over come this problem we can use NOT EXISTS which will produce the same number of rows ie. 62972.
first query – the inner query produce values other than NULL, and the outer query asked that the valued selected will be part of those inner-query-values
second query – the inner query produces only NULL values and the outer query ask the equivalent of
x NULL
since NULL is undefined, the query can’t say for certain that any of the outer-query values is definitely not in the inner-query. and thus produces 0 results
SQL SERVER executes innermost sub query first, the result of inner most sub query in second query is NULL. SQL Will consider this as an unknown values. Which causes the difference in results
This is b’coz, a sub query first execute and provide its output to the outer query. In the first query, a sub query is trying to fetch the list of records which are not null. Then these list then provided to outer query to verify the condition using ‘IN’ operator.
In the second query, a sub query is trying to first fetch the records which are ‘null’, then these null records are provided to the outer query to check the condition with ‘NOT IN’ operator. Here, outer query cannot able to find the records in NULL records and therefore it is not giving any output.
Set Ansi_Nulls Off
SELECT COUNT(*) as TotalCount
FROM [WideWorldImporters].[Sales].[Orders]
WHERE PickedByPersonID NOT IN (
SELECT PickedByPersonID
FROM [WideWorldImporters].[Sales].[Orders]
WHERE PickedByPersonID IS NULL)
GO
The sql statements are asking for two different things. The one that returns the 62+ thousand results is asking for all the Primary key’s that are not null in the Primary Key column, which should be all of them. The other statement is asking for all the Primary Keys’s that are null, which should be 0 considering it’s a primary key
Inner query result is having only NULL values.
Second query uses NOT IN which does not work with NULL (comparison with NULL values fails).
For second query to work correctly simply put IS NOT NULL and modified as below…
SELECT *
FROM [WideWorldImporters].[Sales].[Orders]
WHERE PickedByPersonID IS NOT NULL
In the second query which is having IS NULL sub-query, sub-query will return one column raw will have only “NULL” data.
So we can rewrite 2nd Query like below:
Original Query: select * from XYZ where ID not in (select ID from XYZ where ID is NULL)
Modified Query: select * from XYZ where ID not in (NULL)
So from above Modified Query sub-query has nothing, so result seems 0.
The first query works since it is getting a list of NOT NULL values in the subquery and evaluating those to the outer query. The second query returns 0 rows, because the subquery would be returning a NULL value and the NOT IN operator comparing PickedByPersonID to NULL would result in an UNDEFINED result. You can’t compare NULL for equals or not equals.
because null can be compared with is operator.
The difference is based on the three-valued logic of SQL.
The first script has the filter clause choosing data (IN) from the subquery’s resultset, which is existing data.
The second script is may be geared to something intuitive to human thinking (contrasting subgroups of absent |vs| non-absent), but ends up being counter-intuitive when taking into account three-valued logic.
In three-valued logic, only TRUE and FALSE are polar opposites … while NULL doesn’t have an opposite.
So, though the filter clause may aim for the opposite of absent (NULL) data and thereby thinking to pull from rows having non-absent data, instead the WHERE clause follows the logic & pulls from a dataset of 0 rows.
NULL matches nothing not even “NOT” itself.
The big difference is because the second subquery simply yields NULL and the Outer Query becomes NOT IN ( NULL) and it is not the right way to compare against special absence of value ‘NULL’ because it yields and empty result, finally the query must be wrote using IS NOT NULL.
In the above puzzle, the second query is comparing pickedByPersonID with null. comparing anything with null will return null, not true or false. so the “NOT IN” condition does not return True and no pickedByPersonID’s are selected to be returned.
its about your reference to Null Values if SET ANSI_NULLS ON
PickedByPersonID null is UNKNOWN, so the predicate evaluates to UNKNOWN, and you don’t get any rows.
if SET ANSI_NULLS is off you will get Rows :)
The second query will return an incorrect number of rows because the embedded query returns NULL values in PickedPersonID.
Because subquery of NOT IN returns NULL. One or more values of PickedByPerson is null in Orders table.
NULLs are unknown so you don’t know when a value is in or not in NULL. The IN uses an equality match whereas IS NULL or IS NOT NULL doesn’t.
In tue second query, the inner query will pick up the record where PickedPersonById will be NULL. Since NULL is not any kind of value, so it will void the outer query to get the correct records.
SQL Puzzle – IN and IS NOT NULL – Strange Results
whenever you compare will null value it will not give you result in join or where clause, if you want result
Below Query result is nothing
SELECT PickedByPersonID FROM [WideWorldImporters].[Sales].[Orders] WHERE PickedByPersonID IS NULL
If we write below code then will get same result
SELECT *
FROM [WideWorldImporters].[Sales].[Orders]
WHERE ISNULL(PickedByPersonID,0) NOT IN (0)
If table same then we write
SELECT *
FROM [WideWorldImporters].[Sales].[Orders]
WHERE PickedByPersonID IS NOT NULL