It has been a while since I posted a puzzle on this blog and lots of people have been sending me an email requesting to have a brain teaser. Today, let us see a very simple puzzle where we are getting strange results when using NOT IN and IS NULL in a T-SQL query.
Here look at two queries if you think about it both look pretty much the same. Both the queries are run for sample database WideWorldImporters. You can get this sample database downloaded from here.
Query 1: IN and IS NOT NULL
Run the following query for the sample database WideWorldImporters.
SELECT * FROM [WideWorldImporters].[Sales].[Orders] WHERE PickedByPersonID IN ( SELECT PickedByPersonID FROM [WideWorldImporters].[Sales].[Orders] WHERE PickedByPersonID IS NOT NULL) GO
The query above will return 62,972 rows.
Query 2: NOT IN and IS NULL
Run the following query for the sample database WideWorldImporters.
SELECT * FROM [WideWorldImporters].[Sales].[Orders] WHERE PickedByPersonID NOT IN ( SELECT PickedByPersonID FROM [WideWorldImporters].[Sales].[Orders] WHERE PickedByPersonID IS NULL) GO
The query above will return 0 rows.
Why Strange Results?
Now the question to you is why just moving NOT from a subquery to outer query is making such a huge difference in the result. If you think about it, logically both the queries are the same. If you put any number for outer query or inner subquery, the result will come the same. However, there is one reason, why the result is different.
Here is the question one time – Why such a huge difference in the Resultset by moving NOT keyword?
Leave your answer in the comments section. All the answers will be kept unpublished for one week and after that, they will be all published. I will select three of the individuals randomly who has left a comment on this blog post and will send them free one month’s access to SQL Server Performance Tuning Practical Workshop – Recorded Classes.
If you have any such puzzle, please do send it to my way and I will be happy to publish with due credit to you.
Update (July 9, 2020): The winner of the contest is Mike Mohanty.
Reference: Pinal Dave (https://blog.sqlauthority.com)
117 Comments. Leave new
Null simply means a value does not exist hence a Null can not be equated to another Null. That is the reason for the second query to give 0 out put where the subquery has an out put of Nulls. Ever thought why we use “is Null” and not “=Null“ because Null is never equal to another Null yet it just means the value is empty. This explains the strange output pattern of both queries.
The subquery conditions are different in both cases and hence will return different results. Hence main query will also return different results. JUst because there is a “NOT” in both quesries, doesn’t mean they will return the same results. THe position of “NOT” will change the whole result.
Anything when compared with null won’t be evaluated. It is termed as 3 Value logic. Inner query will return a set of NULL Values.
Anything we compare with NULL values can not be evaluated. It is termed as 3 Value logic. The Inner query will return set of NULL values.
–>–A comparison to NULL is never true or false. Since NULL can never be equal to any value, it can never be unequal, either.
SELECT *
FROM [WideWorldImporters].[Sales].[Orders]
WHERE PickedByPersonID NOT IN (
SELECT ISNULL(PickedByPersonID,”)
FROM [WideWorldImporters].[Sales].[Orders]
WHERE PickedByPersonID IS NULL)
GO
This is because null is not comparable with any other value not even itself. So [null=null] is false, [nullnull] is false, [null=1] is false, [null 1] is false.
NOT IN returns 0 records when compared against an unknown value.
Since NULL is an unknown, a NOT IN query containing a NULL or NULLs in the list of possible values will always return 0 records since there is no way to be sure that the NULL value is not the value being tested.
I believe the IN function requires at least 1 record in the sub query before it can actually validate any records to be true for the IN or NOT IN clause.
If no records are returned no validation can happen and therefore all records are set to false.
In the second case, you are essentially saying NOT IN NULL, but NULL is undefined.
From the SQL Server DOCs on T-SQL operator IN:
” Caution
Any null values returned by subquery or expression that are compared to test_expression using IN or NOT IN return UNKNOWN. Using null values in together with IN or NOT IN can produce unexpected results.”
Sir, this is due to null entries in PickedByPersonId in Orders table. If any of the value is null then it won’t work.
in such cases better to use except.
The question is not so much about moving the NOT from one place to another, but rather what the result of the subquery (i.e. the part in brackets after the IN or NOT IN) is. If this returns an empty result set, then the comparison done by IN or NOT IN has an UNKNOWN result and since that is the only filter for the main query, the overall result will also be empty.
Thus I don’t really think the result is unexpected or strange, but this is a good reminder of something to keep in mind when writing queries!
1st Case : It will surely have some numerical value >=1 hence all rows are returned
2nd Case : It will not have any value as pickedbypersonid is selected as NULL and not in of NULL will be a NULL.
This occurs when the PickedByPersonId is NULL in all the rows available in the table.
In case of IN (NOT NULL) it evaluates to IN(NULL) as there are NULL values of NOT NULL in table.
In case of NOT IN(IS NULL), it evaluates to NOT IN(NULL) which is zero records in the table.
By moving the not you are forcing the statement to compare null values with the in statement. Since you cant compare with a null value, it finds nothing that matches the in statement.
SET ANSI NULL property is set incorrectly. At the current setting they cannot be compared. ColumXXX NULL equates to unknown, thus no results.
When evaluating the IN clause, equality must evaluate to true. Comparing anything to NULL, it evalutates as unknown. NULL is define as missing or unknown value. By this definition, NULL cannot be evaluated as being equal to NULL and can only result in unknown.
Nothing matches NULL, and nothing also doesn’t match NULL. ?
The 2nd subquery returns a series of NULL values and the first subquery returns a series of known values. IN and NOT IN evaluate each value in the series for find all true conditions. NULL has a unknown value therefore cannot be evaluated.
The issue is IS NOT NULL is very different from != NULL and that is why you are facing this situation.
Two NULL Values can’be compared as well in SQL Server
For example following result will not return the same values
If you want to change the behavior of how NULL works, you may want to change by writing the following set statement on the top of the query:
Now NULL will be compared properly. Try out your query too and you will find it also returns the same results.