SQL SERVER – SELECT 1 vs SELECT * – An Interesting Observation

Many times I have seen issue of SELECT 1 vs SELECT * discussed in terms of performance or readability while checking for existence of rows in table. I ran quick 4 tests about this observed that I am getting same result when used SELECT 1 and SELECT *. I think smart readers of this blog will come up the situation when SELECT 1 and SELECT * have different execution plan when used to find existence of rows.

Let us see 4 tests I ran and note their result is same and their Execution Plan take same amount of resources (50% and 50%)

You can click on images to see larger images.

Test 1 : Whole Table SELECT

USE AdventureWorks
GO
IF EXISTS(
SELECT 1
FROM Production.Product)
SELECT 'SELECT 1'
GO
IF EXISTS(
SELECT *
FROM Production.Product)
SELECT 'SELECT *'
GO

Test 2 : Condition WHERE on Indexed Column

USE AdventureWorks
GO
IF EXISTS(
SELECT 1
FROM Production.Product
WHERE Name = 'Adjustable Race')
SELECT 'SELECT 1'
GO
IF EXISTS(
SELECT *
FROM Production.Product
WHERE Name = 'Adjustable Race')
SELECT 'SELECT *'
GO

Test 3 : Using Aggregate function COUNT

USE AdventureWorks
GO
IF (
SELECT 1
FROM Production.Product
WHERE Name = 'Adjustable Race') = 1
SELECT 'SELECT 1'
GO
IF (
SELECT COUNT(*)
FROM Production.Product
WHERE Name = 'Adjustable Race') = 1
SELECT 'SELECT *'
GO

Test 4 : Using COUNT with search on non-indexed Column

USE AdventureWorks
GO
IF (
SELECT COUNT(1)
FROM Production.Product
WHERE SafetyStockLevel = '800') > 1
SELECT 'SELECT 1'
GO
IF (
SELECT COUNT(*)
FROM Production.Product
WHERE SafetyStockLevel = '800') > 1
SELECT 'SELECT *'
GO

Note: I have used single quotes around 800, which is numeric and I am aware of that fact. Above queries are for testing purpose only.

I have been using SELECT 1 instead of SELECT * when checking existence of rows. I would like to see what my readers have opinion about this. Please have your opinion and make your comment here.

Reference : Pinal Dave (http://blog.SQLAuthority.com)

About these ads

46 thoughts on “SQL SERVER – SELECT 1 vs SELECT * – An Interesting Observation

  1. I read about this as well over on

    http://www.sqlskills.com/blogs/conor/2008/02/07/EXISTSSubqueriesSELECT1VsSELECT.aspx

    However my own testing, both when viewing execution plans and using Statistic timer to see compile/execution time, is that I can’t find consistent difference between using SELECT * or SELECT 1 when combined with an EXISTS clause/condition

    (even when using DROPCLEANBUFFERS and FREEPROCCACHE between each run, which as far as I understand should clear the cache and force a recompilation of my queries).

    I’m glad somebody else have taken this subject up as well, as I am strained for time to play myself at the moment – and it is one of those “myths” within SQL currently, that I’m having trouble finding out if is true or not.

    Like

  2. If you try this with a view that return columns from functions or subquieries

    ie

    select c1,c2,(select tc from tbl2 where tbl2.t1 =tbl.t1) as tc3
    from tbl

    the select 1 will not execute the (select tc from tbl2 where tbl2.t1 =tbl.t1 ) sub query. This is a significant performance gain my view runs 99% slower with select *

    Like

  3. The problem with SELECT * is it retrieves all the meta data associated to the object(s) in question BEFORE runtime. So during compilation of the query there is unneeded overhead associated to it – but at run time the query plans will be the same.

    Just because the query plans are the same does not mean that the queries perform the same – it is compile time that is taking the hit in this case, and not run time.

    So you’re doing yourself a compilation favor if you use SELECT 1 rather than SELECT *, even if you don’t see the gains in the execution plan – as I said they will probably be exactly the same. Just keep in mind that meta data is being queried and returned, then discarded because it is unneeded to get the results of you query. Why thrash the system tables when you don’t need to?

    When you go to the store to get a bottle of coke, you don’t buy all the coke products in the store and throw out those you didn’t want – it’s the same thing here. Get what you need from the beginning – it costs less.

    Like

  4. I don’t think there is any difference, as long as the SELECT 1/* is inside EXISTS, which really doesn’t return any rows – it just returns boolean as soon as condition of the WHERE is checked.

    I’m quite sure that the SQL Server Query Optimizer is smart enough not to search for the unneeded meta data in the case of EXISTS.

    I agree that in all the other situations SELECT * shouldn’t be used for the reasons Simon mentioned. Also, index usage wouldn’t be optimal etc.

    For me EXISTS (SELECT * ..) is the only place where I allow myself to write SELECT * in production code ;)

    Like

  5. @simon worth:
    By running statistics time on I’ve not seen any consistent indication that at compile time the query will collect all the metadata and thus have a higher compile time them then a SELECT 1. Some times it was faster, other times it was not.

    Without looking into the engine I would also seriously think it would be bad implementation to retrieve any sort of meta data when the SELECT * is within a conditional clause, like EXISTS. The engine would be able to see from the syntax alone, that there is no usage for the specific data and thus no need to look up any additional metadata, other then testing whether the statement evaluates true or not.

    I’d be very interested if you had some code which could show this behaviour. Or some links which have shown this to be true. Because I can’t see it myself, neither logical nor running statistic on the queries.

    Like

  6. It seems that this is primarily a semantic argument in terms of runtime performance. Personally, I’ve always used SELECT 1 because I wasn’t sure about the implications of the alternative.

    If it comes down to little more than personal preference, I’m going to continue using SELECT 1 because it reinforces the idea that SELECT * is (usually) bad.

    Like

  7. If you want to read more about the meta data collection on objects when the query is executed – read the article that Allan Svelmoe Hansen posted in the first comment. Conor used to work (I don’t think he still does) for Microsoft on the Query Processor team. If anyone would know the inside of the query processor it’s him. In that blog post he states

    “The * will be expanded to some potentially big column list and then it will be determined that the semantics of the EXISTS does not require any of those columns, so basically all of them can be removed.

    “SELECT 1″ will avoid having to examine any unneeded metadata for that table during query compilation.”

    Source : http://www.sqlskills.com/blogs/conor/2008/02/07/EXISTSSubqueriesSELECT1VsSELECT.aspx

    Like

  8. Simon,

    We can not take anybodies word for it. (With due respect to Conor – I know he is the man). Is there any proof for this?

    -Guy

    Like

  9. Honestly I don’t see what the importance of this subject is.

    People have been saying for a long time that SELECT * in an exists condition has unneeded overhead – and there are several blog posts out there to back up that claim.

    If you really really want to use SELECT * instead of using SELECT 1 in production code – then go ahead.

    If you haven’t seen performance issues with your code then don’t worry about it.

    I understand that you want something tangible to look at and prove or disprove this “theory”, but I don’t have time to put the code together and gather the statistics just to satisfy curiosity.

    I’ll leave that up to you to prove or disprove.

    It may be a good idea to post a comment on Connor’s site asking if he can prove or disprove this – as he is more intimate with the query processor than I am.

    Like

  10. Simon,

    Good comment. I think there is no real conclusion. As I said I always used SELECT 1, however this whole exercise is for fun and what everybody think about this matter.

    I like your way of thinking however neither theory is proven yet. I am also curious what other readers have to suggest.

    Regards,
    Pinal Dave ( http://www.SQLAuthority.com )

    Like

  11. Hi,

    Just want to ask, is there any select statement whereby it will call you to select the record start from second record then additional 10 records onwards?

    For example,

    User have to retrieve record 2 until record 12?
    If yes, can help?

    Thanks.

    Like

  12. Hi emily here is the answer for your question.

    SELECT rnum, employee_id
    FROM (
    SELECT rownum rnum, employee_id
    FROM employees
    WHERE ROWNUM < 100)
    WHERE rnum BETWEEN 2 AND 12;

    this query will retrieve output for
    record 2 until record 12. By use of inline view.

    Regards,
    Ravi
    Sony India
    ____________________
    Hi,

    Just want to ask, is there any select statement whereby it will call you to select the record start from second record then additional 10 records onwards?

    For example,

    User have to retrieve record 2 until record 12?
    If yes, can help?

    Thanks.

    Like

  13. SQL 2005

    I managed to get a glimpse using the following two queries. According the estimated query execution plans the ‘select *’ query has an extra step and the estimated row size is larger than ‘select 1′. Perhaps this is not proof but I think it is more evidence that ‘select 1′ is the better practice.

    select * from tempdb.sys.sysobjects where [name] = ”

    select 1 from tempdb.sys.sysobjects where [name] = ”

    Like

  14. Hi Pinal,

    I’m wondering what test conditions you used…

    The other day I had a performance issue with a proceudre that included a statement like

    IF NOT EXISTS (SELECT * FROM view WHERE condition)

    The view itself based on two tables withe several 10-thousand rows in one and several million in the other (as to my best knowledge properly joined and indexed using information from STATISTICS XML). The SELECT statement above would return approx. 1 million rows.

    When I changed it to SELECT 1, performance increased significantly.

    My explanation (at least to myself) goes a little more in direction of boolean logic:

    The NOT EXISTS statement becomes FALSE performing the following tasks:
    SELECT * -> grab all the data and if there are any drop them and return FALSE
    SELECT 1 -> try to grab the first row and if you find one single row return FALSE, no matter on how many rows fulfill the WHERE clause

    As far as I figured the TRUE statement takes the same time for both, * amd 1, since the resultset is NULL in both cases.

    Would you agree with my comment and, if not, retest your code with large tables?

    Regards

    Lutz

    Like

  15. Pingback: SQL SERVER - Guidelines and Coding Standards Part - 1 Journey to SQL Authority with Pinal Dave

  16. This shouldn’t be such a mysterious of contentious subject. SELECT * in an EXISTS condition is a SQL idiom. In this context * means “I don’t care about the columns,” not “fetch all columns.” It may or may not be optimized by the compiler depending on the dialect of SQL. Apparently T-SQL is smart in this regard but other RDBMSs may special-case SELECT * in subqueries.

    The Microsoft SQL Server documentation says this:

    “The select list of a subquery introduced with EXISTS, by convention, has an asterisk (*) instead of a single column name. The rules for a subquery introduced with EXISTS are the same as those for a standard select list, because a subquery introduced with EXISTS creates an existence test and returns TRUE or FALSE, instead of data.” (T-SQL reference, Subquery Rules).

    Anyone who knows SQL should recognize the idiom EXISTS(SELECT * …). The alternative EXISTS(SELECT 1 …) is obvious but not really idiomatic. Anything else is just showing off and making the SQL harder for the next person to understand.

    Like

  17. Lutz:

    EXISTS and NOT EXISTS do the same work: the db engine only has to find a single row that matches the WHERE clause in the subquery to satisfy the test. The syntax gives it aways: NOT EXISTS is the logical NOT of the EXISTS test.

    Depending on the subquery WHERE condition the existence of at least one matching row may be answered by scanning an index, or it may require a table scan. In either case as soon as a single matching row is found the result of EXISTS is known. The result of NOT EXISTS is simply the logical inverse of the result of EXISTS.

    In your example the subquery inside the EXISTS() doesn’t “return approx. 1 million rows.” It doesn’t return any rows — it returns a TRUE or FALSE. The db engine can give the result when it encounters the first row matching the WHERE clause. There is no result set to discard for the subquery.

    If you are seeing different results using SELECT 1 vs. SELECT * it’s probably due to SELECT * having done all the work and cached the indexes so SELECT 1 has less work to do.

    Like

  18. Simple issue and much discussions. And good one! Since in much cases we have to execute the EXISTS, using faster one [may be only fractions of second faster] matters. My preference is using SELECT 1, however. Thanks.

    Like

  19. I would call it a fairly major bug in the query engine if SELECT 1, and SELECT * were not optimized to do the same thing when the context is within the EXIST() statement. Certainly, there is no practical reason why you would visit any extra meta data at all if exists is just going to return a SQL bool..

    Like

  20. I would not call it a bug if they were not the same under the hood. The optimizer can only guess so much. It really depends on how the statements are written. The ones above are pretty simple and should optimize well, but not all are that easily second guessed as to their intent.

    Like

  21. Greg Jorgensen said:

    “EXISTS and NOT EXISTS do the same work”

    This is not necessarily true. Depending on the indexes being used (or not used), an EXISTS may be faster, because it can stop as soon as it finds one matching row. A NOT EXISTS may have to table scan the entire table (if there’s no index that helps the query).

    Like

  22. @Dan

    I still believe Greg Jorgensen was right on that one.

    To be sure that a value DOESN’T exist, the engine must do a full scan. To be sure that a value exists, finding the first is enough. And if there is proper index, the scan will use it.

    So, if you execute a WHERE X = 2 on a table completely filled of 1’s, a full scan will happen either way – doesn’t matter if you were using a EXISTS or a NOT EXISTS. The engine should work exactly the same way in both cases.

    Like

  23. I believe, select 1 is faster if the condition is checking the indexed columns.
    In that case, we don’t need to fetch the data from the FS, only read the index(which is the step used anyway) and if true, then output 1. This reduces the step to fetch data from underlying FS

    -R

    Like

  24. An advantage of using select 1 instead of select * is that you can do a simple find in your code base and any ‘select *’ will raise a red flag and a visit to the offending developer’s desk :-) A workaround is searching with a regular expression to exclude cases with ‘exists (‘ but my regular expression brain can’t figure this out in 2 minutes :-)

    Like

  25. A bit of history that we all know that SQL Server came from Sybase and that optimizers were not always as good as they are now. So these products have evoloved and been continually enhance over a period of over 20 years. These “Myths” started many moons ago. I clearly rmember having issues with SQL 6.5 doing EXISTS(SELECT * FROM …) and fixing these issues by using SELECT 1. The Optimizers now don’t have these issues but I still use SELECT 1 because vehemently dislike the use of SELECT * in any scenario.

    Like

  26. To add to this comment on the history of why select 1 is used. select * certainly used to have physical disk IO implications that take time reading the data where as selecting a constant didn’t. This probably isn’t the case now but I still use select 1

    Like

  27. I don’t believe this debate will ever end :)
    I have been asked more than once about the performance difference between EXISTS(SELECT * and EXISTS(SELECT 1. Since I deal almost exclusively with Microsoft SQL Server I have always pointed to the documentation Greg Jorgensen referred to (July 20, 2009), i.e. these days it comes down to personal preference. The whole point here is that the parser knows whether or not any real data is needed when the subquery is in an EXISTS, AFAIK SQL Server has treated EXISTS SELECT * as “SELECT anything” since at least version 7.0 (6.5 was the Sybase root/MS split). Any SQL database optimizer should be able to predict if retrieving meta-data is needed ot not.
    So for me there are 2 interpretations of SELECT *, if inside an EXISTS it is “SELECT anything”, outside is “SELECT ALL”, and any instance of “SELECT ALL” better have a good reason for its existance.

    BTW, I recently asked a newer co-worker why he used “SELECT 1″, his response was “I was taught that way”. The good news is he was also taught that “SELECT *” is bad, but unfortunately teaching good or bad with no explanation as to when or why is one of the reasons this thread exists.

    Like

  28. Pingback: SQL SERVER – Weekly Series – Memory Lane – #018 | SQL Server Journey with SQL Authority

  29. i just run select 1 from in DB2 it is showing 1,1,1,1,1… nothing more then that,even i tryd at SQL server too, same result , i dont knw how any y u guys comparing results ???

    Like

  30. Yes Sir for both the query’s it is obvious to get same performance execution plan as
    both query combination, at any scenario index or non index will going to perform table scan whole or indexed based on where clause and then next step would be to count(*) row of the table or just return 1. May be I am right or near to right Please Comment me sir
    Yes I find it fun and great observation which a Sql Developer or DBA does care of.
    I am greatest fan of urs, Ma Shaa Allah! keep it up the great work.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s