SQL SERVER – SELECT 1 vs SELECT * – An Interesting Observation

Many times I have seen the issue of SELECT 1 vs SELECT * discussed in terms of performance or readability while checking for the existence of rows in the table. I ran quick 4 tests about this observed that I am getting same result when used SELECT 1 and SELECT *. I think smart readers of this blog will come up the situation when SELECT 1 and SELECT * have a different execution plan when used to find the existence of rows.

Let us see 4 tests I ran and note their result is same and their Execution Plan take the same amount of resources (50% and 50%)

You can click on the images to see larger images.

Test 1: Whole Table SELECT

USE AdventureWorks
GO
IF EXISTS(
SELECT 1
FROM Production.Product)
SELECT 'SELECT 1'
GO
IF EXISTS(
SELECT *
FROM Production.Product)
SELECT 'SELECT *'
GO

Test 2: Condition WHERE on Indexed Column

USE AdventureWorks
GO
IF EXISTS(
SELECT 1
FROM Production.Product
WHERE Name = 'Adjustable Race')
SELECT 'SELECT 1'
GO
IF EXISTS(
SELECT *
FROM Production.Product
WHERE Name = 'Adjustable Race')
SELECT 'SELECT *'
GO

Test 3: Using Aggregate function COUNT

USE AdventureWorks
GO
IF (
SELECT 1
FROM Production.Product
WHERE Name = 'Adjustable Race') = 1
SELECT 'SELECT 1'
GO
IF (
SELECT COUNT(*)
FROM Production.Product
WHERE Name = 'Adjustable Race') = 1
SELECT 'SELECT *'
GO


Test 4: Using COUNT with search on non-indexed Column

USE AdventureWorks
GO
IF (
SELECT COUNT(1)
FROM Production.Product
WHERE SafetyStockLevel = '800') > 1
SELECT 'SELECT 1'
GO
IF (
SELECT COUNT(*)
FROM Production.Product
WHERE SafetyStockLevel = '800') > 1
SELECT 'SELECT *'
GO

Note: I have used single quotes around 800, which is numeric and I am aware of that fact. Above queries are for testing purpose only.

I have been using SELECT 1 instead of SELECT * when checking existence of rows. I would like to see what my readers have an opinion about this. Please have your opinion and make your comment here.

Reference: Pinal Dave (https://blog.sqlauthority.com)

,
Previous Post
SQL SERVER – Index Reorganize or Index Rebuild
Next Post
SQL SERVER – Dynamic Case Statement – FIX : ERROR 156 : Incorrect syntax near the keyword

Related Posts

51 Comments. Leave new

  • Allan Svelmoe Hansen
    February 26, 2008 11:59 am

    I read about this as well over on

    However my own testing, both when viewing execution plans and using Statistic timer to see compile/execution time, is that I can’t find consistent difference between using SELECT * or SELECT 1 when combined with an EXISTS clause/condition

    (even when using DROPCLEANBUFFERS and FREEPROCCACHE between each run, which as far as I understand should clear the cache and force a recompilation of my queries).

    I’m glad somebody else have taken this subject up as well, as I am strained for time to play myself at the moment – and it is one of those “myths” within SQL currently, that I’m having trouble finding out if is true or not.

    Reply
  • If you try this with a view that return columns from functions or subquieries

    ie

    select c1,c2,(select tc from tbl2 where tbl2.t1 =tbl.t1) as tc3
    from tbl

    the select 1 will not execute the (select tc from tbl2 where tbl2.t1 =tbl.t1 ) sub query. This is a significant performance gain my view runs 99% slower with select *

    Reply
  • The problem with SELECT * is it retrieves all the meta data associated to the object(s) in question BEFORE runtime. So during compilation of the query there is unneeded overhead associated to it – but at run time the query plans will be the same.

    Just because the query plans are the same does not mean that the queries perform the same – it is compile time that is taking the hit in this case, and not run time.

    So you’re doing yourself a compilation favor if you use SELECT 1 rather than SELECT *, even if you don’t see the gains in the execution plan – as I said they will probably be exactly the same. Just keep in mind that meta data is being queried and returned, then discarded because it is unneeded to get the results of you query. Why thrash the system tables when you don’t need to?

    When you go to the store to get a bottle of coke, you don’t buy all the coke products in the store and throw out those you didn’t want – it’s the same thing here. Get what you need from the beginning – it costs less.

    Reply
  • One Great Article. Pinal tum great ho!

    Reply
  • I don’t think there is any difference, as long as the SELECT 1/* is inside EXISTS, which really doesn’t return any rows – it just returns boolean as soon as condition of the WHERE is checked.

    I’m quite sure that the SQL Server Query Optimizer is smart enough not to search for the unneeded meta data in the case of EXISTS.

    I agree that in all the other situations SELECT * shouldn’t be used for the reasons Simon mentioned. Also, index usage wouldn’t be optimal etc.

    For me EXISTS (SELECT * ..) is the only place where I allow myself to write SELECT * in production code ;)

    Reply
  • @simon worth:
    can you provide some source or example for this claim? i’d be very interesetd to see.

    Reply
  • Reply
  • Allan Svelmoe Hansen
    February 27, 2008 12:03 pm

    @simon worth:
    By running statistics time on I’ve not seen any consistent indication that at compile time the query will collect all the metadata and thus have a higher compile time them then a SELECT 1. Some times it was faster, other times it was not.

    Without looking into the engine I would also seriously think it would be bad implementation to retrieve any sort of meta data when the SELECT * is within a conditional clause, like EXISTS. The engine would be able to see from the syntax alone, that there is no usage for the specific data and thus no need to look up any additional metadata, other then testing whether the statement evaluates true or not.

    I’d be very interested if you had some code which could show this behaviour. Or some links which have shown this to be true. Because I can’t see it myself, neither logical nor running statistic on the queries.

    Reply
  • It seems that this is primarily a semantic argument in terms of runtime performance. Personally, I’ve always used SELECT 1 because I wasn’t sure about the implications of the alternative.

    If it comes down to little more than personal preference, I’m going to continue using SELECT 1 because it reinforces the idea that SELECT * is (usually) bad.

    Reply
  • Thomas Freudenberg
    February 28, 2008 2:26 am

    To check the existence of rows you can also try this statement:

    IF EXISTS(SELECT NULL as [EMPTY] FROM Production.Product)

    Reply
  • simon worth
    March 5, 2008 7:35 pm

    If you want to read more about the meta data collection on objects when the query is executed – read the article that Allan Svelmoe Hansen posted in the first comment. Conor used to work (I don’t think he still does) for Microsoft on the Query Processor team. If anyone would know the inside of the query processor it’s him. In that blog post he states

    “The * will be expanded to some potentially big column list and then it will be determined that the semantics of the EXISTS does not require any of those columns, so basically all of them can be removed.

    “SELECT 1″ will avoid having to examine any unneeded metadata for that table during query compilation.”

    Source :

    Reply
  • Simon,

    We can not take anybodies word for it. (With due respect to Conor – I know he is the man). Is there any proof for this?

    -Guy

    Reply
  • simon worth
    March 5, 2008 7:54 pm

    Honestly I don’t see what the importance of this subject is.

    People have been saying for a long time that SELECT * in an exists condition has unneeded overhead – and there are several blog posts out there to back up that claim.

    If you really really want to use SELECT * instead of using SELECT 1 in production code – then go ahead.

    If you haven’t seen performance issues with your code then don’t worry about it.

    I understand that you want something tangible to look at and prove or disprove this “theory”, but I don’t have time to put the code together and gather the statistics just to satisfy curiosity.

    I’ll leave that up to you to prove or disprove.

    It may be a good idea to post a comment on Connor’s site asking if he can prove or disprove this – as he is more intimate with the query processor than I am.

    Reply
  • Simon,

    Good comment. I think there is no real conclusion. As I said I always used SELECT 1, however this whole exercise is for fun and what everybody think about this matter.

    I like your way of thinking however neither theory is proven yet. I am also curious what other readers have to suggest.

    Regards,
    Pinal Dave ( https://blog.sqlauthority.com/ )

    Reply
  • Hi,

    Just want to ask, is there any select statement whereby it will call you to select the record start from second record then additional 10 records onwards?

    For example,

    User have to retrieve record 2 until record 12?
    If yes, can help?

    Thanks.

    Reply
  • Hi emily here is the answer for your question.

    SELECT rnum, employee_id
    FROM (
    SELECT rownum rnum, employee_id
    FROM employees
    WHERE ROWNUM < 100)
    WHERE rnum BETWEEN 2 AND 12;

    this query will retrieve output for
    record 2 until record 12. By use of inline view.

    Regards,
    Ravi
    Sony India
    ____________________
    Hi,

    Just want to ask, is there any select statement whereby it will call you to select the record start from second record then additional 10 records onwards?

    For example,

    User have to retrieve record 2 until record 12?
    If yes, can help?

    Thanks.

    Reply
  • Nice observation

    Reply
  • SQL 2005

    I managed to get a glimpse using the following two queries. According the estimated query execution plans the ‘select *’ query has an extra step and the estimated row size is larger than ‘select 1’. Perhaps this is not proof but I think it is more evidence that ‘select 1’ is the better practice.

    select * from tempdb.sys.sysobjects where [name] = ”

    select 1 from tempdb.sys.sysobjects where [name] = ”

    Reply
  • sorry, on that last post, i tried to include a string that inadvertenly looked like html and was stripped out. just use any user-defined table for the [name] = portion of the query to duplicate my findings.

    Reply
  • Hi Pinal,

    I’m wondering what test conditions you used…

    The other day I had a performance issue with a proceudre that included a statement like

    IF NOT EXISTS (SELECT * FROM view WHERE condition)

    The view itself based on two tables withe several 10-thousand rows in one and several million in the other (as to my best knowledge properly joined and indexed using information from STATISTICS XML). The SELECT statement above would return approx. 1 million rows.

    When I changed it to SELECT 1, performance increased significantly.

    My explanation (at least to myself) goes a little more in direction of boolean logic:

    The NOT EXISTS statement becomes FALSE performing the following tasks:
    SELECT * -> grab all the data and if there are any drop them and return FALSE
    SELECT 1 -> try to grab the first row and if you find one single row return FALSE, no matter on how many rows fulfill the WHERE clause

    As far as I figured the TRUE statement takes the same time for both, * amd 1, since the resultset is NULL in both cases.

    Would you agree with my comment and, if not, retest your code with large tables?

    Regards

    Lutz

    Reply

Leave a Reply Cancel reply

Exit mobile version