SQL SERVER – SELECT 1 vs SELECT * – An Interesting Observation

February 26, 2008

Many times I have seen the issue of SELECT 1 vs SELECT * discussed in terms of performance or readability while checking for the existence of rows in the table. I ran quick 4 tests about this observed that I am getting same result when used SELECT 1 and SELECT *. I think smart readers of this blog will come up the situation when SELECT 1 and SELECT * have a different execution plan when used to find the existence of rows.

Let us see 4 tests I ran and note their result is same and their Execution Plan take the same amount of resources (50% and 50%)

You can click on the images to see larger images.

Test 1: Whole Table SELECT

USE AdventureWorks
GO
IF EXISTS(
SELECT 1
FROM Production.Product)
SELECT 'SELECT 1'
GO
IF EXISTS(
SELECT *
FROM Production.Product)
SELECT 'SELECT *'
GO

Test 2: Condition WHERE on Indexed Column

USE AdventureWorks
GO
IF EXISTS(
SELECT 1
FROM Production.Product
WHERE Name = 'Adjustable Race')
SELECT 'SELECT 1'
GO
IF EXISTS(
SELECT *
FROM Production.Product
WHERE Name = 'Adjustable Race')
SELECT 'SELECT *'
GO

Test 3: Using Aggregate function COUNT

USE AdventureWorks
GO
IF (
SELECT 1
FROM Production.Product
WHERE Name = 'Adjustable Race') = 1
SELECT 'SELECT 1'
GO
IF (
SELECT COUNT(*)
FROM Production.Product
WHERE Name = 'Adjustable Race') = 1
SELECT 'SELECT *'
GO

Test 4: Using COUNT with search on non-indexed Column

USE AdventureWorks
GO
IF (
SELECT COUNT(1)
FROM Production.Product
WHERE SafetyStockLevel = '800') &amp;gt; 1
SELECT 'SELECT 1'
GO
IF (
SELECT COUNT(*)
FROM Production.Product
WHERE SafetyStockLevel = '800') &amp;gt; 1
SELECT 'SELECT *'
GO

Note: I have used single quotes around 800, which is numeric and I am aware of that fact. Above queries are for testing purpose only.

I have been using SELECT 1 instead of SELECT * when checking existence of rows. I would like to see what my readers have an opinion about this. Please have your opinion and make your comment here.

Reference: Pinal Dave (https://blog.sqlauthority.com)

SQL Scripts, SQL Server

SQL SERVER – Index Reorganize or Index Rebuild

SQL SERVER – Dynamic Case Statement – FIX : ERROR 156 : Incorrect syntax near the keyword

51 Comments. Leave new

Allan Svelmoe Hansen
February 26, 2008 11:59 am

I read about this as well over on

However my own testing, both when viewing execution plans and using Statistic timer to see compile/execution time, is that I can’t find consistent difference between using SELECT * or SELECT 1 when combined with an EXISTS clause/condition

(even when using DROPCLEANBUFFERS and FREEPROCCACHE between each run, which as far as I understand should clear the cache and force a recompilation of my queries).

I’m glad somebody else have taken this subject up as well, as I am strained for time to play myself at the moment – and it is one of those “myths” within SQL currently, that I’m having trouble finding out if is true or not.

Reply
Kris
February 26, 2008 6:28 pm

If you try this with a view that return columns from functions or subquieries

ie

select c1,c2,(select tc from tbl2 where tbl2.t1 =tbl.t1) as tc3
from tbl

the select 1 will not execute the (select tc from tbl2 where tbl2.t1 =tbl.t1 ) sub query. This is a significant performance gain my view runs 99% slower with select *

Reply
simon worth
February 26, 2008 8:01 pm

The problem with SELECT * is it retrieves all the meta data associated to the object(s) in question BEFORE runtime. So during compilation of the query there is unneeded overhead associated to it – but at run time the query plans will be the same.

Just because the query plans are the same does not mean that the queries perform the same – it is compile time that is taking the hit in this case, and not run time.

So you’re doing yourself a compilation favor if you use SELECT 1 rather than SELECT *, even if you don’t see the gains in the execution plan – as I said they will probably be exactly the same. Just keep in mind that meta data is being queried and returned, then discarded because it is unneeded to get the results of you query. Why thrash the system tables when you don’t need to?

When you go to the store to get a bottle of coke, you don’t buy all the coke products in the store and throw out those you didn’t want – it’s the same thing here. Get what you need from the beginning – it costs less.

Reply
Manik Bod
February 26, 2008 10:17 pm

One Great Article. Pinal tum great ho!

Reply
jemm
February 27, 2008 2:20 am

I don’t think there is any difference, as long as the SELECT 1/* is inside EXISTS, which really doesn’t return any rows – it just returns boolean as soon as condition of the WHERE is checked.

I’m quite sure that the SQL Server Query Optimizer is smart enough not to search for the unneeded meta data in the case of EXISTS.

I agree that in all the other situations SELECT * shouldn’t be used for the reasons Simon mentioned. Also, index usage wouldn’t be optimal etc.

For me EXISTS (SELECT * ..) is the only place where I allow myself to write SELECT * in production code ;)

Reply
Mladen
February 27, 2008 4:24 am

@simon worth:
can you provide some source or example for this claim? i’d be very interesetd to see.

Reply
Allan Svelmoe Hansen
February 27, 2008 12:03 pm

@simon worth:
By running statistics time on I’ve not seen any consistent indication that at compile time the query will collect all the metadata and thus have a higher compile time them then a SELECT 1. Some times it was faster, other times it was not.

Without looking into the engine I would also seriously think it would be bad implementation to retrieve any sort of meta data when the SELECT * is within a conditional clause, like EXISTS. The engine would be able to see from the syntax alone, that there is no usage for the specific data and thus no need to look up any additional metadata, other then testing whether the statement evaluates true or not.

I’d be very interested if you had some code which could show this behaviour. Or some links which have shown this to be true. Because I can’t see it myself, neither logical nor running statistic on the queries.

Reply
Tim B
February 27, 2008 11:39 pm

It seems that this is primarily a semantic argument in terms of runtime performance. Personally, I’ve always used SELECT 1 because I wasn’t sure about the implications of the alternative.

If it comes down to little more than personal preference, I’m going to continue using SELECT 1 because it reinforces the idea that SELECT * is (usually) bad.

Reply
Thomas Freudenberg
February 28, 2008 2:26 am

To check the existence of rows you can also try this statement:

IF EXISTS(SELECT NULL as [EMPTY] FROM Production.Product)

Reply
simon worth
March 5, 2008 7:35 pm

If you want to read more about the meta data collection on objects when the query is executed – read the article that Allan Svelmoe Hansen posted in the first comment. Conor used to work (I don’t think he still does) for Microsoft on the Query Processor team. If anyone would know the inside of the query processor it’s him. In that blog post he states

“The * will be expanded to some potentially big column list and then it will be determined that the semantics of the EXISTS does not require any of those columns, so basically all of them can be removed.

“SELECT 1″ will avoid having to examine any unneeded metadata for that table during query compilation.”

Source :

Reply
Guy Pravin
March 5, 2008 7:45 pm

Simon,

We can not take anybodies word for it. (With due respect to Conor – I know he is the man). Is there any proof for this?

-Guy

Reply
simon worth
March 5, 2008 7:54 pm

Honestly I don’t see what the importance of this subject is.

People have been saying for a long time that SELECT * in an exists condition has unneeded overhead – and there are several blog posts out there to back up that claim.

If you really really want to use SELECT * instead of using SELECT 1 in production code – then go ahead.

If you haven’t seen performance issues with your code then don’t worry about it.

I understand that you want something tangible to look at and prove or disprove this “theory”, but I don’t have time to put the code together and gather the statistics just to satisfy curiosity.

I’ll leave that up to you to prove or disprove.

It may be a good idea to post a comment on Connor’s site asking if he can prove or disprove this – as he is more intimate with the query processor than I am.

Reply
emily
May 2, 2008 1:27 pm

Hi,

Just want to ask, is there any select statement whereby it will call you to select the record start from second record then additional 10 records onwards?

For example,

User have to retrieve record 2 until record 12?
If yes, can help?

Thanks.

Reply
Ravi
June 25, 2008 12:09 pm

Hi emily here is the answer for your question.

SELECT rnum, employee_id
FROM (
SELECT rownum rnum, employee_id
FROM employees
WHERE ROWNUM < 100)
WHERE rnum BETWEEN 2 AND 12;

this query will retrieve output for
record 2 until record 12. By use of inline view.

Regards,
Ravi
Sony India
____________________
Hi,

Just want to ask, is there any select statement whereby it will call you to select the record start from second record then additional 10 records onwards?

For example,

User have to retrieve record 2 until record 12?
If yes, can help?

Thanks.

Reply
G2
July 11, 2008 1:52 am

Nice observation

Reply
Bryan Stauffer
September 19, 2008 11:17 pm

SQL 2005

I managed to get a glimpse using the following two queries. According the estimated query execution plans the ‘select *’ query has an extra step and the estimated row size is larger than ‘select 1’. Perhaps this is not proof but I think it is more evidence that ‘select 1’ is the better practice.

select * from tempdb.sys.sysobjects where [name] = ”

select 1 from tempdb.sys.sysobjects where [name] = ”

Reply
Bryan Stauffer
September 19, 2008 11:19 pm

sorry, on that last post, i tried to include a string that inadvertenly looked like html and was stripped out. just use any user-defined table for the [name] = portion of the query to duplicate my findings.

Reply
Lutz Mueller
November 21, 2008 8:54 pm

Hi Pinal,

I’m wondering what test conditions you used…

The other day I had a performance issue with a proceudre that included a statement like

IF NOT EXISTS (SELECT * FROM view WHERE condition)

The view itself based on two tables withe several 10-thousand rows in one and several million in the other (as to my best knowledge properly joined and indexed using information from STATISTICS XML). The SELECT statement above would return approx. 1 million rows.

When I changed it to SELECT 1, performance increased significantly.

My explanation (at least to myself) goes a little more in direction of boolean logic:

The NOT EXISTS statement becomes FALSE performing the following tasks:
SELECT * -> grab all the data and if there are any drop them and return FALSE
SELECT 1 -> try to grab the first row and if you find one single row return FALSE, no matter on how many rows fulfill the WHERE clause

As far as I figured the TRUE statement takes the same time for both, * amd 1, since the resultset is NULL in both cases.

Would you agree with my comment and, if not, retest your code with large tables?

Regards

Lutz

Reply
Rao K
December 5, 2008 11:09 pm

What is the equallent in SQL Server for

select * from table1 a
where (a.x, a.y) in (
select b.x, min(b.y)
from table1 b
group by b.x
)

Reply
rohit
February 7, 2009 7:07 pm

how do we get back the data that we ve deleted from a table??

Reply
- Madhivanan
  March 15, 2010 12:43 pm
  
  If you used Transaction, you can rollback it
  Otherwise restore the latest backup you have
  
  Reply