SQL SERVER – Difference between DISTINCT and GROUP BY – Distinct vs Group By

March 29, 2007

SQL, SQL Performance, SQL Server, SQL Tips and Tricks

This question is asked many times to me. What is difference between DISTINCT and GROUP BY?

A DISTINCT and GROUP BY usually generate the same query plan, so performance should be the same across both query constructs. GROUP BY should be used to apply aggregate operators to each group. If all you need is to remove duplicates then use DISTINCT. If you are using sub-queries execution plan for that query varies so in that case you need to check the execution plan before making decision of which is faster.

Example of DISTINCT:
SELECT DISTINCT Employee, Rank FROM Employees

Example of GROUP BY:
SELECT Employee, Rank FROM Employees GROUP BY Employee, Rank

Example of GROUP BY with aggregate function:
SELECT Employee, Rank, COUNT(*) EmployeeCount FROM Employees GROUP BY Employee, Rank

Reference : Pinal Dave (https://blog.sqlauthority.com)

SQL Function, SQL Joins, SQL Scripts

SQL SERVER – Fix : Error 8101 An explicit value for the identity column in table can only be specified when a column list is used and IDENTITY_INSERT is ON

SQL SERVER – Index Seek Vs. Index Scan (Table Scan)

97 Comments. Leave new

Amit B
May 14, 2007 4:21 am
Very simple and clear explanation, very easy to understand
Reply
hasan
August 30, 2007 2:35 am
hay can u tell that how can i found average of colume valu in crystal reprot……not use cross tab only by queryes…. of many
Reply
- Madhivanan
  February 15, 2010 6:40 pm
  In Crystal Reports, make use of Aggregate function. There is an option to have AVG
  Reply
- sambashivareddy
  March 26, 2010 7:20 pm
  Create New Formula .In that give avg(Column Name)
  use that formula in u r crystal report
  Reply
raghuram
February 19, 2008 3:31 pm
hey…
i have a performance issue with me.
in one query and corresponding sub queries they have used so many distinct clauses . so suggest me the best replacement
to the distinct to improve the performance.
Reply
Haider
March 3, 2008 1:12 pm
I have view reteriving records from multiple tables using join.. I want DISTINCT records but for speific field in Jet SQL you can do it by DISTINCT ( Employee.EmployeeId), …How in SQL server… thanks
Reply
Rahul Sharma
April 23, 2008 11:43 am
Hi Pinal
I m visit this site for long period really i felt this site is very nice and explanation is very good.
Reply
hh
May 1, 2008 4:25 pm
SELECT DISTINCT Purchase.SessionId, UserInfo.UserId, UserInfo.UserName, Purchase.grandTotal, Purchase.PurchaseDate, Purchase.Status
FROM UserInfo INNER JOIN Purchase ON UserInfo.UserId=Purchase.Uid;
but this doesn’t generate correct result ……………… plz reply whats problem with that
Reply
FSL
May 8, 2008 10:30 am
Hello Pinal:
There is a table t_test with columns
[id, name, code, description].
I can write a query using MySQL as
SELECT
*
FROM t_test
GROUP BY Code
What would be equivalent sql server query for this.
Thanks.
Reply
- Madhivanan
  February 15, 2010 6:44 pm
  Note that it works in Mysql. In SQL Server you should always specify the column names in the SELECT statement
  It should be
  SELECT code FROM t_test
  GROUP BY code
  Reply
  - Anup Ghanshala
    May 2, 2012 7:21 am
    Hi You can do it as below:
    select coloumn1,coloumn2,coloumn3 from table
    group by coloumn1,coloumn2,coloumn3
  - sac
    July 3, 2013 11:18 am
    yes it is reply
bdn
June 4, 2008 3:35 am
hh, I can’t really say what the query should be without knowing what your expected result is.
Having a sample of a few typical data rows from the UserInfo and Purchase tables might also be neccessary.
However, “SELECT DISTINCT” performs a hash match on the selected fields, and omits any records producing a duplicate hash. In other words, the query you used will return any record where the returned column values, taken together, are different from any other record in the result set.
In general, when using “distinct”, more columns selected = more rows returned. If you need to select all of these columns into the result set, but are only concerned that a few of them be unique, then you need a different construct. Two possibilities are:
1. Use “GROUP BY” to eliminate duplicates on the desired columns, and then apply aggregate functions (MIN, MAX, AVG, etc) to the remaining columns.
2. Use subqueries or self-joins to limit the records if you need selected columns to be unique and don’t want to aggregate the other columns.
Reply
Stewy
July 16, 2008 2:30 am
I have an issue with both DISTINCT and GROUP BY.
The issue is that using either one, the results comes back ordered as if using order by.
I need the unique results in the order they are in the database. How can I do this? Thanks
Reply
- Madhivanan
  February 15, 2010 6:47 pm
  Note that you should not rely upon the natual ordering of the SQL Server. If you use versions earlier than 2000, it defaults the result by ORDER. But it is not gauranteed that it will work always. In Versions 2005 and above, there is no default order when Distinct is used
  Reply
jhon
July 22, 2008 8:27 am
i have a question regarding search criteria
Select lastname,firstname from TblResourceProfile
where @crit like in(lastname,firstname)
dont know where to put the wild card ‘%’
i
Reply
- samba shiva reddy
  February 27, 2010 7:11 pm
  Select lastname,firstname from TblResourceProfile
  where (lastname like ‘%@crit%’) or (firstname like ‘%@crit%’)
  Reply
jhon
July 22, 2008 8:32 am
thanks by the way….;)
Reply
jhon
July 22, 2008 9:15 am
Select lastname,firstname from TblResourceProfile
where @crit like in(lastname,firstname)
dont know where to put the wild card ‘%’
can this be possible?
Reply
- Madhivanan
  February 15, 2010 6:50 pm
  It should be
  Select lastname,firstname from TblResourceProfile
  where @crit like lastname+’%’ or @crit like firstname+’%’
  Reply
Steve Hatchard
November 22, 2008 12:26 am
There is a third approach if you have a master and detail table – WHERE EXISTS. For example, if you have an OrderHeader table and an OrderItem table that have a one to many relationship on OrderHeaderId, you can decide unique orders in OrderItem one of 2 ways
SELECT DISTINCT orderheaderid from orderitem
SELECT orderheaderid FROM orderheader oh WHERE EXISTS (SELECT 1 FROM OrderItem WHERE OrderHeaderId=oh.OrderHeaderId)
If these queries are run together, the “DISTINCT” query takes about 2/3 total query time, with WHERE EXISTS taken up a 1/3.
(Ahh but why dont you just select from OrderHeader in this case. Well that would not omit orders that have no item records – if thats how you represent cancelled orders)
In production code, always try to write SQL queries that produce unique results WITHOUT DISTINCT clauses. If performance is what you are after and you have one to many table relationships, use WHERE EXISTS.
We have published an article of SQL Performance Do’s and Dont’s if thats of help to anyone.
Steve Hatchard
Director
Mattched IT Ltd
Reply
Tariq Changgez
November 24, 2008 5:07 pm
Is there a way to execute a DISTINCT query on a single field while the SELECT statement contains more fields. For example this statement:
SELECT DISTINCT EmployeeID, EmployeeName, ContractDate, CurrentTitlle FROM EmployeeTable
I am only interested to fetch DISTINCT reocrds based on the EmployeeID. Please note that more than one occurence of each employee’s record exists in this table. Therefore getting DISTINCT on all fields is not possible.
If DISTINCT cannot solve this problem, what is other best approach in achieve this?
Tariq Changgez
Reply
Steve Hatchard
November 24, 2008 11:48 pm
Your best answer is to have two tables – one for unique employee data EmployeeId, EmployeeName, CurrentTitle etc.. and one for multiple row data – EmployeeId, ContractDate. It is usually best practice to “normalise” data like this.
This off topic and not really an issue for DISTINCT.
Steve Hatchard
Director
Mattched IT Ltd
Reply
FeelTheWay
December 22, 2008 8:39 pm
When request is very simple, Distinct can be used else always use Group By. really faster.
Reply
Sagar
January 28, 2009 7:23 pm
hi pinal,
i’m struck with an issue related to distinct clause.
i’ve fired a query in which i’ve used distict clause. As per that i’m getting distinct rows but they are repeatative in nature.
e.g. if i’m getting 3 distinct rows from my query; i’m getting 1st row many hundreds of time, followed by 2nd and 3rd row in the same manner. How to suppress these repeated rows and how to only three distinct rows as a output?
Thanx in advance.
Reply
madhavi
February 9, 2009 5:44 pm
i like pinal dave blogs
Reply
Shwaria
March 18, 2009 9:51 am
Hi Pinal,
I have a situation, I want to count the amount of time a similar name occurs in a table.
E.g.
Table Buses,
Starmart
Starmart
Starmart
Starmart
Starmart
Damai
Damai
Damai
Damai
Damai
Damai
Damai
Damai
Senibudaya
Senibudaya
Senibudaya
Senibudaya
Senibudaya
Senibudaya
Senibudaya
Senibudaya
How do i actually type out a sql script to group and return the count for each name.
This is what i used so far
select CurrentOperator, COUNT(*) CurrentOperator from EZBusDB.dbo.Buses
GROUP BY CurrentOperator,
I am getting this error:
Msg 102, Level 15, State 1, Line 2
Incorrect syntax near ‘,’.
How i go about it?
Thank you!
Reply
Saranda
March 18, 2009 3:26 pm
“select CurrentOperator, COUNT(*) CurrentOperator from EZBusDB.dbo.Buses
GROUP BY CurrentOperator”
Delete Comma(,) At the End of Your Command
Reply