This question is asked many times to me. What is difference between DISTINCT and GROUP BY?
A DISTINCT and GROUP BY usually generate the same query plan, so performance should be the same across both query constructs. GROUP BY should be used to apply aggregate operators to each group. If all you need is to remove duplicates then use DISTINCT. If you are using sub-queries execution plan for that query varies so in that case you need to check the execution plan before making decision of which is faster.
Example of DISTINCT:
SELECT DISTINCT Employee, Rank
FROM Employees
Example of GROUP BY:
SELECT Employee, Rank
FROM Employees
GROUP BY Employee, Rank
Example of GROUP BY with aggregate function:
SELECT Employee, Rank, COUNT(*) EmployeeCount
FROM Employees
GROUP BY Employee, Rank
Reference : Pinal Dave (https://blog.sqlauthority.com)
97 Comments. Leave new
Very simple and clear explanation, very easy to understand
hay can u tell that how can i found average of colume valu in crystal reprot……not use cross tab only by queryes…. of many
In Crystal Reports, make use of Aggregate function. There is an option to have AVG
Create New Formula .In that give avg(Column Name)
use that formula in u r crystal report
hey…
i have a performance issue with me.
in one query and corresponding sub queries they have used so many distinct clauses . so suggest me the best replacement
to the distinct to improve the performance.
I have view reteriving records from multiple tables using join.. I want DISTINCT records but for speific field in Jet SQL you can do it by DISTINCT ( Employee.EmployeeId), …How in SQL server… thanks
Hi Pinal
I m visit this site for long period really i felt this site is very nice and explanation is very good.
SELECT DISTINCT Purchase.SessionId, UserInfo.UserId, UserInfo.UserName, Purchase.grandTotal, Purchase.PurchaseDate, Purchase.Status
FROM UserInfo INNER JOIN Purchase ON UserInfo.UserId=Purchase.Uid;
but this doesn’t generate correct result ……………… plz reply whats problem with that
Hello Pinal:
There is a table t_test with columns
[id, name, code, description].
I can write a query using MySQL as
SELECT
*
FROM t_test
GROUP BY Code
What would be equivalent sql server query for this.
Thanks.
Note that it works in Mysql. In SQL Server you should always specify the column names in the SELECT statement
It should be
SELECT code FROM t_test
GROUP BY code
Hi You can do it as below:
select coloumn1,coloumn2,coloumn3 from table
group by coloumn1,coloumn2,coloumn3
yes it is reply
hh, I can’t really say what the query should be without knowing what your expected result is.
Having a sample of a few typical data rows from the UserInfo and Purchase tables might also be neccessary.
However, “SELECT DISTINCT” performs a hash match on the selected fields, and omits any records producing a duplicate hash. In other words, the query you used will return any record where the returned column values, taken together, are different from any other record in the result set.
In general, when using “distinct”, more columns selected = more rows returned. If you need to select all of these columns into the result set, but are only concerned that a few of them be unique, then you need a different construct. Two possibilities are:
1. Use “GROUP BY” to eliminate duplicates on the desired columns, and then apply aggregate functions (MIN, MAX, AVG, etc) to the remaining columns.
2. Use subqueries or self-joins to limit the records if you need selected columns to be unique and don’t want to aggregate the other columns.
I have an issue with both DISTINCT and GROUP BY.
The issue is that using either one, the results comes back ordered as if using order by.
I need the unique results in the order they are in the database. How can I do this? Thanks
Note that you should not rely upon the natual ordering of the SQL Server. If you use versions earlier than 2000, it defaults the result by ORDER. But it is not gauranteed that it will work always. In Versions 2005 and above, there is no default order when Distinct is used
i have a question regarding search criteria
Select lastname,firstname from TblResourceProfile
where @crit like in(lastname,firstname)
dont know where to put the wild card ‘%’
i
Select lastname,firstname from TblResourceProfile
where (lastname like ‘%@crit%’) or (firstname like ‘%@crit%’)
thanks by the way….;)
Select lastname,firstname from TblResourceProfile
where @crit like in(lastname,firstname)
dont know where to put the wild card ‘%’
can this be possible?
It should be
Select lastname,firstname from TblResourceProfile
where @crit like lastname+’%’ or @crit like firstname+’%’
There is a third approach if you have a master and detail table – WHERE EXISTS. For example, if you have an OrderHeader table and an OrderItem table that have a one to many relationship on OrderHeaderId, you can decide unique orders in OrderItem one of 2 ways
SELECT DISTINCT orderheaderid from orderitem
SELECT orderheaderid FROM orderheader oh WHERE EXISTS (SELECT 1 FROM OrderItem WHERE OrderHeaderId=oh.OrderHeaderId)
If these queries are run together, the “DISTINCT” query takes about 2/3 total query time, with WHERE EXISTS taken up a 1/3.
(Ahh but why dont you just select from OrderHeader in this case. Well that would not omit orders that have no item records – if thats how you represent cancelled orders)
In production code, always try to write SQL queries that produce unique results WITHOUT DISTINCT clauses. If performance is what you are after and you have one to many table relationships, use WHERE EXISTS.
We have published an article of SQL Performance Do’s and Dont’s if thats of help to anyone.
Steve Hatchard
Director
Mattched IT Ltd
Is there a way to execute a DISTINCT query on a single field while the SELECT statement contains more fields. For example this statement:
SELECT DISTINCT EmployeeID, EmployeeName, ContractDate, CurrentTitlle FROM EmployeeTable
I am only interested to fetch DISTINCT reocrds based on the EmployeeID. Please note that more than one occurence of each employee’s record exists in this table. Therefore getting DISTINCT on all fields is not possible.
If DISTINCT cannot solve this problem, what is other best approach in achieve this?
Tariq Changgez
Your best answer is to have two tables – one for unique employee data EmployeeId, EmployeeName, CurrentTitle etc.. and one for multiple row data – EmployeeId, ContractDate. It is usually best practice to “normalise” data like this.
This off topic and not really an issue for DISTINCT.
Steve Hatchard
Director
Mattched IT Ltd
When request is very simple, Distinct can be used else always use Group By. really faster.
hi pinal,
i’m struck with an issue related to distinct clause.
i’ve fired a query in which i’ve used distict clause. As per that i’m getting distinct rows but they are repeatative in nature.
e.g. if i’m getting 3 distinct rows from my query; i’m getting 1st row many hundreds of time, followed by 2nd and 3rd row in the same manner. How to suppress these repeated rows and how to only three distinct rows as a output?
Thanx in advance.
i like pinal dave blogs
Hi Pinal,
I have a situation, I want to count the amount of time a similar name occurs in a table.
E.g.
Table Buses,
Starmart
Starmart
Starmart
Starmart
Starmart
Damai
Damai
Damai
Damai
Damai
Damai
Damai
Damai
Senibudaya
Senibudaya
Senibudaya
Senibudaya
Senibudaya
Senibudaya
Senibudaya
Senibudaya
How do i actually type out a sql script to group and return the count for each name.
This is what i used so far
select CurrentOperator, COUNT(*) CurrentOperator from EZBusDB.dbo.Buses
GROUP BY CurrentOperator,
I am getting this error:
Msg 102, Level 15, State 1, Line 2
Incorrect syntax near ‘,’.
How i go about it?
Thank you!
“select CurrentOperator, COUNT(*) CurrentOperator from EZBusDB.dbo.Buses
GROUP BY CurrentOperator”
Delete Comma(,) At the End of Your Command