This question is asked many times to me. What is difference between DISTINCT and GROUP BY?
A DISTINCT and GROUP BY usually generate the same query plan, so performance should be the same across both query constructs. GROUP BY should be used to apply aggregate operators to each group. If all you need is to remove duplicates then use DISTINCT. If you are using sub-queries execution plan for that query varies so in that case you need to check the execution plan before making decision of which is faster.
Example of DISTINCT:
SELECT DISTINCT Employee, Rank
FROM Employees
Example of GROUP BY:
SELECT Employee, Rank
FROM Employees
GROUP BY Employee, Rank
Example of GROUP BY with aggregate function:
SELECT Employee, Rank, COUNT(*) EmployeeCount
FROM Employees
GROUP BY Employee, Rank
Reference : Pinal Dave (http://blog.SQLAuthority.com)












Very simple and clear explanation, very easy to understand
hay can u tell that how can i found average of colume valu in crystal reprot……not use cross tab only by queryes…. of many
In Crystal Reports, make use of Aggregate function. There is an option to have AVG
Create New Formula .In that give avg(Column Name)
use that formula in u r crystal report
hey…
i have a performance issue with me.
in one query and corresponding sub queries they have used so many distinct clauses . so suggest me the best replacement
to the distinct to improve the performance.
I have view reteriving records from multiple tables using join.. I want DISTINCT records but for speific field in Jet SQL you can do it by DISTINCT ( Employee.EmployeeId), …How in SQL server… thanks
Hi Pinal
I m visit this site for long period really i felt this site is very nice and explanation is very good.
SELECT DISTINCT Purchase.SessionId, UserInfo.UserId, UserInfo.UserName, Purchase.grandTotal, Purchase.PurchaseDate, Purchase.Status
FROM UserInfo INNER JOIN Purchase ON UserInfo.UserId=Purchase.Uid;
but this doesn’t generate correct result ……………… plz reply whats problem with that
Hello Pinal:
There is a table t_test with columns
[id, name, code, description].
I can write a query using MySQL as
SELECT
*
FROM t_test
GROUP BY Code
What would be equivalent sql server query for this.
Thanks.
Note that it works in Mysql. In SQL Server you should always specify the column names in the SELECT statement
It should be
SELECT code FROM t_test
GROUP BY code
Hi You can do it as below:
select coloumn1,coloumn2,coloumn3 from table
group by coloumn1,coloumn2,coloumn3
hh, I can’t really say what the query should be without knowing what your expected result is.
Having a sample of a few typical data rows from the UserInfo and Purchase tables might also be neccessary.
However, “SELECT DISTINCT” performs a hash match on the selected fields, and omits any records producing a duplicate hash. In other words, the query you used will return any record where the returned column values, taken together, are different from any other record in the result set.
In general, when using “distinct”, more columns selected = more rows returned. If you need to select all of these columns into the result set, but are only concerned that a few of them be unique, then you need a different construct. Two possibilities are:
1. Use “GROUP BY” to eliminate duplicates on the desired columns, and then apply aggregate functions (MIN, MAX, AVG, etc) to the remaining columns.
2. Use subqueries or self-joins to limit the records if you need selected columns to be unique and don’t want to aggregate the other columns.
I have an issue with both DISTINCT and GROUP BY.
The issue is that using either one, the results comes back ordered as if using order by.
I need the unique results in the order they are in the database. How can I do this? Thanks
Note that you should not rely upon the natual ordering of the SQL Server. If you use versions earlier than 2000, it defaults the result by ORDER. But it is not gauranteed that it will work always. In Versions 2005 and above, there is no default order when Distinct is used
i have a question regarding search criteria
Select lastname,firstname from TblResourceProfile
where @crit like in(lastname,firstname)
dont know where to put the wild card ‘%’
i
Select lastname,firstname from TblResourceProfile
where (lastname like ‘%@crit%’) or (firstname like ‘%@crit%’)
thanks by the way….;)
Select lastname,firstname from TblResourceProfile
where @crit like in(lastname,firstname)
dont know where to put the wild card ‘%’
can this be possible?
It should be
Select lastname,firstname from TblResourceProfile
where @crit like lastname+’%’ or @crit like firstname+’%’
There is a third approach if you have a master and detail table – WHERE EXISTS. For example, if you have an OrderHeader table and an OrderItem table that have a one to many relationship on OrderHeaderId, you can decide unique orders in OrderItem one of 2 ways
SELECT DISTINCT orderheaderid from orderitem
SELECT orderheaderid FROM orderheader oh WHERE EXISTS (SELECT 1 FROM OrderItem WHERE OrderHeaderId=oh.OrderHeaderId)
If these queries are run together, the “DISTINCT” query takes about 2/3 total query time, with WHERE EXISTS taken up a 1/3.
(Ahh but why dont you just select from OrderHeader in this case. Well that would not omit orders that have no item records – if thats how you represent cancelled orders)
In production code, always try to write SQL queries that produce unique results WITHOUT DISTINCT clauses. If performance is what you are after and you have one to many table relationships, use WHERE EXISTS.
We have published an article of SQL Performance Do’s and Dont’s if thats of help to anyone.
Steve Hatchard
Director
Mattched IT Ltd
Is there a way to execute a DISTINCT query on a single field while the SELECT statement contains more fields. For example this statement:
SELECT DISTINCT EmployeeID, EmployeeName, ContractDate, CurrentTitlle FROM EmployeeTable
I am only interested to fetch DISTINCT reocrds based on the EmployeeID. Please note that more than one occurence of each employee’s record exists in this table. Therefore getting DISTINCT on all fields is not possible.
If DISTINCT cannot solve this problem, what is other best approach in achieve this?
Tariq Changgez
Your best answer is to have two tables – one for unique employee data EmployeeId, EmployeeName, CurrentTitle etc.. and one for multiple row data – EmployeeId, ContractDate. It is usually best practice to “normalise” data like this.
This off topic and not really an issue for DISTINCT.
Steve Hatchard
Director
Mattched IT Ltd
When request is very simple, Distinct can be used else always use Group By. really faster.
hi pinal,
i’m struck with an issue related to distinct clause.
i’ve fired a query in which i’ve used distict clause. As per that i’m getting distinct rows but they are repeatative in nature.
e.g. if i’m getting 3 distinct rows from my query; i’m getting 1st row many hundreds of time, followed by 2nd and 3rd row in the same manner. How to suppress these repeated rows and how to only three distinct rows as a output?
Thanx in advance.
i like pinal dave blogs
[...] SQL SERVER – Difference between DISTINCT and GROUP BY – Distinct vs Group By [...]
Hi Pinal,
I have a situation, I want to count the amount of time a similar name occurs in a table.
E.g.
Table Buses,
Starmart
Starmart
Starmart
Starmart
Starmart
Damai
Damai
Damai
Damai
Damai
Damai
Damai
Damai
Senibudaya
Senibudaya
Senibudaya
Senibudaya
Senibudaya
Senibudaya
Senibudaya
Senibudaya
How do i actually type out a sql script to group and return the count for each name.
This is what i used so far
select CurrentOperator, COUNT(*) CurrentOperator from EZBusDB.dbo.Buses
GROUP BY CurrentOperator,
I am getting this error:
Msg 102, Level 15, State 1, Line 2
Incorrect syntax near ‘,’.
How i go about it?
Thank you!
“select CurrentOperator, COUNT(*) CurrentOperator from EZBusDB.dbo.Buses
GROUP BY CurrentOperator”
Delete Comma(,) At the End of Your Command
@shwaria
Remove comma (,) at the end of second line, to reduce confusion, you can use different column name.
select CurrentOperator, COUNT(*) Count_CurrentOperator from EZBusDB.dbo.Buses
GROUP BY CurrentOperator
~IM
Hi You need to Remove (,) from the end of your statement
I read your posts for quite a long time and should tell that your posts always prove to be of a high value and quality for readers.
Thank you Ted
hi thanks so much for between dif but when and why used this arti……..
Hye!
I’m trying a simple query with DISTINCT keyword as:
SELECT DISTINCT SecCode FROM Sections ORDER BY SecCode
in SQL Server 2005, but it is not giving me any result.
Can anyone help me?
Thanks!
Dear
Could u plz suggest me a technique where i can implement transactional replication technique with ONE – MANY Tables relationships……….. i want to replicate the different remote databases with same name n having same articles and structures in centralized single database with same single articles………………..
select DENSE_RANK() Over (Order by col1, col2 ) as Ranky, col1, col2 from table where…
group By Col1, col2 — gives distinct
Thanks Pinal,
These posts are very useful.
I found them after attending your presentation in teched delhi.
i try to get distinct values for name and get all other fields with out where condition
ex. table
id name level department companyid
1 ram 1 1 1
2 ram 2 2 2
3 kumar 3 3 3
4 Raja 4 4 4
i expect value is
1 ram 1 1 1
3 kumar 3 3 3
4 Raja 4 4 4
(or)
1 ram 1 1 1
2 ram 2 2 2
4 Raja 4 4 4
Thanks Pinal.
Very useful posts.
Hello I am trying to make a selection in 3 different tables using distinct, but I keep getting duplicate values, my tables are Alumni, History, and Groups, I get the id and name of students enrolled in the group 230, my query is
SELECT DISTINCT H. Go, A. LastName, A. Name FROM History H, Alumni A, Groups I WHERE Group = 230 AND H. H.id = I.id AND ORDER BY I.key = A.key H.id ” )
appear two students with the same value:
9876 M John
9876 M John
I hope you can help me as no way out, and I’m starting to use SQL.
@Lizbeth Morales
The query didn’t seem to come through correctly.
Please repost it, along with the COLUMNs in each TABLE used. It is hard to determine what is going on without them.
Thanks
useful info
select name,count(*) as No_Of_Visit,sum(amount)from customer group by name
How about:
SELECT code, Month(d_date) as d_month, SUM(qty) as quantity
FROM t_test
GROUP BY code, d_month
or should I group by code, Month(d_date)?
Hi,
Yes, you should group by code, Month(d_date) to get perfect output otherwise you would get multiple records for each set of code and Month(d_date).
Regards,
Pinal Dave
Hi Pinal,
Is there a way to find the first non blank distinct data from query…
e.g.
Name Comment
————————————–
ABC
Ajay JKL
Vikas XYZ
i want result as below…. is there a way to get result from query ?
Name Comment
————————————–
Ajay ABC
@Jignesh
I do not understand what you want. I have tried something though:
WITH
Data(Name, Comment)
AS
(
SELECT NULL, ‘ABC’ UNION ALL
SELECT ‘Ajay’, ‘JKL’ UNION ALL
SELECT ‘Vikas’, ‘XYZ’
)
SELECT
(SELECT TOP(1) Name FROM Data WHERE Name IS NOT NULL ORDER BY Name ) Name,
(SELECT TOP(1) Comment FROM Data WHERE Comment IS NOT NULL ORDER BY Comment ) Comment;
Thanks.Brian. Even without format you understood properly !
Now i have second question is it possible to get COUNT IN nested distinct via SQL query from a table.
for example :
Enquiry No , Part No
—————————————
E1 P1
E1 P1
E2 P2
E2 P1
E3 P2
E3 P3
EXPECTED RESULT :
——————————————-
Enquiry NO DISTINCT_PART_COUNT
—————————————
E1 1
E2 2
E3 2
Any help is highly appreciated…
select [Enquiry No] , count(distinct [Part No]) as DISTINCT_PART_COUNT from your_table
group by [Enquiry No]
let’s say i have a table with one column containing the following 6 records:
3
3
4
5
5
5
9
8
9
9
9
9
is there a way with a single standard SQL query to count the unique records in that column?
I need Answer set
3 2
4 1
5 3
8 1
9 5
(the query should return “9″) the max nom reapeated in the column
Hi Sally,
You can use ROW_NUMBER keyword to achieve expected output.
DECLARE @tmp TABLE(SrNo INT)
INSERT INTO @tmp VALUES(3)
INSERT INTO @tmp VALUES(3)
INSERT INTO @tmp VALUES(4)
INSERT INTO @tmp VALUES(5)
INSERT INTO @tmp VALUES(5)
INSERT INTO @tmp VALUES(5)
INSERT INTO @tmp VALUES(9)
INSERT INTO @tmp VALUES(8)
INSERT INTO @tmp VALUES(9)
INSERT INTO @tmp VALUES(9)
INSERT INTO @tmp VALUES(9)
INSERT INTO @tmp VALUES(9)
SELECT DISTINCT *,
COUNT(*) OVER(PARTITION BY SrNo) AS Total
FROM @tmp
This will run for SQL 2005 and above only.
Thanks,
Tejas
SQLYoga.com
thnks Tejas Shah so much
CREATE table tmp (SrNo INT)
INSERT INTO tmp VALUES(3)
INSERT INTO tmp VALUES(3)
INSERT INTO tmp VALUES(4)
INSERT INTO tmp VALUES(5)
INSERT INTO tmp VALUES(5)
INSERT INTO tmp VALUES(5)
INSERT INTO tmp VALUES(9)
INSERT INTO tmp VALUES(8)
INSERT INTO tmp VALUES(9)
INSERT INTO tmp VALUES(9)
INSERT INTO tmp VALUES(9)
INSERT INTO tmp VALUES(9)
select top 1 srno
from tmp
group by srno
order by count(srno)
desc
thanks Samba Shiva Reddy for your help
does anyone help me
……..how can I Make OLAP in SQl server 2005
how can i Create Local Cubes with Microsoft SQL Server 2005
HI, i am new to OLAP in Sql Server 2005, bascially i have read through lots of article on this OLAP and basically i get to know how it works and function theorically, but i could not find an article to show me step by step on how to create the cube and how to query the data from the cube. Would very much appreciated if somebody out there can explain to me in details how to i create cube from sql server 2005 as i totally no idea at all. Though i can view the sample cude in sql server 2005, but i dunno how to create it from my relational database. I need it very urgent as needed by my job functionality. Hope to get some explanation from you guys asap. Just assume i am totally new and please guide me step by step to create a normal and simple cube and query from the cube. I just need to create simple prototype. Thanks alot yea guys :)
I’ll use DISTINCT as it is the true meaning of what I want to achieve… I had just heard in the past that GROUP BY was a lot more performant, but like you say it should execute the same query plan – and I can always check that too!
So thanks, I am now not as “anti-DISTINCT” as I was before I read your post. ;-)
Cheers.
Hi,
I have the following data in the table.
product price qty
====================
A 10 5
A 20 2
A 30 1
B 20 3
B 10 7
The result that i expect from SQL query
basically the total = sum (price x qty) but quantity should be the lowest among same product
for product A is
LINETOTAL
50
for product B is
SUMTOTAL
70
Could you please help me how to write the SQL query.
Thanks,
Jignesh
@Jignesh
The lowest or the highest?
WITH
Data(product, price, qty)
AS
(
SELECT ‘A’, 10, 5 UNION ALL
SELECT ‘A’, 20, 2 UNION ALL
SELECT ‘A’, 30, 1 UNION ALL
SELECT ‘B’, 20, 3 UNION ALL
SELECT ‘B’, 10, 7
),
Data_RN
AS
(
SELECT
product,
price,
qty,
ROW_NUMBER() OVER(PARTITION BY Product ORDER BY qty DESC) RN
FROM
Data
)
SELECT
product,
price * qty
FROM
Data_RN
WHERE
RN = 1;
Thanks brian thanks a lot ! :-)
Sorry it was for lowest quantity…
Hi
I have a problem with join and group by
when I use group by in sub query the result will be done with 7 rows
SELECT classNameId , max( viewCount) as viewcount
FROM tagsasset
group by classNameId
classNameId viewcount
10011 0
10020 0
10040 0
10056 31
10066 10
10067 5409
10111 2
but when join it the result has 63 rows
because I put 2 condition
when it was do 2 conditon for equal result show all of the row have this result
SELECT TA.title,TA.viewcount,TA.classnameid,c.value
from tagsasset TA inner join classname_ c
on c.classnameid=TA.classnameid
inner join
(SELECT classNameId , max( viewCount) as viewcount
FROM tagsasset
group by classNameId )groupet on
TA.viewcount=groupet.viewcount
and
groupet.classnameid=TA.classnameid
view classnameid
0 10040
0 10040
0 10040
0 10040
0 10040
0 10040
0 10040
0 10040
0 10040
0 10040
0 10040
0 10040
0 10040
0 10040
0 10040
31 10056
10 10066
5409 10067
2 10111
2 10111
I have duplicate result
plz help me
@nahid
What should the result be?
i have results like the following
Marea,Garea,SGarea
bck1,gitem1,sgitem1
bck1,gitem1,sgitem2
bck1,gitem1,sgitem3
bck2,gitem2,sgitem11
bck2,gitem2,sgitem12
bck2,gitem2,sgitem13
i require quey which should give result like below
(note:i wanted to show in a gridview like below)
bck1 gitem1 sgitem1
sgitem2
sgitem3
bck2 gitem2 sgitem11
sgitem12
sgitem13
can any one give me some idea.how i can make it possible.
from
lokesh
DECLARE @tmp TABLE(SrNo INT)
INSERT INTO @tmp VALUES(3)
INSERT INTO @tmp VALUES(3)
INSERT INTO @tmp VALUES(4)
INSERT INTO @tmp VALUES(5)
INSERT INTO @tmp VALUES(5)
INSERT INTO @tmp VALUES(5)
INSERT INTO @tmp VALUES(9)
INSERT INTO @tmp VALUES(8)
INSERT INTO @tmp VALUES(9)
INSERT INTO @tmp VALUES(9)
INSERT INTO @tmp VALUES(9)
INSERT INTO @tmp VALUES(9)
SELECT SrNo,COUNT(*) AS Total
FROM @tmp group by SrNo
Hey Sally,
U can use above query which gives the result that u expected
Thanks
Ram
Hi All,
I have a db with fields Part number,supplier number and country name
I would like to get a table with one unique record for partnumber and the country1 name country 2 name,–etc..
example this is what I have
partno suno country
001 123 US
001 234 IN
001 567 AU
001 897 CA
002 235 IN
002 566 SR
etc…
I need a table like this
part num country1 country 2 country 3 country 4
001 US IN AU CA
002 IN SR
etc…
Maximum suppliers will be usually from 1-10 ( Assumption)
Could anyone please help me out with the sql query to get this result.
.we may have use do while loop,but no clear idea
please hep me out..
Thanks,
Santhu
I guess something is missing in Article. Distinct creates a temp table and remove duplicates. Group by essentially does the same things, but it has additional overhead of sorting the records. By default group by is ASC sort. so Group by is slower in cases where the key grouping column is not indexed. If the query columns are already in clustered indexed, it may be same operation.
I would go with query plan and based on instincts will prefer Distinct over Group by
thanks
Hi
why we should not use distinct and group by within the same sql statement or query??? any performance issue can u explain pls????????
Very nice…
Hi,
i want to find distinct top.
top 3 for all UserId’s.
for ex:
the table is as below:
Id UserId Date
1 10 10/10/2010
1 10 10/10/2010
1 5 10/10/2010
1 10 10/10/2010
1 5 10/10/2010
1 7 10/10/2010
1 10 10/10/2010
1 5 10/10/2010
1 10 10/10/2010
1 5 10/10/2010
In the above table userid 10, 5 & 7 has 5, 4 & 1 rows respectively.
Now i expect 3 or below 3 from each user.
For Ex:
Id UserId Date
1 10 10/10/2010
1 10 10/10/2010
1 10 10/10/2010
1 5 10/10/2010
1 5 10/10/2010
1 5 10/10/2010
1 7 10/10/2010
how to solve this.
regards
rajendran
Make sure to read this
http://beyondrelational.com/blogs/madhivanan/archive/2008/09/12/return-top-n-rows.aspx
SELECT ROW_NUMBER() OVER (ORDER BY inserted_date DESC)
AS Row, distinct product_type, parent_id,product_id,brand_id,device_id,p.subcategory_id,upc_code,sku,product_name,p.description,price,product_image,
isDevice,dimension_length,dimension_length_unit,dimension_width,dimension_width_unit,
dimension_height,dimension_height_unit,dimension_weight,dimension_weight_unit from tblProductDetails p join tblSubCategory s
on p.subcategory_id=s.subcategory_id
where parent_id=@cat_id and p.subcategory_id=@sub_id
My distinct keyword is having error.
You cannot use distinct that way. What do you want to get? Post some sample data with expected result
I guess something is missing in Article. Distinct creates a temp table and remove duplicates. Group by essentially does the same things, but it has additional overhead of sorting the records. By default group by is ASC sort. so Group by is slower in cases where the key grouping column is not indexed. If the query columns are already in clustered indexed, it may be same operation.
I would go with query plan and based on instincts will prefer Distinct over Group by
this is my query and
select company_name ,department_name from Department_Master where Status = ‘N’ order by company_name, group by company_name
It shows me this kind of the error
Msg 156, Level 15, State 1, Line 1
Incorrect syntax near the keyword ‘group’.
What should i do to use the group by clouse in sql server 2008
Thanks in advance
It should be
select company_name ,department_name from Department_Master where Status = ‘N’ group by company_name, order by company_name
Thanks for posting this very informational blog. Now I know the difference between DISTINCT and GROUP BY. I hope to achieve something from what I learn today.
Hi Sally,
You can do the following to get the results inon estatement.
SELECT Col1,COUNT(*) AS Total FROM TMP
GROUP BY Col1
ORDER BY Col1
I need your help ragarding sql query.
Val_1, Val_2, Val_3,Val_4
a b c 1
a b c 2
a b c 4
m n o 1
m n o 2
and output should be
val_1, Val_2,val_3, val_4
a b c 1,2,4
m n o 1,2
Pls help if anyone know…
Dear all i need a help from you people. I am using vb6 & ms sql server2000. I have two tables.
1- Order (master table) having data
ORDERID DATE EMPNAME
1 1-2-1999 JOHN
2 1-1-2010 JOHN
3 1-12-2011 FOAD
2-Orderdetail (child table) having data
PRID PRNAME SP QTY TOTAL ORDERID
11 CD 5 10 50 1
11 CD 5 5 25 1
12 DVD 10 5 50 2
i wan to get data like this
Empname Prname Qty SP total
JOHN CD 15 5 75
DVD 5 10 50
Regards
hi pinal,
i find your blog very useful.
i have been working for a airticketing service i have a table with 2 cols id and wsdate(datetime datatype) where wsdate contains date and time which got some thousands of records on different dates,i want to display distinct date col and other one is count of rows from the table on that particular date.please try to help me out
Thank u sir, I forgot Distinct Clause, to remove duplicate values…. At the right time your answer helps me…alot..
Hi Pinal,
can you tell me why count of distinct id’s [id's are 0]=1?
[...] Difference between DISTINCT and GROUP BY – Distinct vs Group By A DISTINCT and GROUP BY usually generate the same query plan, so performance should be the same across both query constructs. GROUP BY should be used to apply aggregate operators to each group. If all you need is to remove duplicates then use DISTINCT. If you are using sub-queries execution plan for that query varies so in that case you need to check the execution plan before making decision of which is faster. [...]