SQL SERVER – Difference between DISTINCT and GROUP BY – Distinct vs Group By

This question is asked many times to me. What is difference between DISTINCT and GROUP BY?

A DISTINCT and GROUP BY usually generate the same query plan, so performance should be the same across both query constructs. GROUP BY should be used to apply aggregate operators to each group. If all you need is to remove duplicates then use DISTINCT. If you are using sub-queries execution plan for that query varies so in that case you need to check the execution plan before making decision of which is faster.

Example of DISTINCT:
SELECT DISTINCT Employee, Rank
FROM Employees

Example of GROUP BY:
SELECT Employee, Rank
FROM Employees
GROUP BY Employee, Rank

Example of GROUP BY with aggregate function:
SELECT Employee, Rank, COUNT(*) EmployeeCount
FROM Employees
GROUP BY Employee, Rank

Reference : Pinal Dave (http://blog.SQLAuthority.com)

About these ads

90 thoughts on “SQL SERVER – Difference between DISTINCT and GROUP BY – Distinct vs Group By

  1. hay can u tell that how can i found average of colume valu in crystal reprot……not use cross tab only by queryes…. of many

  2. hey…
    i have a performance issue with me.
    in one query and corresponding sub queries they have used so many distinct clauses . so suggest me the best replacement
    to the distinct to improve the performance.

  3. I have view reteriving records from multiple tables using join.. I want DISTINCT records but for speific field in Jet SQL you can do it by DISTINCT ( Employee.EmployeeId), …How in SQL server… thanks

  4. SELECT DISTINCT Purchase.SessionId, UserInfo.UserId, UserInfo.UserName, Purchase.grandTotal, Purchase.PurchaseDate, Purchase.Status
    FROM UserInfo INNER JOIN Purchase ON UserInfo.UserId=Purchase.Uid;

    but this doesn’t generate correct result ……………… plz reply whats problem with that

  5. Hello Pinal:

    There is a table t_test with columns
    [id, name, code, description].

    I can write a query using MySQL as

    SELECT
    *
    FROM t_test
    GROUP BY Code

    What would be equivalent sql server query for this.

    Thanks.

  6. hh, I can’t really say what the query should be without knowing what your expected result is.

    Having a sample of a few typical data rows from the UserInfo and Purchase tables might also be neccessary.

    However, “SELECT DISTINCT” performs a hash match on the selected fields, and omits any records producing a duplicate hash. In other words, the query you used will return any record where the returned column values, taken together, are different from any other record in the result set.

    In general, when using “distinct”, more columns selected = more rows returned. If you need to select all of these columns into the result set, but are only concerned that a few of them be unique, then you need a different construct. Two possibilities are:

    1. Use “GROUP BY” to eliminate duplicates on the desired columns, and then apply aggregate functions (MIN, MAX, AVG, etc) to the remaining columns.

    2. Use subqueries or self-joins to limit the records if you need selected columns to be unique and don’t want to aggregate the other columns.

  7. I have an issue with both DISTINCT and GROUP BY.

    The issue is that using either one, the results comes back ordered as if using order by.

    I need the unique results in the order they are in the database. How can I do this? Thanks

    • Note that you should not rely upon the natual ordering of the SQL Server. If you use versions earlier than 2000, it defaults the result by ORDER. But it is not gauranteed that it will work always. In Versions 2005 and above, there is no default order when Distinct is used

  8. i have a question regarding search criteria

    Select lastname,firstname from TblResourceProfile
    where @crit like in(lastname,firstname)
    dont know where to put the wild card ‘%’
    i

    • Select lastname,firstname from TblResourceProfile
      where (lastname like ‘%@crit%’) or (firstname like ‘%@crit%’)

  9. Select lastname,firstname from TblResourceProfile
    where @crit like in(lastname,firstname)
    dont know where to put the wild card ‘%’
    can this be possible?

  10. There is a third approach if you have a master and detail table – WHERE EXISTS. For example, if you have an OrderHeader table and an OrderItem table that have a one to many relationship on OrderHeaderId, you can decide unique orders in OrderItem one of 2 ways

    SELECT DISTINCT orderheaderid from orderitem

    SELECT orderheaderid FROM orderheader oh WHERE EXISTS (SELECT 1 FROM OrderItem WHERE OrderHeaderId=oh.OrderHeaderId)

    If these queries are run together, the “DISTINCT” query takes about 2/3 total query time, with WHERE EXISTS taken up a 1/3.

    (Ahh but why dont you just select from OrderHeader in this case. Well that would not omit orders that have no item records – if thats how you represent cancelled orders)

    In production code, always try to write SQL queries that produce unique results WITHOUT DISTINCT clauses. If performance is what you are after and you have one to many table relationships, use WHERE EXISTS.

    We have published an article of SQL Performance Do’s and Dont’s if thats of help to anyone.

    Steve Hatchard
    Director
    Mattched IT Ltd

  11. Is there a way to execute a DISTINCT query on a single field while the SELECT statement contains more fields. For example this statement:

    SELECT DISTINCT EmployeeID, EmployeeName, ContractDate, CurrentTitlle FROM EmployeeTable

    I am only interested to fetch DISTINCT reocrds based on the EmployeeID. Please note that more than one occurence of each employee’s record exists in this table. Therefore getting DISTINCT on all fields is not possible.

    If DISTINCT cannot solve this problem, what is other best approach in achieve this?

    Tariq Changgez

  12. Your best answer is to have two tables – one for unique employee data EmployeeId, EmployeeName, CurrentTitle etc.. and one for multiple row data – EmployeeId, ContractDate. It is usually best practice to “normalise” data like this.

    This off topic and not really an issue for DISTINCT.

    Steve Hatchard
    Director
    Mattched IT Ltd

  13. hi pinal,
    i’m struck with an issue related to distinct clause.
    i’ve fired a query in which i’ve used distict clause. As per that i’m getting distinct rows but they are repeatative in nature.
    e.g. if i’m getting 3 distinct rows from my query; i’m getting 1st row many hundreds of time, followed by 2nd and 3rd row in the same manner. How to suppress these repeated rows and how to only three distinct rows as a output?
    Thanx in advance.

  14. Pingback: SQLAuthority News - Best Articles on SQLAuthority.com Journey to SQL Authority with Pinal Dave

  15. Hi Pinal,

    I have a situation, I want to count the amount of time a similar name occurs in a table.

    E.g.

    Table Buses,

    Starmart
    Starmart
    Starmart
    Starmart
    Starmart
    Damai
    Damai
    Damai
    Damai
    Damai
    Damai
    Damai
    Damai
    Senibudaya
    Senibudaya
    Senibudaya
    Senibudaya
    Senibudaya
    Senibudaya
    Senibudaya
    Senibudaya

    How do i actually type out a sql script to group and return the count for each name.

    This is what i used so far
    select CurrentOperator, COUNT(*) CurrentOperator from EZBusDB.dbo.Buses
    GROUP BY CurrentOperator,

    I am getting this error:
    Msg 102, Level 15, State 1, Line 2
    Incorrect syntax near ‘,’.

    How i go about it?

    Thank you!

  16. @shwaria

    Remove comma (,) at the end of second line, to reduce confusion, you can use different column name.

    select CurrentOperator, COUNT(*) Count_CurrentOperator from EZBusDB.dbo.Buses
    GROUP BY CurrentOperator

    ~IM

  17. Hye!

    I’m trying a simple query with DISTINCT keyword as:

    SELECT DISTINCT SecCode FROM Sections ORDER BY SecCode

    in SQL Server 2005, but it is not giving me any result.

    Can anyone help me?

    Thanks!

  18. Dear

    Could u plz suggest me a technique where i can implement transactional replication technique with ONE – MANY Tables relationships……….. i want to replicate the different remote databases with same name n having same articles and structures in centralized single database with same single articles………………..

  19. select DENSE_RANK() Over (Order by col1, col2 ) as Ranky, col1, col2 from table where…
    group By Col1, col2 — gives distinct

  20. Thanks Pinal,

    These posts are very useful.
    I found them after attending your presentation in teched delhi.

  21. i try to get distinct values for name and get all other fields with out where condition

    ex. table

    id name level department companyid

    1 ram 1 1 1

    2 ram 2 2 2

    3 kumar 3 3 3

    4 Raja 4 4 4

    i expect value is
    1 ram 1 1 1

    3 kumar 3 3 3

    4 Raja 4 4 4

    (or)
    1 ram 1 1 1

    2 ram 2 2 2

    4 Raja 4 4 4

    • try like this..

      DECLARE @TEMPTABLE TABLE(ID BIGINT ,NAME VARCHAR(8),EmpID INT,departmentID INT,companyid INT)

      INSERT INTO @TEMPTABLE
      VALUES(1,’ram’,1,1,1),(2,’ram’,2,2,2),(3,’kumar’,3,3,3),(4,’Raja’,4,4,4)

      SELECT ID,Name,EmpID,departmentID,companyid FROM
      (
      SELECT ID,Name,EmpID,departmentID,companyid,Row_number() over (partition by name order by EmpID,departmentID,companyid) AS RowID
      FROM @TEMPTABLE
      )
      AS A WHERE RowID=1
      ORDER BY ID

  22. Hello I am trying to make a selection in 3 different tables using distinct, but I keep getting duplicate values, my tables are Alumni, History, and Groups, I get the id and name of students enrolled in the group 230, my query is

    SELECT DISTINCT H. Go, A. LastName, A. Name FROM History H, Alumni A, Groups I WHERE Group = 230 AND H. H.id = I.id AND ORDER BY I.key = A.key H.id ” )

    appear two students with the same value:

    9876 M John
    9876 M John

    I hope you can help me as no way out, and I’m starting to use SQL.

  23. @Lizbeth Morales

    The query didn’t seem to come through correctly.

    Please repost it, along with the COLUMNs in each TABLE used. It is hard to determine what is going on without them.

  24. How about:

    SELECT code, Month(d_date) as d_month, SUM(qty) as quantity
    FROM t_test
    GROUP BY code, d_month

    or should I group by code, Month(d_date)?

  25. Hi,

    Yes, you should group by code, Month(d_date) to get perfect output otherwise you would get multiple records for each set of code and Month(d_date).

    Regards,
    Pinal Dave

  26. Hi Pinal,
    Is there a way to find the first non blank distinct data from query…

    e.g.
    Name Comment
    ————————————–
    ABC
    Ajay JKL
    Vikas XYZ

    i want result as below…. is there a way to get result from query ?

    Name Comment
    ————————————–
    Ajay ABC

    • @Jignesh

      I do not understand what you want. I have tried something though:

      WITH
      Data(Name, Comment)
      AS
      (
      SELECT NULL, ‘ABC’ UNION ALL
      SELECT ‘Ajay’, ‘JKL’ UNION ALL
      SELECT ‘Vikas’, ‘XYZ’
      )
      SELECT
      (SELECT TOP(1) Name FROM Data WHERE Name IS NOT NULL ORDER BY Name ) Name,
      (SELECT TOP(1) Comment FROM Data WHERE Comment IS NOT NULL ORDER BY Comment ) Comment;

  27. Thanks.Brian. Even without format you understood properly !

    Now i have second question is it possible to get COUNT IN nested distinct via SQL query from a table.

    for example :
    Enquiry No , Part No
    —————————————
    E1 P1
    E1 P1
    E2 P2
    E2 P1
    E3 P2
    E3 P3

    EXPECTED RESULT :
    ——————————————-
    Enquiry NO DISTINCT_PART_COUNT
    —————————————
    E1 1
    E2 2
    E3 2

    Any help is highly appreciated…

  28. let’s say i have a table with one column containing the following 6 records:
    3
    3
    4
    5
    5
    5
    9
    8
    9
    9
    9
    9

    is there a way with a single standard SQL query to count the unique records in that column?
    I need Answer set
    3 2
    4 1
    5 3
    8 1
    9 5

    (the query should return “9″) the max nom reapeated in the column

    • Try like this..

      DECLARE @Temp TABLE(ID INT)
      INSERT INTO @Temp
      VALUES(1),(1),(2),(2),(3),(3),(4),(4),(1),(1)

      select ID,COUNT(1) AS RepeatedCount from @Temp
      GROUP BY ID

  29. Hi Sally,

    You can use ROW_NUMBER keyword to achieve expected output.

    DECLARE @tmp TABLE(SrNo INT)

    INSERT INTO @tmp VALUES(3)
    INSERT INTO @tmp VALUES(3)
    INSERT INTO @tmp VALUES(4)
    INSERT INTO @tmp VALUES(5)
    INSERT INTO @tmp VALUES(5)
    INSERT INTO @tmp VALUES(5)
    INSERT INTO @tmp VALUES(9)
    INSERT INTO @tmp VALUES(8)
    INSERT INTO @tmp VALUES(9)
    INSERT INTO @tmp VALUES(9)
    INSERT INTO @tmp VALUES(9)
    INSERT INTO @tmp VALUES(9)

    SELECT DISTINCT *,
    COUNT(*) OVER(PARTITION BY SrNo) AS Total
    FROM @tmp

    This will run for SQL 2005 and above only.

    Thanks,

    Tejas
    SQLYoga.com

  30. CREATE table tmp (SrNo INT)
    INSERT INTO tmp VALUES(3)
    INSERT INTO tmp VALUES(3)
    INSERT INTO tmp VALUES(4)
    INSERT INTO tmp VALUES(5)
    INSERT INTO tmp VALUES(5)
    INSERT INTO tmp VALUES(5)
    INSERT INTO tmp VALUES(9)
    INSERT INTO tmp VALUES(8)
    INSERT INTO tmp VALUES(9)
    INSERT INTO tmp VALUES(9)
    INSERT INTO tmp VALUES(9)
    INSERT INTO tmp VALUES(9)

    select top 1 srno
    from tmp
    group by srno
    order by count(srno)
    desc

  31. HI, i am new to OLAP in Sql Server 2005, bascially i have read through lots of article on this OLAP and basically i get to know how it works and function theorically, but i could not find an article to show me step by step on how to create the cube and how to query the data from the cube. Would very much appreciated if somebody out there can explain to me in details how to i create cube from sql server 2005 as i totally no idea at all. Though i can view the sample cude in sql server 2005, but i dunno how to create it from my relational database. I need it very urgent as needed by my job functionality. Hope to get some explanation from you guys asap. Just assume i am totally new and please guide me step by step to create a normal and simple cube and query from the cube. I just need to create simple prototype. Thanks alot yea guys :)

  32. I’ll use DISTINCT as it is the true meaning of what I want to achieve… I had just heard in the past that GROUP BY was a lot more performant, but like you say it should execute the same query plan – and I can always check that too!

    So thanks, I am now not as “anti-DISTINCT” as I was before I read your post. ;-)

    Cheers.

  33. Hi,

    I have the following data in the table.

    product price qty
    ====================
    A 10 5
    A 20 2
    A 30 1
    B 20 3
    B 10 7

    The result that i expect from SQL query
    basically the total = sum (price x qty) but quantity should be the lowest among same product

    for product A is
    LINETOTAL
    50

    for product B is
    SUMTOTAL
    70

    Could you please help me how to write the SQL query.

    Thanks,
    Jignesh

    • @Jignesh

      The lowest or the highest?

      WITH
      Data(product, price, qty)
      AS
      (
      SELECT ‘A’, 10, 5 UNION ALL
      SELECT ‘A’, 20, 2 UNION ALL
      SELECT ‘A’, 30, 1 UNION ALL
      SELECT ‘B’, 20, 3 UNION ALL
      SELECT ‘B’, 10, 7
      ),
      Data_RN
      AS
      (
      SELECT
      product,
      price,
      qty,
      ROW_NUMBER() OVER(PARTITION BY Product ORDER BY qty DESC) RN
      FROM
      Data
      )
      SELECT
      product,
      price * qty
      FROM
      Data_RN
      WHERE
      RN = 1;

  34. Hi
    I have a problem with join and group by
    when I use group by in sub query the result will be done with 7 rows

    SELECT classNameId , max( viewCount) as viewcount
    FROM tagsasset
    group by classNameId

    classNameId viewcount
    10011 0
    10020 0
    10040 0
    10056 31
    10066 10
    10067 5409
    10111 2

    but when join it the result has 63 rows
    because I put 2 condition
    when it was do 2 conditon for equal result show all of the row have this result

    SELECT TA.title,TA.viewcount,TA.classnameid,c.value
    from tagsasset TA inner join classname_ c
    on c.classnameid=TA.classnameid
    inner join
    (SELECT classNameId , max( viewCount) as viewcount
    FROM tagsasset
    group by classNameId )groupet on
    TA.viewcount=groupet.viewcount
    and
    groupet.classnameid=TA.classnameid
    view classnameid
    0 10040
    0 10040
    0 10040
    0 10040
    0 10040
    0 10040
    0 10040
    0 10040
    0 10040
    0 10040
    0 10040
    0 10040
    0 10040
    0 10040
    0 10040
    31 10056
    10 10066
    5409 10067
    2 10111
    2 10111

    I have duplicate result
    plz help me

  35. i have results like the following

    Marea,Garea,SGarea
    bck1,gitem1,sgitem1
    bck1,gitem1,sgitem2
    bck1,gitem1,sgitem3
    bck2,gitem2,sgitem11
    bck2,gitem2,sgitem12
    bck2,gitem2,sgitem13

    i require quey which should give result like below
    (note:i wanted to show in a gridview like below)

    bck1 gitem1 sgitem1
    sgitem2
    sgitem3
    bck2 gitem2 sgitem11
    sgitem12
    sgitem13

    can any one give me some idea.how i can make it possible.

    from
    lokesh

  36. DECLARE @tmp TABLE(SrNo INT)

    INSERT INTO @tmp VALUES(3)
    INSERT INTO @tmp VALUES(3)
    INSERT INTO @tmp VALUES(4)
    INSERT INTO @tmp VALUES(5)
    INSERT INTO @tmp VALUES(5)
    INSERT INTO @tmp VALUES(5)
    INSERT INTO @tmp VALUES(9)
    INSERT INTO @tmp VALUES(8)
    INSERT INTO @tmp VALUES(9)
    INSERT INTO @tmp VALUES(9)
    INSERT INTO @tmp VALUES(9)
    INSERT INTO @tmp VALUES(9)

    SELECT SrNo,COUNT(*) AS Total
    FROM @tmp group by SrNo

  37. Hi All,
    I have a db with fields Part number,supplier number and country name

    I would like to get a table with one unique record for partnumber and the country1 name country 2 name,–etc..
    example this is what I have

    partno suno country
    001 123 US
    001 234 IN
    001 567 AU
    001 897 CA
    002 235 IN
    002 566 SR

    etc…

    I need a table like this

    part num country1 country 2 country 3 country 4
    001 US IN AU CA
    002 IN SR

    etc…

    Maximum suppliers will be usually from 1-10 ( Assumption)

    Could anyone please help me out with the sql query to get this result.

    .we may have use do while loop,but no clear idea

    please hep me out..

    Thanks,
    Santhu

  38. I guess something is missing in Article. Distinct creates a temp table and remove duplicates. Group by essentially does the same things, but it has additional overhead of sorting the records. By default group by is ASC sort. so Group by is slower in cases where the key grouping column is not indexed. If the query columns are already in clustered indexed, it may be same operation.
    I would go with query plan and based on instincts will prefer Distinct over Group by

    thanks

  39. Hi

    why we should not use distinct and group by within the same sql statement or query??? any performance issue can u explain pls????????

  40. Hi,

    i want to find distinct top.

    top 3 for all UserId’s.

    for ex:

    the table is as below:

    Id UserId Date
    1 10 10/10/2010
    1 10 10/10/2010
    1 5 10/10/2010
    1 10 10/10/2010
    1 5 10/10/2010
    1 7 10/10/2010
    1 10 10/10/2010
    1 5 10/10/2010
    1 10 10/10/2010
    1 5 10/10/2010

    In the above table userid 10, 5 & 7 has 5, 4 & 1 rows respectively.

    Now i expect 3 or below 3 from each user.

    For Ex:

    Id UserId Date
    1 10 10/10/2010
    1 10 10/10/2010
    1 10 10/10/2010
    1 5 10/10/2010
    1 5 10/10/2010
    1 5 10/10/2010
    1 7 10/10/2010

    how to solve this.

    regards
    rajendran

  41. SELECT ROW_NUMBER() OVER (ORDER BY inserted_date DESC)
    AS Row, distinct product_type, parent_id,product_id,brand_id,device_id,p.subcategory_id,upc_code,sku,product_name,p.description,price,product_image,
    isDevice,dimension_length,dimension_length_unit,dimension_width,dimension_width_unit,
    dimension_height,dimension_height_unit,dimension_weight,dimension_weight_unit from tblProductDetails p join tblSubCategory s
    on p.subcategory_id=s.subcategory_id
    where parent_id=@cat_id and p.subcategory_id=@sub_id

    My distinct keyword is having error.

  42. I guess something is missing in Article. Distinct creates a temp table and remove duplicates. Group by essentially does the same things, but it has additional overhead of sorting the records. By default group by is ASC sort. so Group by is slower in cases where the key grouping column is not indexed. If the query columns are already in clustered indexed, it may be same operation.
    I would go with query plan and based on instincts will prefer Distinct over Group by

  43. this is my query and

    select company_name ,department_name from Department_Master where Status = ‘N’ order by company_name, group by company_name

    It shows me this kind of the error

    Msg 156, Level 15, State 1, Line 1
    Incorrect syntax near the keyword ‘group’.

    What should i do to use the group by clouse in sql server 2008

    Thanks in advance

  44. Hi Sally,

    You can do the following to get the results inon estatement.

    SELECT Col1,COUNT(*) AS Total FROM TMP
    GROUP BY Col1
    ORDER BY Col1

  45. I need your help ragarding sql query.

    Val_1, Val_2, Val_3,Val_4
    a b c 1
    a b c 2
    a b c 4
    m n o 1
    m n o 2

    and output should be

    val_1, Val_2,val_3, val_4
    a b c 1,2,4
    m n o 1,2

    Pls help if anyone know…

  46. Dear all i need a help from you people. I am using vb6 & ms sql server2000. I have two tables.

    1- Order (master table) having data

    ORDERID DATE EMPNAME
    1 1-2-1999 JOHN
    2 1-1-2010 JOHN
    3 1-12-2011 FOAD

    2-Orderdetail (child table) having data

    PRID PRNAME SP QTY TOTAL ORDERID
    11 CD 5 10 50 1
    11 CD 5 5 25 1
    12 DVD 10 5 50 2

    i wan to get data like this

    Empname Prname Qty SP total
    JOHN CD 15 5 75
    DVD 5 10 50

    Regards

  47. hi pinal,
    i find your blog very useful.
    i have been working for a airticketing service i have a table with 2 cols id and wsdate(datetime datatype) where wsdate contains date and time which got some thousands of records on different dates,i want to display distinct date col and other one is count of rows from the table on that particular date.please try to help me out

  48. Thank u sir, I forgot Distinct Clause, to remove duplicate values…. At the right time your answer helps me…alot..

  49. Pingback: SQL SERVER – Weekly Series – Memory Lane – #022 | SQL Server Journey with SQL Authority

  50. Hi….
    I am facing one problem that when I fire above query it will give me multiple records for single ItemNo but I wanna only unique records. What I need to do for this?

    Select distinct tg.ItemNo,i.TaxIncExc,case WHEN g.TaxFrom=’Inside Maharashtra’
    THEN i.TaxInsideMaha Else i.TaxOutsideMaha END as TaxPercent,i.ItemName,
    ((tg.Qty*i.mrp)*(ISNULL(i.Excise,0)))/100 as Excise,tg.Price,’0′ as TAxAmt,tp.ItemQty,
    tg.Qty as AcceptedQty,tp.BalanceQty,” as ExpiryDate,tg.RejectedQty,
    (tg.Qty*i.mrp) as TotalPrice from trn_grn tg inner join mst_item i
    on i.itemno=tg.itemno inner join mst_grn g on g.grnno=tg.grnno inner join mst_employee e
    on e.empno=tg.inspector inner join mst_po p inner join Trn_PO tp on tp.PONo=p.PONo
    on p.pono=g.pono where g.potype=’Normal_PO’and g.pono=27

  51. Hello Pinal, I never opine but I constantly read your posts. Your contributions to the SQLers are priceless, I just wanted to take the time to say thank you!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s