SQL SERVER – Difference between DISTINCT and GROUP BY – Distinct vs Group By

This question is asked many times to me. What is difference between DISTINCT and GROUP BY?

A DISTINCT and GROUP BY usually generate the same query plan, so performance should be the same across both query constructs. GROUP BY should be used to apply aggregate operators to each group. If all you need is to remove duplicates then use DISTINCT. If you are using sub-queries execution plan for that query varies so in that case you need to check the execution plan before making decision of which is faster.

Example of DISTINCT:
SELECT DISTINCT Employee, Rank
FROM Employees

Example of GROUP BY:
SELECT Employee, Rank
FROM Employees
GROUP BY Employee, Rank

Example of GROUP BY with aggregate function:
SELECT Employee, Rank, COUNT(*) EmployeeCount
FROM Employees
GROUP BY Employee, Rank

Reference : Pinal Dave (https://blog.sqlauthority.com)

SQL Function, SQL Joins, SQL Scripts
Previous Post
SQL SERVER – Fix : Error 8101 An explicit value for the identity column in table can only be specified when a column list is used and IDENTITY_INSERT is ON
Next Post
SQL SERVER – Index Seek Vs. Index Scan (Table Scan)

Related Posts

97 Comments. Leave new

  • Very simple and clear explanation, very easy to understand

    Reply
  • hay can u tell that how can i found average of colume valu in crystal reprot……not use cross tab only by queryes…. of many

    Reply
  • hey…
    i have a performance issue with me.
    in one query and corresponding sub queries they have used so many distinct clauses . so suggest me the best replacement
    to the distinct to improve the performance.

    Reply
  • I have view reteriving records from multiple tables using join.. I want DISTINCT records but for speific field in Jet SQL you can do it by DISTINCT ( Employee.EmployeeId), …How in SQL server… thanks

    Reply
  • Hi Pinal

    I m visit this site for long period really i felt this site is very nice and explanation is very good.

    Reply
  • SELECT DISTINCT Purchase.SessionId, UserInfo.UserId, UserInfo.UserName, Purchase.grandTotal, Purchase.PurchaseDate, Purchase.Status
    FROM UserInfo INNER JOIN Purchase ON UserInfo.UserId=Purchase.Uid;

    but this doesn’t generate correct result ……………… plz reply whats problem with that

    Reply
  • Hello Pinal:

    There is a table t_test with columns
    [id, name, code, description].

    I can write a query using MySQL as

    SELECT
    *
    FROM t_test
    GROUP BY Code

    What would be equivalent sql server query for this.

    Thanks.

    Reply
    • Note that it works in Mysql. In SQL Server you should always specify the column names in the SELECT statement

      It should be

      SELECT code FROM t_test
      GROUP BY code

      Reply
  • hh, I can’t really say what the query should be without knowing what your expected result is.

    Having a sample of a few typical data rows from the UserInfo and Purchase tables might also be neccessary.

    However, “SELECT DISTINCT” performs a hash match on the selected fields, and omits any records producing a duplicate hash. In other words, the query you used will return any record where the returned column values, taken together, are different from any other record in the result set.

    In general, when using “distinct”, more columns selected = more rows returned. If you need to select all of these columns into the result set, but are only concerned that a few of them be unique, then you need a different construct. Two possibilities are:

    1. Use “GROUP BY” to eliminate duplicates on the desired columns, and then apply aggregate functions (MIN, MAX, AVG, etc) to the remaining columns.

    2. Use subqueries or self-joins to limit the records if you need selected columns to be unique and don’t want to aggregate the other columns.

    Reply
  • I have an issue with both DISTINCT and GROUP BY.

    The issue is that using either one, the results comes back ordered as if using order by.

    I need the unique results in the order they are in the database. How can I do this? Thanks

    Reply
    • Note that you should not rely upon the natual ordering of the SQL Server. If you use versions earlier than 2000, it defaults the result by ORDER. But it is not gauranteed that it will work always. In Versions 2005 and above, there is no default order when Distinct is used

      Reply
  • i have a question regarding search criteria

    Select lastname,firstname from TblResourceProfile
    where @crit like in(lastname,firstname)
    dont know where to put the wild card ‘%’
    i

    Reply
    • samba shiva reddy
      February 27, 2010 7:11 pm

      Select lastname,firstname from TblResourceProfile
      where (lastname like ‘%@crit%’) or (firstname like ‘%@crit%’)

      Reply
  • thanks by the way….;)

    Reply
  • Select lastname,firstname from TblResourceProfile
    where @crit like in(lastname,firstname)
    dont know where to put the wild card ‘%’
    can this be possible?

    Reply
    • It should be

      Select lastname,firstname from TblResourceProfile
      where @crit like lastname+’%’ or @crit like firstname+’%’

      Reply
  • There is a third approach if you have a master and detail table – WHERE EXISTS. For example, if you have an OrderHeader table and an OrderItem table that have a one to many relationship on OrderHeaderId, you can decide unique orders in OrderItem one of 2 ways

    SELECT DISTINCT orderheaderid from orderitem

    SELECT orderheaderid FROM orderheader oh WHERE EXISTS (SELECT 1 FROM OrderItem WHERE OrderHeaderId=oh.OrderHeaderId)

    If these queries are run together, the “DISTINCT” query takes about 2/3 total query time, with WHERE EXISTS taken up a 1/3.

    (Ahh but why dont you just select from OrderHeader in this case. Well that would not omit orders that have no item records – if thats how you represent cancelled orders)

    In production code, always try to write SQL queries that produce unique results WITHOUT DISTINCT clauses. If performance is what you are after and you have one to many table relationships, use WHERE EXISTS.

    We have published an article of SQL Performance Do’s and Dont’s if thats of help to anyone.

    Steve Hatchard
    Director
    Mattched IT Ltd

    Reply
  • Is there a way to execute a DISTINCT query on a single field while the SELECT statement contains more fields. For example this statement:

    SELECT DISTINCT EmployeeID, EmployeeName, ContractDate, CurrentTitlle FROM EmployeeTable

    I am only interested to fetch DISTINCT reocrds based on the EmployeeID. Please note that more than one occurence of each employee’s record exists in this table. Therefore getting DISTINCT on all fields is not possible.

    If DISTINCT cannot solve this problem, what is other best approach in achieve this?

    Tariq Changgez

    Reply
  • Your best answer is to have two tables – one for unique employee data EmployeeId, EmployeeName, CurrentTitle etc.. and one for multiple row data – EmployeeId, ContractDate. It is usually best practice to “normalise” data like this.

    This off topic and not really an issue for DISTINCT.

    Steve Hatchard
    Director
    Mattched IT Ltd

    Reply
  • When request is very simple, Distinct can be used else always use Group By. really faster.

    Reply
  • hi pinal,
    i’m struck with an issue related to distinct clause.
    i’ve fired a query in which i’ve used distict clause. As per that i’m getting distinct rows but they are repeatative in nature.
    e.g. if i’m getting 3 distinct rows from my query; i’m getting 1st row many hundreds of time, followed by 2nd and 3rd row in the same manner. How to suppress these repeated rows and how to only three distinct rows as a output?
    Thanx in advance.

    Reply
  • i like pinal dave blogs

    Reply
  • Hi Pinal,

    I have a situation, I want to count the amount of time a similar name occurs in a table.

    E.g.

    Table Buses,

    Starmart
    Starmart
    Starmart
    Starmart
    Starmart
    Damai
    Damai
    Damai
    Damai
    Damai
    Damai
    Damai
    Damai
    Senibudaya
    Senibudaya
    Senibudaya
    Senibudaya
    Senibudaya
    Senibudaya
    Senibudaya
    Senibudaya

    How do i actually type out a sql script to group and return the count for each name.

    This is what i used so far
    select CurrentOperator, COUNT(*) CurrentOperator from EZBusDB.dbo.Buses
    GROUP BY CurrentOperator,

    I am getting this error:
    Msg 102, Level 15, State 1, Line 2
    Incorrect syntax near ‘,’.

    How i go about it?

    Thank you!

    Reply
  • “select CurrentOperator, COUNT(*) CurrentOperator from EZBusDB.dbo.Buses
    GROUP BY CurrentOperator”

    Delete Comma(,) At the End of Your Command

    Reply

Leave a Reply