Following code is useful to delete duplicate records. The table must have identity column, which will be used to identify the duplicate records. Table in example is has ID as Identity Column and Columns which have duplicate data are DuplicateColumn1, DuplicateColumn2 and DuplicateColumn3.
DELETE
FROM MyTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)
Watch the view to see the above concept in action:
[youtube=http://www.youtube.com/watch?v=ioDJ0xVOHDY]
Reference : Pinal Dave (https://blog.sqlauthority.com)
450 Comments. Leave new
Awesome man. I have used you delete duplicate records query. And it works great.
Thanks
Unfortunately, this query doesn’t seem to work when attempting to delete a row that has duplicate id’s but different data.
Can’t believe how hard it is to find something simple that will work.
Post some sample data with expected result so that you will get the proper query that would work in your case
Why don’e we do a self union to the table.
select * from mytable union select * from mytable,
thsi query returns the exact records, and i guess we can use this data to build a new table or replacing the previous one.
Thoughts on this please
Note that this is equivalent to using
select * from mytable
I am trying to design an query which functions like ICETOOL of mainframe it also requires this kind of duplicate deletions…
ICETOOL in my application (sorts, merges and separates) duplicates and uniques from 4 input file into two output files(one containing unique ids and other containing its duplicate ids with different data) and discards the duplicates which occurs more than twice…
i have the 4 files as tables now i need to get two output tables
help will be much appreciated………
Hi
Sir can u please let me how to pass aliase name in Stored
procedure when I am using table as Parameter in A SP
This is very bad design
You need to make use of dynamic SQL
Refer this
http://www.sommarskog.se/dynamic_sql.html
How would your solution work if I had a single table called Customers that looks like this:
ID LastName FirstName
1 Smith John Adam
2 Smith John A.
3 Jones Jane Mary
4 Jones Jane M.
No other fields are in this table. Just what you see. The ID field is unique for each record. In this example, both records for Smith are the same person. The same goes for Jones. I want to view in a table the resulting unique records (i.e. only one record for John Smith, and one record for Jane Jones).
I am a complete newbie and would appreciate guidance.
Thanks!
You can use any one of the methods specified here depending on the version you are using
If you have Key already, you have seen many solutions in this post before, you need to group by on the repeating/columns that can be duplicated and delete them. The answered solution is if you dont have a unique key in the table.
well to add more to that, once you do self union and insert into temp table, you can delete every thing form main table and insert every thing from temp table in to the original/main table. I think this should help..
-Aditya
SELECT id,name,mNumber,sNumber,partOfSpeech,type FROM keyword where id IN (Select max(id) from keyword GROUP BY name)
The above is if you want to just select a column without duplicates in it.
create table T1 (C1 int, C2 varchar(3))
–Fill it with duplicate Records
declare @as Table (c1 int,c2 varchar(3))
insert into @as select Distinct * from T1
Truncate table T1
Alter Table T1 Disable Trigger All
insert into T1 select * from @As
Alter Table T1 Enable Trigger All
Thanks Mr. Dave for the codes.
I’ve searched inside out in the internet but still couldn’t find a solution for my problem.
Your code manage to solved it, your truly a master in SQL. Thanks again
Can you show how to make this a select Into, so that one can save the deleted records in another table
DELETE
FROM MyTable
WHERE ID NOT IN
(SELECT MAX(ID)
FROM MyTable
GROUP BY DuplicatevalueColumn1, DuplicateValueColumn2, DuplicateValueColumn2
select * into newtable
FROM MyTable
WHERE ID NOT IN
(SELECT MAX(ID)
FROM MyTable
GROUP BY DuplicatevalueColumn1, DuplicateValueColumn2, DuplicateValueColumn3
I thing this one alos workout
====================
set rowcount 1
delete test
from test a
where(select count(*) from test b where b.age = a.age and b.name = a.name)
> 1
while @@rowcount>0
delete test
from test a
where(select count(*) from test b where b.age = a.age and b.name = a.name)
>1
set rowcount 0
Note that for large number of datasets it will take long time to finish the execution. You should stay away from using this method
//Here n is maximum allowable number of a record
create or replace procedure t_dili1_2
(n in int,q in out sys_refcursor)
as
begin
open q for
select a,b
from dil1 where (a,b) in(
select a,b from (select a,b,count(*) from t group by a,b having count(*)<n));
end t_dili1_2;
/
variable z refcursor
exec t_dili1_2(4,:z)
print z
Plz Reffer to my (above) comment (PL/SQL Block)..
Here dil1 is name of TABLE.
a,b are columns with repeated records.
I did is Using REFCURSOR.
Note that this site is for MS SQL Server
For ORACLE, post at http://www.orafaq.com
Q: Display all the records of a table t (where there are several duplicated & reduplicated records present) with restricted number (n) of distinct records. (Use REFCURSOR)
Solution:
create or replace procedure t_dili1_2
(n in int,q in out sys_refcursor)
as
begin
open q for
select a,b
from t where (a,b) in(
select a,b from (select a,b,count(*) from t group by a,b having count(*)<n));
end t_dilip1_2;
/
variable z refcursor
exec t_dili1_2(4,:z)
print z
Pinal,
You are truly a SQL guru and I love your website and your articles.
The statement above that deletes duplicate records worked perfectly for me.
Thanks.
Hello All,
I have a situation where I have two tables A & B. A has the actual data, table B maintains a sequence of data that is in table A. I have a process that polls table A every 5 seconds. Whenever a record is updated in table A with a TRANSID n, table B is updated with the same TRANSID. This said, what I would like to achieve is, when I poll table A, I should get all the records that were updated in the past 5 seconds. The catch is, same record can be updated more than once and end up updated with a greater TRANSID, table B is updated with the same TRANSID, but if same record is updated, one of their columns REPORT_ID, would be the same in table A(REPORT_ID not in table B). I should be able to get the most recent of the duplicate records and other records that were updated. Any help is greatly appreciated since writing SQL queries is not one of my fortes.
YOu rock man!
how to find second highest salary of the employee table
please explain
There are many number of methods available
Choose the effecient method
Refer this pos
dear Pinaldave
u r genius
u have solved my problems on deleting duplicate records