
I am in the middle of updating my database with data from a third-party source. Unfortunately, the data from that source contains many duplicate records.

I looked at a few questions here on SO, but all of them seem to cover cases where an ID column differentiates one row from another.

In my case, there is no ID column, e.g.

State   City    SubDiv  Pincode Locality Lat    Long
Orissa  Koraput Jeypore 764001  B.D.Pur 18.7743 82.5693
Orissa  Koraput Jeypore 764001  Jeypore 18.7743 82.5693
Orissa  Koraput Jeypore 764001  Jeypore 18.7743 82.5693
Orissa  Koraput Jeypore 764001  Jeypore 18.7743 82.5693
Orissa  Koraput Jeypore 764001  Jeypore 18.7743 82.5693

Is there a simple query I can run to delete all duplicate records while keeping one as the original? In the above case, I want to delete rows 3, 4, and 5 from the table.

I am not sure whether this can be done with plain SQL statements, but I would like to hear others' opinions on how it can be done.

  • Could you not just add an ID column to your table, then use one of the methods you've already read about? Also, it may be worth looking into not importing the duplicates from the other data source, if you don't want them in your table. Commented Sep 22, 2011 at 11:41

5 Answers

7
;with cte as (
    select State, City, SubDiv, Pincode, Locality, Lat, Long,
           row_number() over (partition by State, City, SubDiv, Pincode, Locality, Lat, Long
                              order by City) rn
    from yourtable
)
delete cte where rn > 1


5

I would insert the third-party data into a temporary table first, and then:

insert into
  target_table
select distinct
  *
from
  temporary_table

and finally drop the temporary table.

Only distinct (unique) rows will be inserted into the target table.
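A minimal sketch of the full flow on SQL Server, assuming the column layout from the question's sample data; the temp-table name #stage, the column types, and target_table are all hypothetical:

```sql
-- Hypothetical staging table; adjust types to match the real feed.
create table #stage (
    State varchar(50), City varchar(50), SubDiv varchar(50),
    Pincode varchar(10), Locality varchar(50),
    Lat decimal(9,4), Long decimal(9,4)
)

-- 1. Bulk-load the raw third-party feed into #stage
--    (e.g. via BULK INSERT or bcp, however the data arrives).

-- 2. Copy only the unique rows across:
insert into target_table
select distinct * from #stage

-- 3. Clean up:
drop table #stage
```

Because the duplicates never reach target_table, there is nothing to delete afterwards.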

2 Comments

+1 Another approach is to copy the data into the temp table with DISTINCT. I like this approach better because it gives you a chance to validate the result of the operation; delete cte where rn > 1 should also work, but if you make a mistake, you have already destroyed data.
Just wanted to add that this solution works just fine (as does the solution proposed by @t-clausen.dk) but does nothing to prevent this from happening again. After de-duplicating, you need to put a unique index on the natural key of your data. You may also need to fix your import process.
3

One of

  • add an ID column, use it to de-duplicate, and leave it in place
  • do a SELECT DISTINCT * INTO ANewTable FROM OldTable, then drop the old table and rename the new one
  • Use t-clausen.dk's CTE approach

And then add a unique index on the desired columns
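A minimal sketch of the second option plus the unique index, assuming the question's table is called PincodeData (a hypothetical name):

```sql
-- Copy only the unique rows into a new table.
select distinct State, City, SubDiv, Pincode, Locality, Lat, Long
into PincodeData_clean
from PincodeData

-- Swap the tables.
drop table PincodeData
exec sp_rename 'PincodeData_clean', 'PincodeData'

-- Guard against the duplicates coming back on the next import.
create unique index UX_PincodeData_NaturalKey
on PincodeData (State, City, SubDiv, Pincode, Locality, Lat, Long)
```

The unique index makes any future attempt to insert an exact duplicate fail, so the import process surfaces the problem instead of silently re-creating it.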


2

You may use the ROW_NUMBER() function: SQL SERVER – 2005 – 2008 – Delete Duplicate Rows


0

Try this

alter table mytable add id int identity(1,1)

delete from mytable
where id not in (
    select min(id)
    from mytable
    group by State, City, SubDiv, Pincode, Locality, Lat, Long
)

alter table mytable drop column id

Comments

What value does the identity column add? This is close, but see @t-clausen.dk's answer - no identity column needed.
