How to remove duplicates from table using SQL query

Question

I have a table which is as follows:

emp_name   emp_address  sex  marital_status
uuuu       eee          m    s
iiii       iii          f    s
uuuu       eee          m    s

I want to remove the duplicate entries based on three fields: emp_name, emp_address and sex. My resulting table (after removing the duplicates) should look like this:

emp_name    emp_address   sex   marital_status
uuuu        eee           m     s
iiii        iii           f     s

I am not able to recall how to write a SQL query for this. Can anyone please help?

If you're not going to base duplication on all the columns of the row, then when a duplicate is found, how will you decide which row to keep?
– Ralph Shillington, Commented Oct 6, 2011 at 14:52

Kusalananda · Accepted Answer · 2011-10-06 14:54:51Z

5

I would create a new table with a unique index over the columns that you want to keep unique. Then do an insert from the old table into the new one, ignoring the warnings about duplicated rows. Lastly, I would drop (or rename) the old table and replace it with the new table. In MySQL, this would look like:

CREATE TABLE tmp LIKE mytable;
ALTER TABLE tmp ADD UNIQUE INDEX myindex (emp_name, emp_address, sex);
INSERT IGNORE INTO tmp SELECT * FROM mytable;
DROP TABLE mytable;
RENAME TABLE tmp TO mytable;

Or something similar (this is totally untested).
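For anyone who wants to try the copy-into-a-unique-table idea without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. It is an adaptation, not the MySQL statements above: SQLite has no CREATE TABLE ... LIKE or INSERT IGNORE, so the sketch spells out the columns and uses INSERT OR IGNORE instead. The sample rows come from the question, and the unique index is assumed to cover only the three columns the asker named.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT);
    INSERT INTO mytable VALUES
        ('uuuu', 'eee', 'm', 's'),
        ('iiii', 'iii', 'f', 's'),
        ('uuuu', 'eee', 'm', 's');

    -- New table with the same columns plus a unique index on the dedup key.
    CREATE TABLE tmp (emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT);
    CREATE UNIQUE INDEX myindex ON tmp (emp_name, emp_address, sex);

    -- Rows that would violate the unique index are silently skipped.
    INSERT OR IGNORE INTO tmp SELECT * FROM mytable;

    -- Swap the deduplicated table in for the original.
    DROP TABLE mytable;
    ALTER TABLE tmp RENAME TO mytable;
""")

rows = conn.execute("SELECT * FROM mytable ORDER BY emp_name").fetchall()
print(rows)
```

Which of the duplicate rows survives depends on the order in which the SELECT feeds them in, so this only fits cases where the duplicates are interchangeable.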

answered Oct 6, 2011 at 14:54

Kusalananda


Mikael Eriksson · Accepted Answer · 2011-10-06 22:21:44Z

4

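This is not a query but a delete statement. It will remove duplicate rows from your table. Partition by the column(s) that define a duplicate (DUPLICATE_VAARS_DECISION here) and order by the column that decides which row is kept (NODE_EQ_NO here).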

;with C as
(
  select row_number() over(partition by DUPLICATE_VAARS_DECISION 
                           order by NODE_EQ_NO) as rn
  from yourtable
)
delete C
where rn > 1

If you only want to query the table and get the non-duplicate rows as a result, use this instead:

;with C as
(
  select *,
         row_number() over(partition by DUPLICATE_VAARS_DECISION 
                           order by NODE_EQ_NO) as rn
  from yourtable
)
select *
from C
where rn = 1
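Here is a rough, runnable sketch of the same row_number() idea against the table from the question, using Python's built-in sqlite3 module. A few assumptions: it needs SQLite 3.25 or later for window functions, the table name yourtable and the choice of rowid as the ordering column are mine, and because SQLite cannot delete through a CTE the way SQL Server can, the delete goes through a rowid filter instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourtable (emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT);
    INSERT INTO yourtable VALUES
        ('uuuu', 'eee', 'm', 's'),
        ('iiii', 'iii', 'f', 's'),
        ('uuuu', 'eee', 'm', 's');
""")

# Number the rows inside each (emp_name, emp_address, sex) group, then
# delete every row except the first one of its group.
conn.execute("""
    WITH C AS (
        SELECT rowid AS rid,
               row_number() OVER (PARTITION BY emp_name, emp_address, sex
                                  ORDER BY rowid) AS rn
        FROM yourtable
    )
    DELETE FROM yourtable
    WHERE rowid IN (SELECT rid FROM C WHERE rn > 1)
""")

remaining = conn.execute("SELECT * FROM yourtable ORDER BY emp_name").fetchall()
print(remaining)
```

Changing the ORDER BY inside the window is how you would pick a different survivor, for example the newest row by some date column.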

edited Oct 6, 2011 at 22:21

answered Oct 6, 2011 at 22:02

Mikael Eriksson


1 Comment

6dev6il6 Over a year ago

Thanks, this works! For the first statement to delete duplicates, it's more understandable like this: ;with C as ( select row_number() over(partition by Description order by Description) as rn from [YourTable] ) delete C where rn > 1

Roopesh Shenoy · Accepted Answer · 2011-10-06 14:59:44Z

2

It looks like all four column values are duplicated, so you can do this:

select distinct emp_name, emp_address, sex, marital_status
from YourTable

However, if marital status can differ and you have some other column to choose by (e.g. you want the latest record based on a create_date column), you can do this:

select emp_name, emp_address, sex, marital_status
from YourTable a
where not exists (select 1 
                   from YourTable b
                  where b.emp_name = a.emp_name and
                        b.emp_address = a.emp_address and
                        b.sex = a.sex and
                        b.create_date > a.create_date)
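To see the NOT EXISTS pattern in action, here is a small sketch using Python's built-in sqlite3 module. The create_date column and the sample dates are made up for illustration. Note that the comparison is a strict >, so a row is never excluded by matching itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (emp_name TEXT, emp_address TEXT, sex TEXT,
                            marital_status TEXT, create_date TEXT);
    INSERT INTO YourTable VALUES
        ('uuuu', 'eee', 'm', 's', '2011-01-01'),
        ('uuuu', 'eee', 'm', 'm', '2011-06-01'),
        ('iiii', 'iii', 'f', 's', '2011-03-01');
""")

# Keep a row only if no other row with the same key has a later create_date.
latest = conn.execute("""
    SELECT emp_name, emp_address, sex, marital_status
    FROM YourTable a
    WHERE NOT EXISTS (SELECT 1
                      FROM YourTable b
                      WHERE b.emp_name = a.emp_name
                        AND b.emp_address = a.emp_address
                        AND b.sex = a.sex
                        AND b.create_date > a.create_date)
    ORDER BY emp_name
""").fetchall()
print(latest)
```

If two rows in a group share the same latest create_date, both come back, so a tie-breaker column would be needed in that case.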

answered Oct 6, 2011 at 14:59

Roopesh Shenoy


1 Comment

Mack Over a year ago

This doesn't answer his question, imo. He wants an UPDATE or DELETE FROM statement, not a single SELECT statement that is not permanent and does not alter the table in any way.

SQLMenace · Accepted Answer · 2011-10-06 14:53:14Z

2

One way:

select emp_name,   emp_address,  sex,  max(marital_status) as marital_status
from Yourtable
group by emp_name,   emp_address,  sex

Since I don't know what you want, I used max for the marital status
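A quick sketch of what the GROUP BY version does when the marital status differs inside a group, using Python's built-in sqlite3 module. The second 'uuuu' row with status 'm' is invented here to make the effect of max visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Yourtable (emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT);
    INSERT INTO Yourtable VALUES
        ('uuuu', 'eee', 'm', 's'),
        ('uuuu', 'eee', 'm', 'm'),
        ('iiii', 'iii', 'f', 's');
""")

# One output row per (emp_name, emp_address, sex); max() picks the
# alphabetically largest marital_status within each group.
grouped = conn.execute("""
    SELECT emp_name, emp_address, sex, max(marital_status) AS marital_status
    FROM Yourtable
    GROUP BY emp_name, emp_address, sex
    ORDER BY emp_name
""").fetchall()
print(grouped)
```

The aggregate is only there because every selected column outside the GROUP BY has to be reduced to a single value; max is an arbitrary but deterministic choice.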

See also Including an Aggregated Column's Related Values for more examples

answered Oct 6, 2011 at 14:53

SQLMenace


2 Comments

user7 Over a year ago

Why have you used the max function?

mellamokb Over a year ago

See @Ralph's comment on your question. What is your logic for determining which duplicate marital_status to keep?

Community · Accepted Answer · 2017-05-23 10:34:12Z

0

If you are okay with trading space for performance and simplicity, then the duplicates in the emp_name | emp_address | sex combination can be eliminated by introducing a calculated/derived column using the T-SQL CHECKSUM() function and the DISTINCT keyword while querying.

Here's an example of CHECKSUM:

SELECT CHECKSUM(*) FROM HumanResources.Employee WHERE EmployeeID = 2

Search for how to create a computed column that contains the checksum of the three columns. Then you can select distinct rows as described in this question.

edited May 23, 2017 at 10:34

CommunityBot


answered Oct 6, 2011 at 15:06

Zasz


1 Comment

Zasz Over a year ago

I also invite some critiques on this answer - I need to know if this is good enough (even for a table with 800k rows)

Tank Liu · Accepted Answer · 2015-11-11 04:42:09Z

The best answer is here:
Use this SQL statement to identify the extra duplicated rows:

 select * from Employee a
    where %%physloc%% >
        (select min(%%physloc%%) from Employee b
            where a.emp_name=b.emp_name and a.emp_address=b.emp_address and a.sex=b.sex);

You will get the extra row:

uuuu   eee m   s

Use this SQL statement to delete the extra duplicated rows:

 delete from Employee a
    where %%physloc%% >
        (select min(%%physloc%%) from Employee b
            where a.emp_name=b.emp_name and a.emp_address=b.emp_address and a.sex=b.sex);

For all duplicated records, only the one with lowest physical location is kept. This method can be applied to remove all kinds of duplicated rows.

I am assuming that you use MS SQL Server. If you are using Oracle DB, then you can just replace '%%physloc%%' with 'rowid'.
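The same keep-the-lowest-row-locator idea can be tried locally with Python's built-in sqlite3 module. This is only a sketch for a different engine: SQLite's implicit rowid stands in for %%physloc%% / Oracle's rowid, and the delete is written with NOT IN over the per-group minimum rather than the correlated subquery above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (emp_name TEXT, emp_address TEXT, sex TEXT, marital_status TEXT);
    INSERT INTO Employee VALUES
        ('uuuu', 'eee', 'm', 's'),
        ('iiii', 'iii', 'f', 's'),
        ('uuuu', 'eee', 'm', 's');
""")

# Identify the extra rows: any row whose rowid is above the minimum of its group.
extra = conn.execute("""
    SELECT a.* FROM Employee a
    WHERE a.rowid > (SELECT min(b.rowid) FROM Employee b
                     WHERE a.emp_name = b.emp_name
                       AND a.emp_address = b.emp_address
                       AND a.sex = b.sex)
""").fetchall()

# Delete everything except the lowest rowid in each group.
conn.execute("""
    DELETE FROM Employee
    WHERE rowid NOT IN (SELECT min(rowid) FROM Employee
                        GROUP BY emp_name, emp_address, sex)
""")

kept = conn.execute("SELECT * FROM Employee ORDER BY rowid").fetchall()
print(extra, kept)
```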

Enjoy the code!

Shahadat Hossain Khan · Accepted Answer · 2015-12-08 03:44:22Z

0

I know this is an old post, but I recently tested a solution and want to share it in case anyone finds it helpful:

CREATE TABLE tmpTable LIKE yourTable;
insert into tmpTable (col1, col2 ... colN)
    SELECT distinct col1, col2 ... colN FROM yourTable WHERE 1;
drop table yourTable;
RENAME TABLE tmpTable TO yourTable;

Please note that the insert into statement may execute even if the table has no primary key.

Thanks.

answered Dec 8, 2015 at 3:44

Shahadat Hossain Khan


Chameera W. Ashan · Accepted Answer · 2021-01-21 12:10:42Z

0

If you are not satisfied with DISTINCT, try the query below:

SELECT MAX(ID) AS MaxRecordID, max(FirstName) AS fname
    FROM [SampleDB].[dbo].[Employee]
    GROUP BY [FirstName], 
             [LastName], 
             [Country]

Use the MAX keyword with GROUP BY. You can use MAX with columns of various types, such as integer and varchar.

answered Jan 21, 2021 at 12:10

Chameera W. Ashan
