How to delete duplicate rows from an Oracle Database?

Question

We have a table that has had the same data inserted into it twice by accident, meaning most (but not all) rows appear twice in the table. Simply put, I'd like an SQL statement to delete one version of a row while keeping the other; I don't mind which version is deleted as they're identical.

Table structure is something like:

FID, unique_ID, COL3, COL4....

Unique_ID is the primary key, meaning each one appears only once. FID is a key that is unique to each feature, so if it appears more than once then the duplicates should be deleted.

To select features that have duplicates would be:

select FID, count(*) from TABLE group by FID having count(*) > 1

Unfortunately I can't figure out how to go from that to a SQL delete statement that will delete extraneous rows leaving only one of each.

This sort of question has been asked before, and I've tried the create table with distinct approach, but how do I get all columns without naming them? The statement below only gets the single column FID, and itemising all the columns to keep gives an ORA-00936: missing expression error.

CREATE TABLE secondtable NOLOGGING as select distinct FID from TABLE

After you get the solution, I seriously recommend that you have your database normalized, as duplicate records (so-called redundancy) are very likely a result of a non-normalized database.
– Kushal, Commented May 10, 2012 at 15:36
@Kush - we can't normalise the database because lots of different applications use it. This is a result of someone loading the data twice back in 2007, but we don't use this much so didn't notice before. Thanks for the suggestion though.
– GIS-Jonathan, Commented May 10, 2012 at 15:42

Justin Cave · Accepted Answer · 2012-05-10 15:38:24Z

10

If you don't care which row is retained

DELETE FROM your_table_name a
 WHERE EXISTS( SELECT 1
                 FROM your_table_name b
                WHERE a.fid = b.fid
                  AND a.unique_id < b.unique_id )

Once that's done, you'll want to add a constraint to the table that ensures that FID is unique.
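
As a minimal sketch of that constraint, assuming the table is called your_table_name as in the statement above (the constraint name your_table_name_fid_uq is a made-up placeholder, pick whatever fits your naming scheme):

```sql
-- Run this only after the duplicates have been deleted; while duplicate
-- FID values still exist Oracle refuses to validate the constraint
-- (ORA-02299: cannot validate - duplicate keys found).
ALTER TABLE your_table_name
  ADD CONSTRAINT your_table_name_fid_uq UNIQUE (fid);
```

With the constraint in place, a second accidental load of the same data should fail on insert instead of silently doubling the rows.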


2 Comments

Zesty Over a year ago

This won't work if all values in both records are the same.

Justin Cave Over a year ago

@Zesty - Very true. But the question specifically mentions that there is a unique_id column that is unique.

bitoshi.n · Accepted Answer · 2012-05-10 15:38:57Z

4

Try this

DELETE FROM table_name A WHERE ROWID > (
SELECT min(rowid) FROM table_name B
WHERE A.FID = B.FID)


1 Comment

GIS-Jonathan Over a year ago

In theory this is the same as Justin Cave's, however yours didn't work as it gave me a data-type error (despite my rowid being an integer).

hogni89 · Accepted Answer · 2012-05-10 15:38:21Z

1

A suggestion

DELETE FROM x WHERE ROWID IN
(WITH y AS (SELECT xCOL, MIN(ROWID) AS min_rid FROM x GROUP BY xCOL HAVING COUNT(xCOL) > 1)
SELECT x.ROWID FROM x, y WHERE x.xCOL = y.xCOL AND x.ROWID <> y.min_rid)


GongchuangSu · Accepted Answer · 2017-11-21 07:21:00Z

1

You can try this.

delete from tablename a
where (a.logid, a.pointid, a.routeid) in (select logid, pointid, routeid from tablename
group by logid, pointid, routeid having count(*) > 1)
and rowid not in (select min(rowid) from tablename
group by logid, pointid, routeid having count(*) > 1)


1 Comment

Neil Over a year ago

It is considered good etiquette to include a brief explanation as to what your code does.

Christian Bongiorno · Accepted Answer · 2024-03-05 23:54:38Z

1

I know this is an old question, but I came up with a different solution entirely:

Partition the rows by what makes them duplicates, then use the row number within each partition to throw away the extras:

delete from MY_TABLE where unique_id in (
    select unique_id
    from (
        select mt.*,
           row_number() over (
               partition by mt.RAW_VALUE,mt.END_DATE_TRUNC_UTC
               order by mt.END_DATE_TRUNC_UTC
           ) rn
        FROM MY_TABLE mt
    ) where rn > 1
);

So, in my case above, the fields that defined data as duplicate were RAW_VALUE and END_DATE_TRUNC_UTC. Partitioning on them gave duplicate rows unique row numbers within their group. Then I could just throw away everything except rn = 1.
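
To check the result afterwards, something along these lines should work. It is only a sketch that reuses the table and column names from my case above; the alias copies is just an illustrative name:

```sql
-- Lists every (RAW_VALUE, END_DATE_TRUNC_UTC) group that still has more
-- than one row. After the DELETE above this should return no rows.
SELECT RAW_VALUE, END_DATE_TRUNC_UTC, COUNT(*) AS copies
  FROM MY_TABLE
 GROUP BY RAW_VALUE, END_DATE_TRUNC_UTC
HAVING COUNT(*) > 1;
```

If it still returns rows, the DELETE can be rolled back as long as the transaction has not been committed.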


Shahriar Hasan Sayeed · Accepted Answer · 2012-05-10 19:37:38Z

0

Try with this.

DELETE FROM firsttable WHERE unique_ID NOT IN 
(SELECT MAX(unique_ID) FROM firsttable GROUP BY FID)

EDIT: One explanation:

SELECT MAX(unique_ID) FROM firsttable GROUP BY FID;

This SQL statement picks the row with the maximum unique_ID from each group of rows sharing the same FID. The DELETE statement then keeps those maximum-unique_ID rows and deletes the other rows in each duplicate group.
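
If you want to see what would be removed before deleting anything, the same condition can be used in a plain SELECT first. This is just a sketch using the table and column names from the statement above:

```sql
-- Preview: these are exactly the rows the DELETE above would remove,
-- i.e. every row whose unique_ID is not the largest one for its FID.
SELECT *
  FROM firsttable
 WHERE unique_ID NOT IN
       (SELECT MAX(unique_ID) FROM firsttable GROUP BY FID);
```

Since unique_ID is the primary key it cannot be NULL, so the usual NOT IN surprise with NULL values in the subquery should not apply here.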

edited May 10, 2012 at 19:37

answered May 10, 2012 at 19:14

