8

We have a table that has had the same data inserted into it twice by accident meaning most (but not all) rows appears twice in the table. Simply put, I'd like an SQL statement to delete one version of a row while keeping the other; I don't mind which version is deleted as they're identical.

Table structure is something like:

FID, unique_ID, COL3, COL4....

Unique_ID is the primary key, meaning each one appears only once. FID is a key that is unique to each feature, so if it appears more than once then the duplicates should be deleted.

To select features that have duplicates would be:

select count(*) from TABLE GROUP by FID

Unfortunately I can't figure out how to go from that to a SQL delete statement that will delete extraneous rows leaving only one of each.

This sort of question has been asked before, and I've tried the create table with distinct, but how do I get all columns without naming them? This only gets the single column FID and itemising all the columns to keep gives an: ORA-00936: missing expression

CREATE TABLE secondtable NOLOGGING as select distinct FID from TABLE

2
  • After you get the solution, I seriously recommend you to have your database normalized, as duplicate records (so-called redundancy) is very likely a result of not-normalized database. Commented May 10, 2012 at 15:36
  • @Kush - we can't normalise the database because lots of different applications use it. This is a result of someone loading the data twice back in 2007 but we don't use this much so didn't notice before. Thanks for the suggestion though Commented May 10, 2012 at 15:42

6 Answers 6

10

If you don't care which row is retained

DELETE FROM your_table_name a
 WHERE EXISTS( SELECT 1
                 FROM your_table_name b
                WHERE a.fid = b.fid
                  AND a.unique_id < b.unique_id )

Once that's done, you'll want to add a constraint to the table that ensures that FID is unique.

Sign up to request clarification or add additional context in comments.

2 Comments

This won't work if all values in both records are the same.
@Zesty - Very true. But the question specifically mentions that there is a unique_id column that is unique.
4

Try this

DELETE FROM table_name A WHERE ROWID > (
SELECT min(rowid) FROM table_name B
WHERE A.FID = B.FID)

1 Comment

In theory this is the same as Justin Cave's, however yours didn't work as it gave me a data-type error (despite my rowid being an integer).
1

A suggestion

DELETE FROM x WHERE ROWID IN
(WITH y AS (SELECT xCOL, MIN(ROWID) FROM x GROUP BY xCOL HAVING COUNT(xCOL) > 1)
SELCT a.ROWID FROM x, y WHERE x.XCOL=y.XCOL and x.ROWIDy.ROWID)

Comments

1

You can try this.

delete from tablename a
where a.logid, a.pointid, a.routeid) in (select logid, pointid, routeid from tablename 
group by logid, pointid, routeid having count(*) > 1)
and rowid not in (select min(rowid) from tablename
group by logid, pointid, routeid having count(*) > 1)

1 Comment

It is considered good etiquette to include a brief explanation as to what your code does.
1

I know this is an old question, I came up with a different solution entirely:

partition them by what makes them duplicate and then use that as a row number to throw away the extras:

delete from MY_TABLE where unique_id in (
    select unique_id
    from (
        select mt.*,
           row_number() over (
               partition by mt.RAW_VALUE,mt.END_DATE_TRUNC_UTC
               order by mt.END_DATE_TRUNC_UTC
           ) rn
        FROM
        mytable mt
    ) where rn > 1
);

So, in my case above, the fields that defined data as duplicate were the RAW_VALUE and the END_DATE_TRUNC_UTC This gave duplicate rows unique row numbers within their group. Then, I could just throw away everything except rn = 1

Comments

0

Try with this.

DELETE FROM firsttable WHERE unique_ID NOT IN 
(SELECT MAX(unique_ID) FROM firsttable GROUP BY FID)

EDIT: One explanation:

SELECT MAX(unique_ID) FROM firsttable GROUP BY FID;

This sql statement will pick each maximum unique_ID row from each duplicate rows group. And delete statement will keep these maximum unique_ID rows and delete other rows of each duplicate group.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.