Oracle SQL : How to delete Duplicate Rows

Question

I implemented the removal of duplicate rows as described in https://www.oracletutorial.com/advanced-oracle-sql/how-to-delete-duplicate-records-in-oracle/.

However, my situation needs further work. Assume that my table looks like this:

CREATE TABLE fruits
(
    fruit_id   NUMBER generated BY DEFAULT AS IDENTITY,
    fruit_name VARCHAR2(100),
    color      VARCHAR2(20),
    status     varchar2(10),
    PRIMARY KEY (fruit_id)
);

INSERT  INTO fruits(fruit_name, color, status) VALUES ('Apple', 'Red', 'INITIAL');
INSERT  INTO fruits(fruit_name, color, status)  VALUES ('Apple', 'Red', 'INITIAL');
INSERT  INTO fruits(fruit_name, color, status) VALUES ('Orange', 'Orange', 'COMPLETE');
INSERT  INTO fruits(fruit_name, color, status) VALUES ('Orange', 'Orange', 'INITIAL');
INSERT  INTO fruits(fruit_name, color, status) VALUES ('Orange', 'Orange', 'INITIAL');
INSERT  INTO fruits(fruit_name, color, status) VALUES ('Banana', 'Yellow', 'INITIAL');
INSERT  INTO fruits(fruit_name, color, status) VALUES ('Banana', 'Green', 'INITIAL');

DELETE
FROM fruits
WHERE fruit_id NOT IN
      (
          SELECT MAX(fruit_id)
          FROM fruits
          GROUP BY fruit_name,
                   color
      )
  AND STATUS = 'INITIAL';

After deleting the duplicates as above, one of the duplicate rows (fruit_id = 5) still remains.

select * from fruits;

2,Apple,Red,INITIAL
3,Orange,Orange,COMPLETE
5,Orange,Orange,INITIAL
6,Banana,Yellow,INITIAL
7,Banana,Green,INITIAL

I would like to delete all duplicate rows that are in 'INITIAL' state.

How should I go about it?

UPDATE

To be clear, the logic should be: all non-max records in 'INITIAL' status should be deleted. Also, if a record with 'COMPLETE' status is present, the duplicate 'INITIAL' record should be deleted as well. In my example, the record with fruit_id = 5 (status = 'INITIAL') should be deleted, since another record, fruit_id = 3 (status = 'COMPLETE'), has the same values 'Orange', 'Orange'.

In your scenario, can there be more than one 'COMPLETE' record with the same fruit_name and color?
– PKey, Commented Sep 2, 2020 at 10:25
I think the record with fruit_id = 2 should be deleted instead of the one with fruit_id = 1 as being non-max for its own group of fruit_name and color.
– Barbaros Özhan, Commented Sep 2, 2020 at 10:43

GMB · Accepted Answer · 2020-09-02 10:32:47Z

1

I would use a correlated subquery. I think the logic you want is:

delete from fruits f
where status = 'INITIAL' and exists(
    select 1 
    from fruits f1 
    where 
        f1.fruit_name = f.fruit_name 
        and f1.color = f.color
        and (
            (f1.status = 'INITIAL' and f1.fruit_id > f.fruit_id)
            or (f1.status = 'COMPLETE' and f1.fruit_id <> f.fruit_id)
        )
)

This deletes rows whose status is initial and for which another row exists with the same name and color and either status initial and a greater id, or status complete.
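Before running the delete, it may be worth previewing which rows it targets. The following is a minimal sketch, assuming the fruits table and the seven sample rows from the question: it moves the same predicate into a SELECT, so nothing is modified.

```sql
-- Preview only: lists the rows the DELETE above would remove.
SELECT f.fruit_id, f.fruit_name, f.color, f.status
FROM   fruits f
WHERE  f.status = 'INITIAL'
AND    EXISTS (
         SELECT 1
         FROM   fruits f1
         WHERE  f1.fruit_name = f.fruit_name
         AND    f1.color      = f.color
         AND    (   (f1.status = 'INITIAL'  AND f1.fruit_id >  f.fruit_id)  -- a newer INITIAL twin
                 OR (f1.status = 'COMPLETE' AND f1.fruit_id <> f.fruit_id)  -- a COMPLETE twin
                )
       )
ORDER  BY f.fruit_id;
```

Run against the original seven rows (fruit_id 1 to 7), this should list fruit_id 1, 4 and 5.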

Demo on DB Fiddle:

FRUIT_ID | FRUIT_NAME | COLOR  | STATUS  
-------: | :--------- | :----- | :-------
       2 | Apple      | Red    | INITIAL 
       3 | Orange     | Orange | COMPLETE
       6 | Banana     | Yellow | INITIAL 
       7 | Banana     | Green  | INITIAL
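One possible way to confirm the outcome (an illustrative check, not part of the original answer) is to look for any (fruit_name, color) group that still contains an 'INITIAL' row alongside at least one other row.

```sql
-- Sanity check: groups that still hold an INITIAL row together with any other row.
-- After a correct delete this should return no rows.
SELECT fruit_name,
       color,
       COUNT(*)                                              AS total_rows,
       SUM(CASE WHEN status = 'COMPLETE' THEN 1 ELSE 0 END)  AS complete_rows
FROM   fruits
GROUP  BY fruit_name, color
HAVING SUM(CASE WHEN status = 'INITIAL' THEN 1 ELSE 0 END) > 0
   AND COUNT(*) > 1;
```

Groups that contain only 'COMPLETE' rows are deliberately ignored, since the question does not ask for those to be removed.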

answered Sep 2, 2020 at 10:32

GMB


2 Comments

anjanb Over a year ago

Hi @GMB, thanks. Obviously, this is not the exact table I have in my DB. My table has about 20 columns and about 50K records. Except for fruit_id (the PK), none of the columns has an index. What should I do to keep your DELETE from taking an enormous amount of time? Also, our DB is Oracle 11, not Oracle 18c.

GMB Over a year ago

@anjanb: an index on (color, fruit_name, status) might be useful. 50k records is not that much, so I would not really expect performance issues.
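As a sketch of that suggestion, the index could be created and the plan inspected as follows. The index name fruits_dedup_ix is hypothetical, and whether the optimizer actually uses the index depends on the real data and statistics.

```sql
-- Hypothetical index name; column order taken from the comment above.
CREATE INDEX fruits_dedup_ix ON fruits (color, fruit_name, status);

-- Ask Oracle for the execution plan of the delete without running it.
EXPLAIN PLAN FOR
DELETE FROM fruits f
WHERE  f.status = 'INITIAL'
AND    EXISTS (
         SELECT 1
         FROM   fruits f1
         WHERE  f1.fruit_name = f.fruit_name
         AND    f1.color      = f.color
         AND    (   (f1.status = 'INITIAL'  AND f1.fruit_id >  f.fruit_id)
                 OR (f1.status = 'COMPLETE' AND f1.fruit_id <> f.fruit_id)
                )
       );

-- Show the plan that was just generated.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```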

Gordon Linoff · Accepted Answer · 2020-09-02 10:46:10Z

Let's start with the rows you want to keep:

select f.*
from (select f.*,
             sum(case when status = 'COMPLETE' then 1 else 0 end) over (partition by fruit_name, color) as num_complete,
             max(fruit_id) over (partition by fruit_name, color) as max_id
      from fruits f
     ) f
where status = 'COMPLETE' or
      -- no COMPLETE row in the group: keep only the row with the highest fruit_id
      (num_complete = 0 and fruit_id = max_id);

This is a good basis for doing the delete. One method:

delete fruits f
    where not exists (select 1
                      from (select f2.*,
                                   sum(case when status = 'COMPLETE' then 1 else 0 end) over (partition by fruit_name, color) as num_complete,
                                   max(fruit_id) over (partition by fruit_name, color) as max_id
                            from fruits f2
                           ) f2
                      where ( f2.status = 'COMPLETE' or
                              (f2.num_complete = 0 and f2.fruit_id = f2.max_id)
                            ) and
                            f.fruit_id = f2.fruit_id
                     );

If you are deleting a lot of rows in a large table, you might find it more efficient to recreate the table:

create table temp_fruits as
    select fruit_id, fruit_name, color, status
    from (select f.*,
                 sum(case when status = 'COMPLETE' then 1 else 0 end) over (partition by fruit_name, color) as num_complete,
                 max(fruit_id) over (partition by fruit_name, color) as max_id
          from fruits f
         ) f
    where status = 'COMPLETE' or
          (num_complete = 0 and fruit_id = max_id);

truncate table fruits;

insert into fruits (fruit_id, fruit_name, color, status)
     select * from temp_fruits;

Note that this changes the row id as well.

I originally misunderstood, thinking you wanted to delete the COMPLETE record as well:

delete fruits f
    where exists (select 1 
                  from fruits f2
                  where f2.fruit_name = f.fruit_name and
                        f2.color = f.color and
                        f2.status = 'COMPLETE'
                 ) or
          f.fruit_id < (select max(f2.fruit_id)
                 from fruits f2
                 where f2.fruit_name = f.fruit_name and
                       f2.color = f.color
                );

Barbaros Özhan · Accepted Answer · 2020-09-02 11:42:46Z

1

You can use the ROW_NUMBER() analytic function:

DELETE fruits
 WHERE fruit_id IN 
     ( WITH del AS 
      (
       SELECT f.*,
              ROW_NUMBER() OVER
              (PARTITION BY fruit_name, color 
                   ORDER BY CASE WHEN f.status = 'COMPLETE' THEN 0 ELSE 1 END,
                            fruit_id DESC) 
                      AS rn                            
         FROM fruits f
       )  
       SELECT fruit_id
         FROM del
        WHERE status = 'INITIAL'
          AND rn > 1
      )

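The condition rn > 1 selects every row except the first one in its (fruit_name, color) group. A 'COMPLETE' row, if present, sorts first, so every 'INITIAL' row in that group is deleted; otherwise only the first-ranked 'INITIAL' row is kept.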
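To see how the ranking behaves before deleting anything, the analytic expression can be run on its own. This is a sketch under one assumption about the intended rule: 'COMPLETE' rows rank first, and among 'INITIAL' rows the highest fruit_id ranks first, so that the maximum id is the one kept.

```sql
-- Preview only: shows the rank each row receives inside its (fruit_name, color) group.
-- Rows with status = 'INITIAL' and rn > 1 are the candidates for deletion.
SELECT f.fruit_id,
       f.fruit_name,
       f.color,
       f.status,
       ROW_NUMBER() OVER (
         PARTITION BY f.fruit_name, f.color
         ORDER BY CASE WHEN f.status = 'COMPLETE' THEN 0 ELSE 1 END,  -- COMPLETE first
                  f.fruit_id DESC                                      -- then newest id
       ) AS rn
FROM   fruits f
ORDER  BY f.fruit_name, f.color, rn;
```

With the question's seven sample rows, the Orange group should rank fruit_id 3 as rn = 1, followed by 5 and 4, and the Apple group should rank fruit_id 2 ahead of 1.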

Demo

edited Sep 2, 2020 at 11:42

answered Sep 2, 2020 at 10:34

Barbaros Özhan


2 Comments

anjanb Over a year ago

Hi @Barbaros Özhan: Thanks. I tried this but changed the order in which the "orange" records were inserted, and now the query doesn't return what is expected. The query retains the record with fruit_id = 3, which is a duplicate (dbfiddle.uk/…). Could you fix that? Thanks.

Barbaros Özhan Over a year ago

You're welcome, @anjanb. I've just edited. It seems you need conditional logic for the records with status = 'COMPLETE'.

Koen Lostrie · Accepted Answer · 2020-09-02 10:15:34Z

0

You added an additional column, so the logic needs to be modified. The NOT IN clause looks at all fruit in any status; you should limit it to fruit in 'INITIAL' status:

DELETE
FROM fruits
WHERE fruit_id NOT IN
      (
          SELECT MAX(fruit_id)
          FROM fruits WHERE status = 'INITIAL'
          GROUP BY fruit_name,
                   color
      )
AND status = 'INITIAL';

answered Sep 2, 2020 at 10:15

Koen Lostrie


2 Comments

anjanb Over a year ago

Hi @Koen Lostrie: That doesn't work, since fruit_id = 5 is the max. I'd like all non-max records to be deleted. Also, if a record with 'COMPLETE' status is present, I'd like the 'INITIAL' record with the max value to be deleted as well. In my example, I'd like the record with fruit_id = 5 to be deleted, since another record, fruit_id = 3, has the same values "orange", "orange" but with 'COMPLETE' status.

anjanb Over a year ago

Still doesn't work. I didn't see any change. I didn't introduce any new column; I just clarified the expectation.

Radagast81 · Accepted Answer · 2020-09-02 10:29:49Z

0

I think you can just do the following:

DELETE
FROM fruits f
WHERE STATUS = 'INITIAL'
  AND EXISTS (SELECT 1 FROM fruits
               WHERE fruit_name = f.fruit_name
                 AND color = f.color
                 AND (STATUS != f.STATUS OR fruit_id > f.fruit_id))

Instead of grouping the values, you can check whether another entry exists that fits better, meaning that for the other entry either:

The STATUS is no longer 'INITIAL'
The FRUIT_ID is higher

answered Sep 2, 2020 at 10:29

Radagast81


Ranagal · Accepted Answer · 2020-09-02 11:51:58Z

0

delete from fruits where fruit_id not in 
(
    select  fruit_id_
    from    fruits
    match_recognize
    (
        partition by fruit_name, color
        order by fruit_id
        measures fruit_id as fruit_id_
        all rows per match 
        pattern ( ( a  {- b* -} ) | ( c {- d* -} ) )
        define  a as status = 'INITIAL', 
                b as status = a.status ,
                c as status = 'COMPLETE',
                d as status = 'INITIAL'
    )
)

answered Sep 2, 2020 at 11:51

Ranagal


SoftwareEngineer · Accepted Answer · 2022-01-04 06:46:22Z

0

If you want to delete duplicated rows, you can use the following statement:

DELETE FROM your_table WHERE rowid not in (SELECT MIN(rowid) FROM your_table GROUP BY column1, column2, column3);

answered Jan 4, 2022 at 6:46

SoftwareEngineer
