
I implemented the removal of duplicate rows as described at https://www.oracletutorial.com/advanced-oracle-sql/how-to-delete-duplicate-records-in-oracle/.

However, my situation needs further work. Let's assume that my table looks like this:

CREATE TABLE fruits
(
    fruit_id   NUMBER generated BY DEFAULT AS IDENTITY,
    fruit_name VARCHAR2(100),
    color      VARCHAR2(20),
    status     varchar2(10),
    PRIMARY KEY (fruit_id)
);

INSERT INTO fruits(fruit_name, color, status) VALUES ('Apple', 'Red', 'INITIAL');
INSERT INTO fruits(fruit_name, color, status) VALUES ('Apple', 'Red', 'INITIAL');
INSERT INTO fruits(fruit_name, color, status) VALUES ('Orange', 'Orange', 'COMPLETE');
INSERT INTO fruits(fruit_name, color, status) VALUES ('Orange', 'Orange', 'INITIAL');
INSERT INTO fruits(fruit_name, color, status) VALUES ('Orange', 'Orange', 'INITIAL');
INSERT INTO fruits(fruit_name, color, status) VALUES ('Banana', 'Yellow', 'INITIAL');
INSERT INTO fruits(fruit_name, color, status) VALUES ('Banana', 'Green', 'INITIAL');

DELETE
FROM fruits
WHERE fruit_id NOT IN
      (
          SELECT MAX(fruit_id)
          FROM fruits
          GROUP BY fruit_name,
                   color
      )
  AND STATUS = 'INITIAL';

After deleting the duplicates as above, I find that one of the duplicate rows (fruit_id = 5) still remains.

select * from fruits;

2,Apple,Red,INITIAL
3,Orange,Orange,COMPLETE
5,Orange,Orange,INITIAL
6,Banana,Yellow,INITIAL
7,Banana,Green,INITIAL

I would like to delete all duplicate rows that are in 'INITIAL' state.

How should I go about it?

UPDATE

Just to be sure, the logic should be: all non-MAX records in 'INITIAL' status should be deleted. Also, if a record with 'COMPLETE' status is present, I'd like the duplicate 'INITIAL' record to be deleted as well. In my example, I'd like the record with fruit_id = 5 (STATUS = 'INITIAL') to be deleted, since there is another record with fruit_id = 3 (STATUS = 'COMPLETE') that has the same values ('Orange', 'Orange').

2
  • In your scenario, can there be more than one COMPLETE row with the same fruit_name and color? Commented Sep 2, 2020 at 10:25
  • I think the record with fruit_id = 2 should be deleted instead of the one with fruit_id = 1, as being non-max for its own group of fruit_name and color. Commented Sep 2, 2020 at 10:43

7 Answers

1

I would use a correlated subquery. I think the logic you want is:

delete from fruits f
where status = 'INITIAL' and exists(
    select 1 
    from fruits f1 
    where 
        f1.fruit_name = f.fruit_name 
        and f1.color = f.color
        and (
            (f1.status = 'INITIAL' and f1.fruit_id > f.fruit_id)
            or (f1.status = 'COMPLETE' and f1.fruit_id <> f.fruit_id)
        )
)

This deletes each row whose status is 'INITIAL' when another row exists with the same name and color that either also has status 'INITIAL' and a greater fruit_id, or has status 'COMPLETE'.

Demo on DB Fiddle:

FRUIT_ID | FRUIT_NAME | COLOR  | STATUS  
-------: | :--------- | :----- | :-------
       2 | Apple      | Red    | INITIAL 
       3 | Orange     | Orange | COMPLETE
       6 | Banana     | Yellow | INITIAL 
       7 | Banana     | Green  | INITIAL 
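
Before running a DELETE like this on real data, it can help to preview which rows the predicate matches. The following is only a sketch that reuses the same EXISTS condition in a SELECT; on the sample data from the question it should list fruit_id 1, 4 and 5.

```sql
-- Preview of the rows the DELETE above would remove (no data is changed).
select f.fruit_id, f.fruit_name, f.color, f.status
from fruits f
where f.status = 'INITIAL'
  and exists (
    select 1
    from fruits f1
    where f1.fruit_name = f.fruit_name
      and f1.color = f.color
      and (
          -- a newer INITIAL duplicate exists
          (f1.status = 'INITIAL' and f1.fruit_id > f.fruit_id)
          -- or a COMPLETE row shadows this INITIAL row
          or f1.status = 'COMPLETE'
      )
  )
order by f.fruit_id;
```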

2 Comments

Hi @GMB, thanks. Obviously, this is not the exact table I have in my DB. My table has about 20 columns and about 50K records. Except for fruit_id (PK), none of the columns have an index. What should I do so that your DELETE does not take an enormous amount of time? Also, our DB is Oracle 11, not Oracle 18c.
@anjanb: an index on (color, fruit_name, status) might be useful. 50k records is not that much, so I would not really expect performance issues.
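
The index suggested in the comment above could be created along these lines. This is a minimal sketch: the index name fruits_dedup_ix is hypothetical, and whether the optimizer actually uses the index depends on the real table and its statistics.

```sql
-- Hypothetical index name; the column order follows the suggestion in the comment.
create index fruits_dedup_ix on fruits (color, fruit_name, status);

-- If the index is only needed for a one-off cleanup, it can be dropped afterwards:
-- drop index fruits_dedup_ix;
```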
1

Let's start with the rows you want to keep:

select f.*
from (select f.*,
             sum(case when status = 'COMPLETE' then 1 else 0 end) over (partition by fruit_name, color) as num_complete,
             max(fruit_id) over (partition by fruit_name, color) as max_id
      from fruits f
     ) f
where status = 'COMPLETE' or
      (num_complete = 0 and fruit_id = max_id);

This is a good basis for doing the delete. One method:

delete fruits f
    where not exists (select 1
                      from (select f2.*,
                                   sum(case when status = 'COMPLETE' then 1 else 0 end) over (partition by fruit_name, color) as num_complete,
                                   max(fruit_id) over (partition by fruit_name, color) as max_id
                            from fruits f2
                           ) f2
                      where ( f2.status = 'COMPLETE' or
                              (f2.num_complete = 0 and f2.fruit_id = f2.max_id)
                            ) and
                            f.fruit_id = f2.fruit_id
                     );

If you are deleting a lot of rows in a large table, you might find it more efficient to recreate the table:

create table temp_fruits as
    select fruit_id, fruit_name, color, status
    from (select f.*,
                 sum(case when status = 'COMPLETE' then 1 else 0 end) over (partition by fruit_name, color) as num_complete,
                 max(fruit_id) over (partition by fruit_name, color) as max_id
          from fruits f
         ) f
    where status = 'COMPLETE' or
          (num_complete = 0 and fruit_id = max_id);

truncate table fruits;

insert into fruits (fruit_id, fruit_name, color, status)
     select * from temp_fruits;

Note that this changes each row's ROWID as well. The fruit_id values are preserved because they are inserted explicitly.
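
Whichever variant is used, a quick check like the following could confirm the result. It is only a sketch, and the column aliases total_rows and initial_rows are made up for illustration. It lists any (fruit_name, color) group that still has more than one row while containing at least one 'INITIAL' row, which under the rule stated in the question should not happen after the cleanup.

```sql
-- Groups that still violate the rule; an empty result means the cleanup worked.
select fruit_name,
       color,
       count(*) as total_rows,
       sum(case when status = 'INITIAL' then 1 else 0 end) as initial_rows
from fruits
group by fruit_name, color
having count(*) > 1
   and sum(case when status = 'INITIAL' then 1 else 0 end) > 0;
```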

I originally misunderstood, thinking you wanted to delete the COMPLETE record as well:

delete fruits f
    where exists (select 1 
                  from fruits f2
                  where f2.fruit_name = f.fruit_name and
                        f2.color = f.color and
                        f2.status = 'COMPLETE'
                 ) or
          f.fruit_id < (select max(f2.fruit_id)
                        from fruits f2
                        where f2.fruit_name = f.fruit_name and
                              f2.color = f.color
                       );

1

You can use ROW_NUMBER() analytic function

DELETE fruits
 WHERE fruit_id IN 
     ( WITH del AS 
      (
       SELECT f.*,
              ROW_NUMBER() OVER
              (PARTITION BY fruit_name, color 
                   ORDER BY CASE WHEN f.status = 'COMPLETE' THEN 0 ELSE 1 END,
                            f.fruit_id DESC) 
                      AS rn                            
         FROM fruits f
       )  
       SELECT fruit_id
         FROM del
        WHERE status = 'INITIAL'
          AND rn > 1
      )

The ORDER BY ranks any COMPLETE row first and then the INITIAL rows by descending fruit_id, so rn > 1 picks out the INITIAL rows that are either non-max within their (fruit_name, color) group or shadowed by a COMPLETE row.

Demo

2 Comments

Hi @Barbaros Özhan : Thanks. I tried this but modified the order in which "orange" records were inserted and now the query doesn't return what is expected. The Query retains record with fruit_id =3 which is a duplicate -- dbfiddle.uk/…. Could you fix that ? thanks
You're welcome, @anjanb. I've just edited. It seems you need conditional logic for the records with status = 'COMPLETE'.
0

You added an additional column, so the logic needs to be modified. The NOT IN clause looks at all fruit in any status; you should limit that to only fruit in INITIAL status:

DELETE
FROM fruits
WHERE fruit_id NOT IN
      (
          SELECT MAX(fruit_id)
          FROM fruits WHERE status = 'INITIAL'
          GROUP BY fruit_name,
                   color
      )
AND status = 'INITIAL';

2 Comments

Hi @Koen Lostrie : That doesn't work since fruit_id = 5 is the max. I'd like all NON-MAX records to be deleted. Also, if a record with 'COMPLETE' status is present, then I'd like the 'INITIAL' record with MAX value to be deleted as well. In my example, I'd like record with fruit_id = 5 to be deleted since there's another record with fruit_id =3 which has the same value of "orange", "orange" but with 'COMPLETE' value.
still doesn't work. I didn't see any change. I didn't introduce any new column. Just clarified expectation.
0

I think you can just do the following:

DELETE
FROM fruits f
WHERE STATUS = 'INITIAL'
  AND EXISTS (SELECT 1 FROM fruits
               WHERE fruit_name = f.fruit_name
                 AND color = f.color
                 AND (STATUS != f.STATUS OR fruit_id > f.fruit_id))

Instead of GROUPING the values you can check if another entry exists that fits better:

  • The STATUS is no longer 'INITIAL'
  • The FRUIT_ID is higher

0
delete from fruits where fruit_id not in 
(
    select  fruit_id_
    from    fruits
    match_recognize
    (
        partition by fruit_name, color
        order by fruit_id
        measures fruit_id as fruit_id_
        all rows per match 
        pattern ( ( a  {- b* -} ) | ( c {- d* -} ) )
        define  a as status = 'INITIAL', 
                b as status = a.status ,
                c as status = 'COMPLETE',
                d as status = 'INITIAL'
    )
)

0

If you want to delete duplicated rows, you can use the following statement:

DELETE FROM your_table WHERE rowid not in (SELECT MIN(rowid) FROM your_table GROUP BY column1, column2, column3);
