Oracle SQL - finding duplicate values in a column based on another field

Question

I'm trying to select duplicate data from a column based on data from another column. For example, I have a table of events that are either 'IN_PROGRESS' or 'COMPLETE'. They each have an ID. Some of the events have the same ID but different statuses. I am trying to select the data where status = in_progress or status = complete but only if their ids are the same.

This is what I am trying so far:

SELECT id, count (*) 
FROM events WHERE status = 'IN_PROGRESS' OR status = 'STARTED'
GROUP BY id HAVING count (*) > 1;

But obviously it only returns the ids rather than the entire row so I can't see all the data. Ultimately I intend to select all of the data in the table while filtering out the duplicates based on the above.

I've started to look into a join or comparing with a duplicate table, but I'm not sure what the best way to achieve this is. Can someone please help?

Thanks

Apart from the status, do all the other fields have the same values when the id is the same?
– Cynical, Commented Jun 21, 2016 at 10:52
What does that have to do with MySQL? I'm replacing the MySQL tag with an SQL tag.
– Thorsten Kettner, Commented Jun 21, 2016 at 10:54
I wonder how you ended up with duplicates at all. You have an events table, so each record should represent one event. Your events are identified by ID, so how can there be duplicates? Why isn't ID the table's primary key?
– Thorsten Kettner, Commented Jun 21, 2016 at 10:58
As there can be duplicates in your table, can there be duplicate IDs with the same status? Then you'd have to use count(distinct status) instead of count(*) so as to detect only IDs that have both statuses.
– Thorsten Kettner, Commented Jun 21, 2016 at 11:00

Giorgos Betsos · Accepted Answer · 2016-06-21 10:52:02Z

2

You can do it with a JOIN to a derived table produced by the query that detects the duplicate records:

SELECT e1.*, e2.cnt
FROM events e1
JOIN (
  SELECT id, count (*)  cnt
  FROM events 
  WHERE status IN ('IN_PROGRESS', 'STARTED')
  GROUP BY id 
  HAVING count (*) > 1
) e2 ON e1.id = e2.id

Alternatively you can use a window function:

SELECT *
FROM (
  SELECT ev.*,
         COUNT(CASE WHEN status IN ('IN_PROGRESS', 'STARTED') THEN 1 END) 
         OVER (PARTITION BY id) AS cnt
  FROM events ev) e
WHERE e.cnt > 1

If STARTED or IN_PROGRESS appears more than once for an id, this won't work. Filter on the statuses and use COUNT(DISTINCT status) instead.
– sagi, Commented over a year ago
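
Following the comments from Thorsten Kettner and sagi, a minimal sketch of the COUNT(DISTINCT ...) variant might look like the two queries below. They assume the same events table with id and status columns as in the question; the alias status_cnt is made up for illustration. The second query relies on Oracle accepting DISTINCT in an analytic COUNT when only PARTITION BY is given, which is worth checking on your version.

```sql
-- Join variant: keep only ids that carry BOTH statuses,
-- even if one of the statuses is repeated for the same id.
SELECT e1.*, e2.status_cnt
FROM events e1
JOIN (
  SELECT id, COUNT(DISTINCT status) AS status_cnt
  FROM events
  WHERE status IN ('IN_PROGRESS', 'STARTED')
  GROUP BY id
  HAVING COUNT(DISTINCT status) > 1
) e2 ON e1.id = e2.id;

-- Window-function variant: the CASE yields NULL for other statuses,
-- and COUNT(DISTINCT ...) ignores NULLs.
SELECT *
FROM (
  SELECT ev.*,
         COUNT(DISTINCT CASE WHEN status IN ('IN_PROGRESS', 'STARTED')
                             THEN status END)
           OVER (PARTITION BY id) AS status_cnt
  FROM events ev
) e
WHERE e.status_cnt > 1;
```

Both queries return every row for a matching id, including rows with other statuses. If only the IN_PROGRESS and STARTED rows are wanted, the same status filter can be added to the outer query.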

krokodilko · Accepted Answer · 2016-06-21 10:55:35Z

1

Try

SELECT * FROM events e1
WHERE e1.status IN ( 'IN_PROGRESS' , 'STARTED' )
  AND EXISTS (
  SELECT 1 FROM events e2
  WHERE e2.id = e1.id 
    AND e2.status IN ( 'IN_PROGRESS' , 'STARTED' )
    AND e1.status <> e2.status
)
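
To see how the EXISTS query behaves when a status is repeated, here is a small self-contained sketch. The rows and the note column are invented sample data (the real table is not shown in the question), built inline with a WITH clause over dual so nothing has to be created.

```sql
-- Hypothetical sample data standing in for the real events table.
WITH events AS (
  SELECT 1 AS id, 'IN_PROGRESS' AS status, 'a' AS note FROM dual UNION ALL
  SELECT 1,       'STARTED',               'b'         FROM dual UNION ALL
  SELECT 2,       'IN_PROGRESS',           'c'         FROM dual UNION ALL
  SELECT 2,       'IN_PROGRESS',           'd'         FROM dual UNION ALL
  SELECT 3,       'STARTED',               'e'         FROM dual
)
SELECT *
FROM events e1
WHERE e1.status IN ('IN_PROGRESS', 'STARTED')
  AND EXISTS (
    -- another row with the same id but the other status
    SELECT 1
    FROM events e2
    WHERE e2.id = e1.id
      AND e2.status IN ('IN_PROGRESS', 'STARTED')
      AND e1.status <> e2.status
  );
```

With this sample data only the two rows for id 1 qualify: id 2 has the same status twice, so no row with a different status exists for it, and id 3 has a single row. A count(*) > 1 check, by contrast, would also pick up id 2.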
