
I'm trying to select rows that share a value in one column, based on the value in another column. For example, I have a table of events whose status is either 'IN_PROGRESS' or 'COMPLETE'. Each event has an ID, and some events share the same ID but have different statuses. I want to select the rows where the status is 'IN_PROGRESS' or 'COMPLETE', but only when their IDs are the same.

This is what I am trying so far:

SELECT id, COUNT(*)
FROM events WHERE status = 'IN_PROGRESS' OR status = 'STARTED'
GROUP BY id HAVING COUNT(*) > 1;

But obviously it only returns the ids rather than the entire row so I can't see all the data. Ultimately I intend to select all of the data in the table while filtering out the duplicates based on the above.

I've started to look into a join or a comparison against a duplicate of the table, but I'm not sure what the best way to achieve this is. Can someone please help?

Thanks

  • Apart from the status, do all the other fields have the same values when the id is the same? Commented Jun 21, 2016 at 10:52
  • What does that have to do with MySQL? I'm replacing the MySQL tag with an SQL tag. Commented Jun 21, 2016 at 10:54
  • I wonder how it even happened you got duplicates. You have an events table, so each record should represent one event. Your events are identified by ID, so how can there be duplicates? Why isn't ID the table's primary key? Commented Jun 21, 2016 at 10:58
  • As there can be duplicates in your table, can there be duplicate IDs with the same status? Then you'd have to count(distinct status) instead of count(*) so as to detect only IDs that have both statuses. Commented Jun 21, 2016 at 11:00

2 Answers


You can do it with a JOIN to a derived table produced by the query that detects the duplicate records:

SELECT e1.*, e2.cnt
FROM events e1
JOIN (
  SELECT id, COUNT(*) AS cnt
  FROM events
  WHERE status IN ('IN_PROGRESS', 'STARTED')
  GROUP BY id
  HAVING COUNT(*) > 1
) e2 ON e1.id = e2.id

Alternatively you can use a window function:

SELECT *
FROM (
  SELECT *,
         COUNT(CASE WHEN status IN ('IN_PROGRESS', 'STARTED') THEN 1 END)
         OVER (PARTITION BY id) AS cnt
  FROM events) e
WHERE e.cnt > 1

1 Comment

If STARTED or IN_PROGRESS appears more than once for the same id, this won't work. Filter the statuses and just COUNT(DISTINCT..)
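
A minimal sketch of what the comment above suggests, assuming the same events table with id and status columns as in the question. Counting distinct statuses instead of rows means an id that merely has the same status twice is no longer picked up; only ids that carry both statuses qualify. The aliases e1 and e2 are just illustrative.

```sql
-- Return every row whose id has BOTH statuses at least once.
-- COUNT(DISTINCT status) ignores repeats of the same status,
-- so two 'STARTED' rows alone do not qualify an id.
SELECT e1.*
FROM events e1
JOIN (
  SELECT id
  FROM events
  WHERE status IN ('IN_PROGRESS', 'STARTED')
  GROUP BY id
  HAVING COUNT(DISTINCT status) = 2
) e2 ON e1.id = e2.id;
```

As written this returns all rows for a qualifying id, whatever their status. If only the 'IN_PROGRESS' and 'STARTED' rows are wanted, the same IN filter can be repeated on e1 in an outer WHERE clause.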

Try

SELECT * FROM events e1
WHERE e1.status IN ( 'IN_PROGRESS' , 'STARTED' )
  AND EXISTS (
  SELECT 1 FROM events e2
  WHERE e2.id = e1.id 
    AND e2.status IN ( 'IN_PROGRESS' , 'STARTED' )
    AND e1.status <> e2.status
)
