1

I have the following table in SQL:

id | trip_id | stop_id | departure_time
----------------------------------------
1  |        1|        1|        06:25:00
2  |        1|        2|        06:35:00
3  |        1|        3|        06:45:00
4  |        1|        2|        06:55:00

What I need to do is identify where a trip_id has multiple instances of a certain stop_id (in this case stop_id 2).

I then need to delete the duplicates, leaving only the one with the latest departure time.

So given the above table, I'd delete the row with id 2 and be left with:

id | trip_id | stop_id | departure_time
----------------------------------------
1  |        1|        1|        06:25:00
3  |        1|        3|        06:45:00
4  |        1|        2|        06:55:00

I have managed to do this with a series of SQL queries, but I hit the N+1 issue and it takes ages.

Can anyone recommend a way I may be able to do this in one query? Or at the very least identify all the ids of rows that need deleting in one query?

I'm doing this in Ruby on Rails, so if you prefer an Active Record solution I wouldn't hate you for it :)

Thanks in advance.

3 Answers

3

You may try the following logic:

DELETE
FROM yourTable t1
WHERE EXISTS (SELECT 1 FROM yourTable t2
              WHERE t2.trip_id = t1.trip_id AND
                    t2.stop_id = t1.stop_id AND
                    t2.departure_time > t1.departure_time);

In plain English, this says to scan your entire table, and delete any record for which we can find another record with an identical trip_id and stop_id, where the departure time is also greater than that of the record being considered for deletion. If we find such a match, then it is a duplicate according to your definition.
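
To see the effect on the sample data from the question, here is a minimal sketch that runs the same logic against an in-memory SQLite database through Python's sqlite3 module. The table name stop_times and the use of SQLite are assumptions made only for illustration, and the outer table is referenced by name rather than by an alias to keep the statement portable.

```python
import sqlite3

# Assumption: an in-memory SQLite table named stop_times stands in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stop_times ("
    "id INTEGER PRIMARY KEY, trip_id INTEGER, stop_id INTEGER, departure_time TEXT)"
)
conn.executemany(
    "INSERT INTO stop_times VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "06:25:00"),
        (2, 1, 2, "06:35:00"),
        (3, 1, 3, "06:45:00"),
        (4, 1, 2, "06:55:00"),
    ],
)

# Delete every row for which a later departure exists with the same trip_id and stop_id.
# Zero-padded HH:MM:SS strings compare correctly as text.
conn.execute(
    """
    DELETE FROM stop_times
    WHERE EXISTS (SELECT 1 FROM stop_times AS t2
                  WHERE t2.trip_id = stop_times.trip_id
                    AND t2.stop_id = stop_times.stop_id
                    AND t2.departure_time > stop_times.departure_time)
    """
)

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM stop_times ORDER BY id")]
print(remaining_ids)
```

Only the row with id 2 is removed, which leaves ids 1, 3 and 4, matching the result asked for in the question.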

2 Comments

Thanks for the response, this looks great, but where do I specify the stop_id that I want to delete dups for? E.g. I'm specifically looking for duplicates of stop_id 2456. Sorry if that wasn't clear in the question.
OK, so I just added WHERE t1.stop_id = 2465 AND EXISTS... Thanks so much
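
For anyone who wants to check which rows would go before deleting them, the same condition can be turned into a SELECT that is restricted to a single stop_id. This is only a sketch: the table name stop_times, the helper duplicate_ids, the SQLite database and the extra row with id 5 are all made up for illustration.

```python
import sqlite3

def duplicate_ids(conn, stop_id):
    """Return ids of rows for stop_id that have a later departure on the same trip."""
    rows = conn.execute(
        """
        SELECT t1.id
        FROM stop_times AS t1
        WHERE t1.stop_id = ?
          AND EXISTS (SELECT 1 FROM stop_times AS t2
                      WHERE t2.trip_id = t1.trip_id
                        AND t2.stop_id = t1.stop_id
                        AND t2.departure_time > t1.departure_time)
        ORDER BY t1.id
        """,
        (stop_id,),  # bound parameter instead of string interpolation
    )
    return [row[0] for row in rows]

# Assumption: in-memory SQLite table holding the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stop_times ("
    "id INTEGER PRIMARY KEY, trip_id INTEGER, stop_id INTEGER, departure_time TEXT)"
)
conn.executemany(
    "INSERT INTO stop_times VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "06:25:00"),
        (2, 1, 2, "06:35:00"),
        (3, 1, 3, "06:45:00"),
        (4, 1, 2, "06:55:00"),
        (5, 1, 3, "07:05:00"),  # hypothetical extra row: a second visit to stop 3
    ],
)

ids_for_stop_2 = duplicate_ids(conn, 2)
print(ids_for_stop_2)
```

With the filter on stop_id 2 only id 2 is reported, even though id 3 is also an earlier duplicate for stop 3. The returned ids could then be passed to a single DELETE ... WHERE id IN (...).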
0

You can try the approach below:

   DELETE FROM tablename
   WHERE id IN
    (
    SELECT id FROM
      -- rank rows within each (trip_id, stop_id) group, latest departure first
      (SELECT id, row_number() OVER (PARTITION BY trip_id, stop_id ORDER BY departure_time DESC) AS rn FROM tablename) aa
    WHERE rn > 1
    );
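
Here is a hedged sketch of the window-function approach run on the question's sample data, partitioned by both trip_id and stop_id so that the same stop on a different trip is not treated as a duplicate. The table name stop_times, the second trip (id 5) and the use of SQLite via Python's sqlite3 module are assumptions for illustration; window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Assumption: in-memory SQLite table; requires SQLite >= 3.25 for row_number().
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stop_times ("
    "id INTEGER PRIMARY KEY, trip_id INTEGER, stop_id INTEGER, departure_time TEXT)"
)
conn.executemany(
    "INSERT INTO stop_times VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "06:25:00"),
        (2, 1, 2, "06:35:00"),
        (3, 1, 3, "06:45:00"),
        (4, 1, 2, "06:55:00"),
        (5, 2, 2, "07:10:00"),  # hypothetical second trip visiting stop 2 once
    ],
)

# Number the rows in each (trip_id, stop_id) group, latest departure first,
# then delete everything that is not ranked first in its group.
conn.execute(
    """
    DELETE FROM stop_times
    WHERE id IN (
        SELECT id
        FROM (SELECT id,
                     row_number() OVER (PARTITION BY trip_id, stop_id
                                        ORDER BY departure_time DESC) AS rn
              FROM stop_times) AS ranked
        WHERE rn > 1
    )
    """
)

remaining = [row[0] for row in conn.execute("SELECT id FROM stop_times ORDER BY id")]
print(remaining)
```

Only id 2 is deleted here. Partitioning by stop_id alone would also remove id 4, because the row on trip 2 has the latest departure for stop 2 overall.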

1 Comment

Deleting from a CTE is not supported in Postgres (unlike SQL Server).
0

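Try something like the following:

-- Note: ctid is the row's physical location in PostgreSQL, so this keeps the
-- physically last row per group, not necessarily the latest departure_time.
DELETE FROM tablename a
WHERE a.ctid <> (SELECT max(b.ctid)
                 FROM   tablename b
                 WHERE  a.trip_id = b.trip_id
                 AND    a.stop_id = b.stop_id)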
