3

I have a table below with sample data.

|  id  | type| cycle|    datetimestamp    |  status1  |  status2  |
|:----:|:---:|:----:|:-------------------:|:---------:|:---------:|
| 27   | AN  |  123 | 2022-12-28 04:12:31 | Normal A  | Normal A  |
| 27   | AR  |  124 | 2022-12-28 04:12:31 | Normal A  | Normal A  |<--Delete
| 19   | AN  |  125 | 2022-12-28 05:24:30 | Normal A  | Normal A  |
| 19   | AR  |  126 | 2022-12-28 06:18:20 | Normal A  | Normal A  |
| 19   | AR  |  234 | 2022-12-28 07:22:20 | Normal A  | Normal A  |<--Delete
| 19   | AN  |  235 | 2022-12-28 07:22:20 | Normal A  | Normal A  |
| 20   | AR  |  236 | 2022-12-28 08:25:49 | Normal A  | Normal A  |<--Delete
| 20   | AN  |  237 | 2022-12-28 08:25:49 | Normal A  | Normal A  |
| 19   | AR  |  129 | 2022-12-28 09:08:19 | Normal A  | Normal A  |<--Delete
| 19   | AN  |  127 | 2022-12-28 09:08:19 | Normal A  | Normal A  |
| 19   | AR  |  238 | 2022-12-28 10:04:31 | Normal A  | Normal A  |<--Delete
| 19   | AN  |  230 | 2022-12-28 10:04:31 | Normal A  | Normal A  |
| 22   | AN  |  239 | 2022-12-28 11:04:58 | Normal A  | Normal A  |
| 22   | AR  |  256 | 2022-12-28 11:04:58 | Normal A  | Normal A  |<--Delete

I want to find duplicates based on the columns (id, datetimestamp, status1, status2) and, within each group of duplicates, delete the row with type = 'AR' while keeping the row with type = 'AN'. Expected output:

|  id  | type| cycle|    datetimestamp    |  status1  |  status2  |
|:----:|:---:|:----:|:-------------------:|:---------:|:---------:|
| 27   | AN  |  123 | 2022-12-28 04:12:31 | Normal A  | Normal A  |
| 19   | AN  |  125 | 2022-12-28 05:24:30 | Normal A  | Normal A  |
| 19   | AR  |  126 | 2022-12-28 06:18:20 | Normal A  | Normal A  |
| 19   | AN  |  235 | 2022-12-28 07:22:20 | Normal A  | Normal A  |
| 20   | AN  |  237 | 2022-12-28 08:25:49 | Normal A  | Normal A  |
| 19   | AN  |  127 | 2022-12-28 09:08:19 | Normal A  | Normal A  |
| 19   | AN  |  230 | 2022-12-28 10:04:31 | Normal A  | Normal A  |
| 22   | AN  |  239 | 2022-12-28 11:04:58 | Normal A  | Normal A  |

But my query returns the rows with type = 'AN' instead of the 'AR' rows I want to delete:

~~~pgsql
select * from test_data e
where exists
 ( select * from test_data e2
   where e.datetimestamp=e2.datetimestamp and e.id=e2.id
     and e.status1=e2.status1
     and e.status2=e2.status2
     and e.type='AN' and e2.type='AR') order by e.datetimestamp asc;
~~~

Setup table queries:

~~~pgsql
CREATE TABLE test_data (
 id character varying(2) NOT NULL,
 type character varying(2),
 cycle integer,
 datetimestamp timestamp without time zone NOT NULL,
 status1 character varying(10),
 status2 character varying(10),
 PRIMARY KEY(id, cycle, datetimestamp)
);

INSERT INTO test_data VALUES
  (27, 'AN', 123, '2022-12-28 04:12:31', 'Normal A', 'Normal A')
, (27, 'AR', 124, '2022-12-28 04:12:31', 'Normal A', 'Normal A')
, (19, 'AN', 125, '2022-12-28 05:24:30', 'Normal A', 'Normal A')
, (19, 'AR', 126, '2022-12-28 06:18:20', 'Normal A', 'Normal A')
, (19, 'AR', 234, '2022-12-28 07:22:20', 'Normal A', 'Normal A')
, (19, 'AN', 235, '2022-12-28 07:22:20', 'Normal A', 'Normal A')
, (20, 'AR', 236, '2022-12-28 08:25:49', 'Normal A', 'Normal A')
, (20, 'AN', 237, '2022-12-28 08:25:49', 'Normal A', 'Normal A')
, (19, 'AR', 129, '2022-12-28 09:08:19', 'Normal A', 'Normal A')
, (19, 'AN', 127, '2022-12-28 09:08:19', 'Normal A', 'Normal A')
, (19, 'AR', 238, '2022-12-28 10:04:31', 'Normal A', 'Normal A')
, (19, 'AN', 230, '2022-12-28 10:04:31', 'Normal A', 'Normal A')
, (22, 'AN', 239, '2022-12-28 11:04:58', 'Normal A', 'Normal A')
, (22, 'AR', 256, '2022-12-28 11:04:58', 'Normal A', 'Normal A')
;
~~~

3 Answers

1

Try this:

~~~pgsql
-- Delete the 'AR' row (a) when a matching 'AN' row (b) exists
DELETE FROM test_data a
USING test_data b
WHERE a.id = b.id
  AND a.type = 'AR'
  AND b.type = 'AN'
  AND a.datetimestamp = b.datetimestamp
  AND a.status1 = b.status1
  AND a.status2 = b.status2;
~~~

test result in dbfiddle


3 Comments

Hi @Edouard, thank you for the reply. It's working as expected. How can we handle the case where I want to delete on an exact match or a nearby match (a.datetimestamp - b.datetimestamp <= 3 secs)? Say we increase the timestamp on the 17th line in the dbfiddle by 2 seconds and still want to delete the 16th line and keep the 17th line.
Just replace `AND a.datetimestamp = b.datetimestamp` with `AND a.datetimestamp - b.datetimestamp BETWEEN interval '-3 seconds' AND interval '3 seconds'`.
I am also looking for a solution in Python pandas; please help if you or any of your contacts know Python: stackoverflow.com/questions/74440137/….
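
A minimal sketch of the near-match variant discussed in the comments above, assuming the goal is still to remove the 'AR' row and keep the 'AN' row. The 3-second window comes from the comment; everything else mirrors the exact-match delete:

~~~pgsql
-- Sketch only: delete an 'AR' row when an 'AN' row with the same id and
-- status values exists within 3 seconds of it (before or after).
DELETE FROM test_data a
USING test_data b
WHERE a.id = b.id
  AND a.type = 'AR'
  AND b.type = 'AN'
  AND a.datetimestamp - b.datetimestamp
      BETWEEN interval '-3 seconds' AND interval '3 seconds'
  AND a.status1 = b.status1
  AND a.status2 = b.status2;
~~~

With a window instead of strict equality, a single 'AN' row can match several 'AR' rows, and every 'AR' row inside the window is deleted, so it is worth checking the affected rows on a copy of the data first.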
1

`exists` looks like a good approach. Starting from your query, I fixed the conditions on type (we want to delete the 'AR' record when a matching 'AN' record exists). I also rewrote the multiple equalities as a tuple equality, but that's just syntactic sugar:

~~~pgsql
delete
from test_data e1
where e1.type = 'AR' and exists (
    select 1
    from test_data e2
    where (e2.datetimestamp, e2.id, e2.status1, e2.status2, e2.type)
        = (e1.datetimestamp, e1.id, e1.status1, e1.status2, 'AN')
);
~~~
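
Before running a delete like this for real, it may help to try it inside a transaction and look at what it removes. The following is only a sketch of that workflow using `RETURNING`; with the sample data from the question it should return the six 'AR' rows marked for deletion:

~~~pgsql
begin;

-- Same delete as above; RETURNING lists the rows that were removed.
delete
from test_data e1
where e1.type = 'AR' and exists (
    select 1
    from test_data e2
    where (e2.datetimestamp, e2.id, e2.status1, e2.status2, e2.type)
        = (e1.datetimestamp, e1.id, e1.status1, e1.status2, 'AN')
)
returning e1.*;

-- Keep the change with COMMIT, or undo it while experimenting:
rollback;
~~~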


0

You can list the rows that would be deleted with this query:

~~~pgsql
ALTER TABLE test_data ADD COLUMN test_data_id serial;

SELECT t.test_data_id
FROM test_data t
JOIN (
        SELECT array_agg(test_data_id) AS arr
        FROM test_data
        GROUP BY id, datetimestamp, status1, status2
        HAVING COUNT(*) > 1
    ) AS dup ON t.test_data_id = ANY(dup.arr)
WHERE t.type = 'AR';
~~~

So the delete query will be:

~~~pgsql
DELETE FROM test_data
WHERE test_data_id IN (
        SELECT t.test_data_id
        FROM test_data t
        JOIN (
                SELECT array_agg(test_data_id) AS arr
                FROM test_data
                GROUP BY id, datetimestamp, status1, status2
                HAVING COUNT(*) > 1
            ) AS dup ON t.test_data_id = ANY(dup.arr)
        WHERE t.type = 'AR'
);
~~~

A brief explanation: add a new unique column and use it as a row identifier. You can then find the groups of duplicate rows and remove the ones with type 'AR'.

P.S. If you are not allowed to add another column, you can use the `ctid` system column instead, but I don't recommend that approach.

Hopefully this helps.
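
For reference, the `ctid` alternative mentioned above might look roughly like the sketch below. It assumes PostgreSQL and simply substitutes the `ctid` system column for `test_data_id`. A row's `ctid` can change when the row is updated or the table is rewritten (for example by `VACUUM FULL`), so it is only reliable within a single statement like this one:

~~~pgsql
-- Sketch only: same logic as the delete above, with ctid as the row identifier.
DELETE FROM test_data
WHERE ctid IN (
        SELECT t.ctid
        FROM test_data t
        JOIN (
                SELECT array_agg(ctid) AS arr
                FROM test_data
                GROUP BY id, datetimestamp, status1, status2
                HAVING COUNT(*) > 1
            ) AS dup ON t.ctid = ANY(dup.arr)
        WHERE t.type = 'AR'
);
~~~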
