-1

I would like to remove duplicate entries in Postgresql.

There is no unique constraint, but I would like to consider all columns together to consider a row as a duplicate.

So we have a table containing following rows :

id   |   name   | age           | started_date  |Score |
-----|----------|---------------|---------------|------|
1    | tom      | 15            | 01/06/2022    |5     |
2    | tom      | 15            | 01/06/2022    |5     |
3    | henry    | 10            | 01/06/2022    |4     |
4    | john     | 11            | 01/06/2022    |6     |
...

I would like to consider all columns together to identify the duplicate rows.

How to achieve this in Postgresql ?

4
  • What have you tried? What's your schema? What's the sample data? Commented Jun 10, 2018 at 6:28
  • Did you try use DISTINCT on all your columns? Commented Jun 10, 2018 at 6:28
  • What do you mean by "remove"? In a query or in the data itself? Commented Jun 10, 2018 at 11:33
  • related question stackoverflow.com/questions/56695357/… Commented Apr 7, 2022 at 0:44

2 Answers 2

7

PostgreSQL assigns a ctid pseudo-column to identify the physical location of each row. You could use that to identify different rows with the same values:

-- Create the table
CREATE TABLE my_table (num1 NUMERIC, num2 NUMERIC);

-- Create duplicate data
INSERT INTO my_table VALUES (1, 2);
INSERT INTO my_table VALUES (1, 2);

-- Remove duplicates
DELETE FROM my_table
WHERE ctid IN (SELECT ctid
               FROM   (SELECT ctid,
                              ROW_NUMBER() OVER (
                                PARTITION BY num1, num2) AS rn
                       FROM   my_table) t
               WHERE  rn > 1);

DB Fiddle

Sign up to request clarification or add additional context in comments.

Comments

0

Let say your table has 2 columns, you can identify duplicates using. Post this :-

1) Insert this result into a temp table

2) Drop data from Main table

3) Insert data from temp table into main table

4) Drop temp table.

select col1, col2, count(*) as cnt
from table1
group by col1, col2
having  cnt > 1 

2 Comments

Is there a way to delete found duplicate data directly from the table?
SQL mentioned in answer will do the same.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.