0

How can I link duplicate records in PostrgreSQL? I've found them with:

SELECT * FROM (
  SELECT id, import_id, name,
  ROW_NUMBER() OVER(PARTITION BY address ORDER BY name asc) AS Row
  FROM companies
) dups
where 
dups.Row > 1 ORDER BY dups.name;

See sample code and demo at http://sqlfiddle.com/#!15/af016/7/1

I want to add a column to companies called linked_id, that will be set to the import_id of the first of each set of duplicate records.

2
  • It might be a good idea not to use "Row" as a column alias. It's confusing, and it's also a keyword in some contexts. BTW, questions like this are much easier to answer if you post some sample data / schema as CREATE TABLE and INSERT statements. SQLFiddle.com can be handy for this, and has a text-to-sql conversion tool. Commented Dec 15, 2013 at 17:00
  • "Row" came from the example I found at stackoverflow.com/questions/14471179 Commented Dec 16, 2013 at 0:43

2 Answers 2

1

Try:

UPDATE companies c
SET import_id = q.import_id
FROM (
  SELECT id, 
  FIRST_VALUE(import_id) 
      OVER(PARTITION BY name, address ORDER BY name asc) AS import_id,
  ROW_NUMBER() 
      OVER(PARTITION BY name, address ORDER BY name asc) AS Rn
  FROM companies
) q
WHERE c.id = q.id AND q.rn > 1
;

Demo: http://sqlfiddle.com/#!15/af016/10

Sign up to request clarification or add additional context in comments.

1 Comment

I found my answer just as you posted your. Thanks!
1

This will set parent_id to the import_id of the first company to match.

UPDATE companies
SET parent_id=rs.parent_id FROM
(SELECT id, first_value(import_id)
 OVER (PARTITION BY address ORDER BY name) as parent_id
 FROM companies
) AS rs
WHERE rs.id=companies.id;

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.