0

I have a postgresql database, and I'm trying to delete (or even just get the ids) of the older of the duplicates I have in my table, but only those who are because of case sensitivity, for example helLo and hello.

The table is quite large and my nested query takes a really long time, I wonder if there is a better, more efficient way to do my query in one go, and not split it up to multiple queries, cause there's a lot of ids in question

SELECT * FROM some_table AS out
WHERE (SELECT count(*) FROM some_table AS in
    WHERE out.text != in.text 
    AND LOWER(in.text) = LOWER(out.text) 
    AND in.created_at > out.created_at) > 1

Thanks!

2 Answers 2

1

Can you try

SELECT LOWER(text), ROW_NUMBER() OVER( PARTITION by LOWER(text) ORDER by created_at ) as rn
FROM some_table 

You can then use the rn column as a filter

Sign up to request clarification or add additional context in comments.

1 Comment

A good solution that worked for me! Though for people copying it or using it in the future, the comma after PARTITION by LOWER(text) is not needed and should be removed
0

To help this query, create an expression index on LOWER(text). Include created_at in the index to help the date comparisons.

CREATE INDEX text_lower ON some_table(LOWER(text), created_at);

It's hard to test this without your data, though.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.