Remove case-insensitive duplicates in SQL (Postgres)

Question

I have a PostgreSQL database, and I'm trying to delete (or at least get the ids of) the older rows among the duplicates in my table, but only the rows that are duplicates because of case differences, for example helLo and hello.

The table is quite large and my nested query takes a really long time. Is there a better, more efficient way to do this in a single query, rather than splitting it into multiple queries? There are a lot of ids involved.

SELECT * FROM some_table AS o
WHERE (SELECT count(*) FROM some_table AS i
    WHERE o.text != i.text
    AND LOWER(i.text) = LOWER(o.text)
    AND i.created_at > o.created_at) > 1

Thanks!

JohanB · Accepted Answer · 2022-05-05 11:14:01Z

1

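Can you try this?

SELECT LOWER(text), ROW_NUMBER() OVER (PARTITION BY LOWER(text) ORDER BY created_at) AS rn
FROM some_table

You can then use the rn column as a filter: within each group of texts that match case-insensitively, rn = 1 is the row with the earliest created_at.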
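As a rough sketch of how the rn filter could be applied, the query below lists the ids of every row except the newest one in each case-insensitive group. It assumes the table has a primary key column named id (not shown in the question) and sorts by created_at DESC so that rn = 1 is the row to keep. Note that partitioning on LOWER(text) also groups exact duplicates together, not only rows that differ in case.

```sql
-- Assumption: some_table has a primary key column named id.
WITH ranked AS (
    SELECT id,
           ROW_NUMBER() OVER (
               PARTITION BY LOWER(text)
               ORDER BY created_at DESC, id DESC  -- id breaks ties on created_at
           ) AS rn
    FROM some_table
)
-- rn = 1 is the newest row per group; everything else is an older duplicate.
SELECT id
FROM ranked
WHERE rn > 1;
```

If the listed ids look right, the final SELECT can be swapped for DELETE FROM some_table WHERE id IN (SELECT id FROM ranked WHERE rn > 1), ideally inside a transaction so the result can be checked before committing.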

edited May 5, 2022 at 11:14

answered May 2, 2022 at 8:52


1 Comment

Shoham Ben Shitrit Over a year ago

A good solution that worked for me! For anyone copying it or using it in the future, though: the comma after PARTITION by LOWER(text) is not needed and should be removed.

O. Jones · Accepted Answer · 2022-05-02 13:52:18Z

0

To help this query, create an expression index on LOWER(text). Include created_at in the index to help the date comparisons.

CREATE INDEX text_lower ON some_table(LOWER(text), created_at);

It's hard to test this without your data, though.
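One way to check whether the index helps is to look at the query plan. The sketch below rewrites the lookup as an EXISTS subquery, which may let the planner probe the (LOWER(text), created_at) index once per outer row. The id column is an assumption, and whether the index is actually chosen depends on the data and table statistics.

```sql
-- Sketch only: some_table, text and created_at come from the question;
-- the id column is an assumption.
-- Note: EXPLAIN ANALYZE actually executes the query to collect timings.
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id
FROM some_table AS o
WHERE EXISTS (
    SELECT 1
    FROM some_table AS n
    WHERE LOWER(n.text) = LOWER(o.text)   -- matches the indexed expression
      AND n.created_at > o.created_at     -- second column of the index
      AND n.text <> o.text                -- same letters, different case
);
```

Unlike the count(*) version, EXISTS can stop at the first newer match, and it returns every row that has at least one newer case variant.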

answered May 2, 2022 at 13:52
