PostgreSQL: Compare strings in the same column but in different series (rows)

Question

I have a simple table generated with a subquery that applies many different filters.

  | project
1 | Hello
2 | Hello 2.0
3 | Ordinary Sheep
4 | Sheep

The next step is to remove projects with very similar names (for example, if a project has the same name but followed by a 2.0).

In this case I need my query to remove "Hello 2.0" from the results. This little issue is more challenging than I expected.

My best attempt is the query below, which correctly identifies the project that should be excluded. However, if I invert the condition, I end up with duplicated rows because of the self join.

SELECT 
    q1.name,
    q2.name
FROM subquery q1
JOIN subquery q2 ON q1.name LIKE q2.name || '%'
WHERE q1.id <> q2.id;

Thank you so much!

Kaushik Nayak · Accepted Answer · 2019-05-29 14:32:37Z

Maybe you can match the first occurrence of a digit in a project name and discard everything from that digit onward, then apply RTRIM and DISTINCT to the result. However, this will not work if the project name itself contains a number.

with s as 
( 
   --your query that you have inside sub-query
)
select DISTINCT RTRIM(regexp_replace(project, '^([^\d]+)\d.*$','\1')) from s;
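
As a rough illustration, the query above can be tried against the sample rows from the question. The temporary table s and its columns id and project are assumptions made for this sketch; they stand in for the real subquery. The second statement is not part of the original answer: it is one possible way to invert the self join from the question with NOT EXISTS, which avoids the duplicated rows a plain join produces and does not depend on digits.

```sql
-- Hypothetical stand-in for the subquery; table and column names are assumed.
CREATE TEMP TABLE s (id int, project text);
INSERT INTO s VALUES
    (1, 'Hello'),
    (2, 'Hello 2.0'),
    (3, 'Ordinary Sheep'),
    (4, 'Sheep');

-- Approach from the answer: cut each name at its first digit,
-- trim the trailing space, and collapse duplicates.
SELECT DISTINCT
       RTRIM(regexp_replace(project, '^([^\d]+)\d.*$', '\1')) AS project
FROM s
ORDER BY project;

-- Alternative sketch: keep a row only if no OTHER row's name is a prefix of it.
-- NOT EXISTS returns each surviving row once, unlike an inverted self join.
SELECT q1.project
FROM s q1
WHERE NOT EXISTS (
    SELECT 1
    FROM s q2
    WHERE q1.id <> q2.id
      AND q1.project LIKE q2.project || '%'
)
ORDER BY q1.project;
```

With these four sample rows, both statements should return Hello, Ordinary Sheep and Sheep. The second one has its own caveats: any % or _ characters in a project name are treated as LIKE wildcards, and two rows with exactly the same name would exclude each other.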
