0

I have a table education that has a column university. For each of the rows in the table I want to find 3 most similar universities from the table.

Here is my query that finds 3 most similar universities to a given input:

select distinct(university),
               similarity(unaccent(lower(university)),
                          unaccent(lower('Boston university')))
from education
order by similarity(unaccent(lower(university)),
                    unaccent(lower('Boston university'))) desc
limit 3;

It works fine. But now I would like to modify this query so that I get two columns and a row for each existing university in the table: the first column would be the university name and the second would be the three most similar universities found in the database (or if its easier - four columns where the first is the university and the next 3 are the most similar ones).

What should this statement look like?

1
  • 1
    Sample data and desired results would really help. Commented Nov 10, 2019 at 0:03

2 Answers 2

1

You could use an inline aggregated query:

with u as (select distinct university from education)
select 
    university,
    (
        select string_agg(u.university, ',')
        from u 
        where u.university != e.university
        order by similarity(
            unaccent(lower(u.university)), 
            unaccent(lower(e.university))
        ) desc
        limit 3
    ) similar_universities
from education e

This assumes that a given university may occur more than once in the education table (hence the need for a cte).

Sign up to request clarification or add additional context in comments.

Comments

0

I think a lateral join and aggregation does what you want:

select e.university, e2.universities
from education e cross join lateral
     (select array_agg(university order by sim desc) as universities
      from (select e2.university,
                   similarity(unaccent(lower(e2.university)),
                              unaccent(lower(e.university))
                             ) as sim
            from education e2
            order by sim desc
            limit 3
           ) e2
      ) e2;

Note: The most similar university is probably the one with the same name. (You can filter that out with a where clause in the subquery.)

This returns the value as an array, because I prefer working with arrays rather than strings in Postgres.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.