I’m trying to wrap my head around a problem but I’m hitting a blank. I know SQL quite well, but I’m not sure how to approach this.

My problem:

Given a string and a table of possible substrings, I need to find the total number of times those substrings occur in the string.

The search table consists of a single column:

searchtable

| pattern TEXT PRIMARY KEY|
|-------------------------|
| my                      |
| quick                   |
| Earth                   |

Given the string "Earth is my home planet and where my friends live", the expected outcome is 3 (2x "my" and 1x "Earth").
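For anyone who wants to reproduce this, a minimal setup sketch might look like the following. The column definition is taken from the table header above; nothing else about the real schema is assumed.

```sql
-- Sketch of the sample data described above (PostgreSQL).
CREATE TABLE searchtable (
    pattern TEXT PRIMARY KEY
);

INSERT INTO searchtable (pattern)
VALUES ('my'), ('quick'), ('Earth');
```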

In my function, I have a variable bodytext, which holds the string to examine.

I know I can do IN (SELECT pattern FROM searchtable) to get the list of substrings, and I could possibly use a LIKE ANY clause to get matches, but how can I count occurrences of the substrings in the table within the search string?

  • Please include the actual code for the function. Commented Oct 11, 2020 at 7:20
  • "how can I count occurrences of the substrings in the table within the search string?" Don't count. Calculate it. Pseudocode: (length(original string) - length(replace(original string, substring, ''))) / length(substring) Commented Oct 11, 2020 at 7:39
  • @ZoharPeled I ended up solving the problem myself shortly after posting the question, but I used exactly this method. I'll post the complete solution for reference. Commented Oct 11, 2020 at 12:17
  • @a_horse_with_no_name I will update it in case someone else comes across the question. The table is really only one column as described. Commented Oct 11, 2020 at 12:17

2 Answers

This is easily done without a custom function:

select count(*)
from (values ('Earth is my home planet and where my friends live')) v(str)
     cross join lateral regexp_split_to_table(v.str, ' ') word
     join patterns p on word = p.pattern;

Just break the original string into "words". Then match on the words.
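Splitting on a single space means a word followed by punctuation (such as "my," or "Earth.") will not match. A possible variation, sketched here under the assumption that every pattern is a single word made of letters, digits, or underscores, splits on runs of non-word characters instead. The sample sentence is made up for illustration.

```sql
-- Sketch only: '\W+' splits on one or more non-word characters,
-- so surrounding punctuation no longer prevents a match.
-- The table name patterns follows the query above.
select count(*)
from (values ('Earth is my home, and my friends live on Earth.')) v(str)
     cross join lateral regexp_split_to_table(v.str, '\W+') word
     join patterns p on word = p.pattern;
```

The comparison is still case-sensitive, so "earth" would not match the pattern "Earth".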

Another method uses regular expression matching:

select (select count(*) from regexp_matches(v.str, p.rpattern, 'g'))
from (values ('Earth is my home planet and where my friends live')) v(str) cross join
     (select string_agg(pattern, '|') as rpattern
      from patterns
     ) p;

This stuffs all the patterns into a single regular expression. Note that this version does not take word breaks into account.
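If word breaks do matter, one option is to wrap the alternation in PostgreSQL's word-boundary constraints `\m` (start of word) and `\M` (end of word). This is a sketch rather than a drop-in replacement: it assumes that no pattern contains regular-expression metacharacters, since those would be interpreted rather than matched literally.

```sql
-- Sketch only: \m( ... )\M restricts matches to whole words,
-- so "my" would not be counted inside a word such as "army".
select (select count(*) from regexp_matches(v.str, p.rpattern, 'g')) as matches
from (values ('Earth is my home planet and where my friends live')) v(str) cross join
     (select '\m(' || string_agg(pattern, '|') || ')\M' as rpattern
      from patterns
     ) p;
```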

Here is a db<>fiddle.

I solved the problem with the following code:

CREATE OR REPLACE FUNCTION count_matches(body TEXT, OUT matches INTEGER) AS $$
DECLARE
    results INTEGER := 0;
    matchlist RECORD;
BEGIN
    FOR matchlist IN (SELECT pattern FROM searchtable)
    LOOP
        -- Removing every occurrence of the pattern shortens body by
        -- (occurrences * pattern length), so dividing the length
        -- difference by the pattern length gives the occurrence count.
        results := results
            + (LENGTH(body) - LENGTH(REPLACE(body, matchlist.pattern, '')))
              / LENGTH(matchlist.pattern);
    END LOOP;
    matches := results;
END;
$$ LANGUAGE plpgsql;
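As a usage sketch, the function can be called directly with the string from the question. The second query is an assumed set-based equivalent, not part of the solution above; it applies the same length/replace arithmetic to every row of searchtable in a single statement.

```sql
-- Calling the function; with the sample patterns (my, quick, Earth)
-- this should return 3, matching the expected outcome in the question.
SELECT count_matches('Earth is my home planet and where my friends live');

-- Sketch of a set-based alternative without a loop.
-- COALESCE covers the case where searchtable is empty.
SELECT COALESCE(
           SUM((LENGTH(v.str) - LENGTH(REPLACE(v.str, s.pattern, '')))
               / LENGTH(s.pattern)),
           0) AS matches
FROM (VALUES ('Earth is my home planet and where my friends live')) AS v(str)
CROSS JOIN searchtable AS s;
```

Both versions count plain substrings, so "my" inside a longer word is counted too, and REPLACE is case-sensitive. An empty pattern would cause a division by zero, so the table should not contain one.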
