PostgreSQL SQL query to find number of occurrences of substring in string

Question

I’m trying to wrap my head around a problem, but I’m drawing a blank. I know SQL quite well, but I’m not sure how to approach this.

My problem:

Given a string and a table of possible substrings, I need to find the number of occurrences.

The search table consists of a single column:

searchtable

| pattern TEXT PRIMARY KEY|
|-------------------------|
| my                      |
| quick                   |
| Earth                   |

Given the string "Earth is my home planet and where my friends live", the expected outcome is 3 (2x "my" and 1x "Earth").
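The sample table can be reproduced with a minimal setup script, so that the queries on this page can be tried directly. This assumes PostgreSQL 9.5 or later (for ON CONFLICT) and uses only the table and rows described above.

```sql
-- Setup reproducing the question's sample data (assumes PostgreSQL 9.5+).
CREATE TABLE IF NOT EXISTS searchtable (
    pattern TEXT PRIMARY KEY
);

-- ON CONFLICT makes the script safe to run more than once.
INSERT INTO searchtable (pattern)
VALUES ('my'), ('quick'), ('Earth')
ON CONFLICT (pattern) DO NOTHING;
```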

In my function, the string to examine is held in the variable bodytext.

I know I can do IN (SELECT pattern FROM searchtable) to get the list of substrings, and I could possibly use a LIKE ANY clause to get matches, but how can I count occurrences of the substrings in the table within the search string?
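A LIKE-based approach tells you which patterns occur, not how often. The sketch below counts matching pattern rows for the sample string. A CTE named searchtable stands in for the real table with the sample rows; remove it to query the actual table.

```sql
-- Stand-in for searchtable with the question's sample rows.
WITH searchtable(pattern) AS (
    VALUES ('my'), ('quick'), ('Earth')
)
-- Each pattern row passes the LIKE test at most once,
-- however many times it occurs in the string.
SELECT count(*) AS patterns_present
FROM searchtable
WHERE 'Earth is my home planet and where my friends live'
      LIKE '%' || pattern || '%';
```

This returns 2 ("my" and "Earth") rather than the expected 3, because "my" is counted once even though it occurs twice.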

"how can I count occurrences of the substrings in the table within the search string?" Don't count. Calculate it. Pseudo code: (length(original string) - length(replace(original string, substring, ''))) / length(substring)
– Zohar Peled, Commented Oct 11, 2020 at 7:39
@ZoharPeled I ended up solving the problem myself shortly after posting the question, using exactly this method. I'll post the complete solution for reference.
– user10504, Commented Oct 11, 2020 at 12:17
@a_horse_with_no_name I will update the question in case someone else comes across it. The table really does have only one column, as described.
– user10504, Commented Oct 11, 2020 at 12:17
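Applied to the sample data, the length-difference formula from the first comment can be sketched as a single query that shows the count per pattern. The CTEs are stand-ins: searchtable holds the sample rows, and src(body) is a hypothetical name for the string being examined.

```sql
-- Stand-ins: the question's sample patterns and a hypothetical src(body) input.
WITH searchtable(pattern) AS (
    VALUES ('my'), ('quick'), ('Earth')
),
src(body) AS (
    VALUES ('Earth is my home planet and where my friends live')
)
SELECT s.pattern,
       -- characters removed by deleting every occurrence, divided by the pattern length
       (length(src.body) - length(replace(src.body, s.pattern, '')))
           / length(s.pattern) AS occurrences
FROM searchtable s
CROSS JOIN src;
```

For the sample string this gives 2 for my, 0 for quick and 1 for Earth, which sum to the expected 3. Both lengths are integers, so the division is integer division; it is exact here because the removed length is always a multiple of the pattern length.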

Gordon Linoff · Accepted Answer · 2020-10-11 12:45:28Z

4

This is easily done without a custom function:

select count(*)
from (values ('Earth is my home planet and where my friends live')) v(str) cross join lateral
     regexp_split_to_table(v.str, ' ') word join  -- one row per space-separated word
     searchtable p
     on word = p.pattern;                         -- keep only words equal to a pattern

Just break the original string into "words" by splitting on spaces, then match the words against the patterns.
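Splitting on a single space misses words next to punctuation, such as "my," in a longer text. A possible variant splits on runs of non-word characters (\W+) instead. This sketch uses a CTE in place of searchtable and a punctuated version of the sample sentence; both are stand-ins.

```sql
-- Stand-in for searchtable with the question's sample rows.
WITH searchtable(pattern) AS (
    VALUES ('my'), ('quick'), ('Earth')
)
-- \W+ splits on any run of characters that are not letters, digits or underscores.
SELECT count(*) AS matches
FROM regexp_split_to_table('Earth is my home planet, and where my friends live.', '\W+') AS word
JOIN searchtable p ON word = p.pattern;
```

The trailing "." produces an empty final element, which never equals a non-empty pattern, so the count is still 3.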

Another method uses regular expression matching:

select (select count(*) from regexp_matches(v.str, p.rpattern, 'g'))
from (values ('Earth is my home planet and where my friends live')) v(str) cross join
     (select string_agg(pattern, '|') as rpattern  -- all patterns as one alternation
      from searchtable
     ) p;

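This combines all the patterns into a single regular expression (an alternation such as my|quick|Earth). Note that this version does not take word breaks into account, so "my" would also match inside a word such as "myth". Patterns are also interpreted as regular expressions, so any metacharacters in them are not matched literally.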
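Word breaks can be respected by wrapping the alternation in PostgreSQL's \m (start of word) and \M (end of word) constraint escapes. This is a sketch that assumes the default standard_conforming_strings = on, so the backslashes are passed to the regex engine unchanged. It also assumes the patterns contain no regex metacharacters. The CTE stands in for searchtable.

```sql
-- Stand-in for searchtable with the question's sample rows.
WITH searchtable(pattern) AS (
    VALUES ('my'), ('quick'), ('Earth')
)
-- \m(...)\M only matches a pattern that is a whole word, so "myth" is not counted.
select (select count(*) from regexp_matches(v.str, p.rpattern, 'g')) as matches
from (values ('Earth is my home planet and where my friends live')) v(str) cross join
     (select '\m(' || string_agg(pattern, '|') || ')\M' as rpattern
      from searchtable
     ) p;
```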

Here is a db<>fiddle.


user10504 · Accepted Answer · 2020-10-11 12:25:55Z

0

I solved the problem with the following code:

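CREATE OR REPLACE FUNCTION count_matches(body TEXT, OUT matches INTEGER) AS $$
DECLARE
    results   INTEGER := 0;
    matchlist RECORD;
BEGIN
    -- Skip empty patterns to avoid a division by zero below.
    FOR matchlist IN SELECT pattern FROM searchtable WHERE pattern <> ''
    LOOP
        -- Occurrences of one pattern = (characters removed by deleting every
        -- occurrence) / (length of the pattern). This counts non-overlapping,
        -- case-sensitive matches, including matches inside longer words.
        results := results
            + (LENGTH(body) - LENGTH(REPLACE(body, matchlist.pattern, '')))
              / LENGTH(matchlist.pattern);
    END LOOP;
    matches := results;
END;
$$ LANGUAGE plpgsql;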
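The loop can also be collapsed into a single aggregate query using the same formula. This is a sketch with a CTE standing in for searchtable and the sample string supplied inline. coalesce returns 0 instead of NULL when the table is empty.

```sql
-- Stand-in for searchtable with the question's sample rows.
WITH searchtable(pattern) AS (
    VALUES ('my'), ('quick'), ('Earth')
)
-- Sum the per-pattern occurrence counts; the filter skips empty patterns.
SELECT coalesce(sum((length(b.body) - length(replace(b.body, s.pattern, '')))
                    / length(s.pattern)), 0) AS matches
FROM searchtable s
CROSS JOIN (VALUES ('Earth is my home planet and where my friends live')) AS b(body)
WHERE s.pattern <> '';
```

With the function defined and searchtable holding the sample rows, SELECT count_matches('Earth is my home planet and where my friends live'); should likewise return 3.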
