I have a list of words and a string, and I would like to get back the words from the original list that are found in the string.
Example:
import re
lof_terms = ['car', 'car manufacturer', 'popular']
str_content = 'This is a very popular car manufacturer.'
pattern = re.compile(r"(?=(\b" + r"\b|".join(map(re.escape, lof_terms)) + r"\b))")
found_terms = re.findall(pattern, str_content)
This only returns ['popular', 'car']. It fails to catch 'car manufacturer'. However, it does catch it if I change the source list of terms to
lof_terms = ['car manufacturer', 'popular']
Somehow the overlap between 'car' and 'car manufacturer' seems to be the source of this issue.
Any ideas on how to get around this?
Many thanks
Is regex a must?
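A likely explanation: at any given position in the string, an alternation captures only the first alternative that matches, so once `\bcar\b` succeeds at the start of 'car manufacturer', the longer term is never tried there. A minimal sketch of one way around it, assuming the goal is just to report each term that occurs as a whole word or phrase somewhere in the string, is to test every term separately instead of combining them into one pattern:

```python
import re

lof_terms = ['car', 'car manufacturer', 'popular']
str_content = 'This is a very popular car manufacturer.'

# Search for each term on its own, so a shorter term cannot shadow a
# longer one that starts at the same position in the string.
found_terms = [
    term for term in lof_terms
    if re.search(r"\b" + re.escape(term) + r"\b", str_content)
]

print(found_terms)  # ['car', 'car manufacturer', 'popular']
```

The result follows the order of `lof_terms` rather than the order of appearance in the string. Sorting the terms longest-first inside a single alternation would not be enough here: it would capture 'car manufacturer' but then drop 'car', since only one alternative is captured per position.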