36

Is there a way to check whether a line contains words that match any of a set of regex patterns? If I have [regex1, regex2, regex3] and want to see if a line matches any of them, how would I do this? Right now I am using re.findall(regex1, line), but it only matches one regex at a time.

5 Answers

59

You can use the built-in function any (or all, if every regex has to match) with a generator expression to cycle through all the regex objects.

any(regex.match(line) for regex in [regex1, regex2, regex3])

(or any(re.match(regex_str, line) for regex_str in [regex_str1, regex_str2, regex_str3]) if the regexes are not pre-compiled regex objects, of course)
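
As a minimal sketch of the any approach: the patterns and sample lines below are made up for illustration, and matches_any is a hypothetical helper name. It uses search rather than match, on the assumption that a pattern may occur anywhere in the line; match only succeeds at the start of the string.

```python
import re

# Hypothetical example patterns; substitute your own.
patterns = [
    re.compile(r"\berror\b"),
    re.compile(r"\d{3}-\d{4}"),
    re.compile(r"^WARN"),
]

def matches_any(line, regexes):
    """Return True if at least one compiled regex is found anywhere in line."""
    # any() stops at the first truthy match object, so later patterns
    # are not tried once one has matched.
    return any(regex.search(line) for regex in regexes)

print(matches_any("disk error on sda", patterns))  # True
print(matches_any("call 555-1234 now", patterns))  # True
print(matches_any("all good", patterns))           # False
```

Swapping any for all in matches_any would require every pattern to be found in the line.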

However, that will be inefficient compared to combining your regexes in a single expression. If this code is time- or CPU-critical, you should try instead to compose a single regular expression that encompasses all your needs, using the special | regex operator to separate the original expressions.

A simple way to combine all the regexes is to use the string join method:

re.match("|".join([regex_str1, regex_str2, regex_str3]), line)

A warning about combining the regexes in this way: it can produce incorrect expressions if the original ones already use the | operator.

2 Comments

You can make the join method less likely to fail if you wrap each expression in parentheses. '(' + ')|('.join(['foo', 'bar', 'baz']) + ')' gives '(foo)|(bar)|(baz)'.
Better yet, wrap in (?:...), and put the string together in a way that highlights its logical structure. '|'.join('(?:{0})'.format(x) for x in ('foo', 'bar', 'baz')) for example.
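
The non-capturing variant from the comment above could be sketched like this. The helper name combine and the sample patterns are assumptions for illustration only, not part of the original answer.

```python
import re

def combine(regex_strs):
    """Join pattern strings into one alternation, each wrapped in (?:...)."""
    # The non-capturing group keeps each original pattern isolated
    # without adding numbered capture groups to the result.
    return re.compile("|".join("(?:{0})".format(s) for s in regex_strs))

combined = combine(["foo", "ba[rz]", r"\d+"])

print(combined.pattern)                       # (?:foo)|(?:ba[rz])|(?:\d+)
print(bool(combined.search("xx baz")))        # True
print(bool(combined.search("nothing here")))  # False
```

Compiling the combined pattern once and reusing it avoids rebuilding the string for every line.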
10

Try this new regex: (regex1)|(regex2)|(regex3). This will match a line with any of the 3 regexes in it.

4 Comments

(?:...) is probably a better idea than (...) here, to avoid creating spurious capture groups.
@Karl ...unless you want to check the truthiness of .group(n) to determine which group you captured.
@Karl Can you elaborate on your comment? What do you mean by spurious groups?
like, groups that are unwanted, not helpful for solving the problem. I think I may have used the word incorrectly. But also that was over 7 years ago.
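
To illustrate the point about using capture groups to tell which alternative matched, here is a small sketch. It assumes each alternative is wrapped in exactly one capturing group with no groups nested inside; the patterns and the name which_matched are hypothetical.

```python
import re

# Hypothetical patterns: one capturing group per alternative.
combined = re.compile(r"(foo)|(bar)|(baz)")

def which_matched(line):
    """Return the 1-based number of the alternative that matched, or None."""
    m = combined.search(line)
    # lastindex is the index of the last capturing group that took part
    # in the match; with one group per alternative it identifies the branch.
    return m.lastindex if m else None

print(which_matched("a bar b"))  # 2
print(which_matched("nope"))     # None
```

With (?:...) groups instead, the match still succeeds but this information is not available.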
8

You could loop through the regex items and do a search.

import re

regexList = [regex1, regex2, regex3]

line = 'line of data'
gotMatch = False
for regex in regexList:
    # stop at the first pattern found anywhere in the line
    s = re.search(regex, line)
    if s:
        gotMatch = True
        break

if gotMatch:
    doSomething()

1
I'm quite new to Python but had the same problem. I made this to find all matches for multiple regular expressions.

    import re

    regex1 = r"your regex here"
    regex2 = r"your regex here"
    regex3 = r"your regex here"
    regexList = [regex1, regex2, regex3]

    your_string = "text to search"  # replace with the text you want to search
    found_regex_list = []  # make a list to add the matches to

    for x in regexList:
        # findall returns an empty list when the pattern is not found
        found_regex_list.extend(re.findall(x, your_string))

1 Comment

You should include an explanation with your code snippet.
0

You can do this with a list comprehension. I was trying to identify the fields in a table that matched certain patterns. The input is a list of column names, and a list of matches is returned.

import re

def find_client_fields(cols=None):

    # list of regexes
    regex_list = [r'.*customer.*',
                  r'.*vendor.*',
                  r'.*user.*',
                  r'.*source.*']

    # list comprehension to find matches, ignoring case
    field = [x for x in cols for regex in regex_list
             if re.match(regex, x, re.IGNORECASE)]

    # set() removes repeated names
    return list(set(field))

find_client_fields(cols=temp.columns)
