14

My dilemma: I'm passing my function a string that I need to then perform numerous regex manipulations on. The logic is if there's a match in the first regex, do one thing. If no match, check for a match with the second and do something else, if not check the third, and so forth. I could do something like this:

if re.match('regex1', string):
    match = re.match('regex1', string)
    # Manipulate match.group(n) and return
elif re.match('regex2', string):
    match = re.match('regex2', string)
    # Do second manipulation
[etc.]

However, this feels unnecessarily verbose, and usually when that's the case it means there's a better way that I'm either overlooking or don't yet know about.

Does anyone have a suggestion for a better way to do this (better from a code-appearance standpoint, a memory usage standpoint, or both)?

1

7 Answers 7

24

Generally speaking, in these sorts of situations, you want to make the code "data driven". That is, put the important information in a container, and loop through it.

In your case, the important information is (string, function) pairs.

import re

def fun1():
    print('fun1')

def fun2():
    print('fun2')

def fun3():
    print('fun3')

regex_handlers = [
    (r'regex1', fun1),
    (r'regex2', fun2),
    (r'regex3', fun3)
    ]

def example(string):
    for regex, fun in regex_handlers:
        if re.match(regex, string):
            fun()  # call the function
            break

example('regex2')
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks for this suggestion! This was what I was probably going to end up doing, but the overridden version of the re module is a slightly better fit for this project.
13

Similar question from back in september: How do you translate this regular-expression idiom from Perl into Python?

Using global variables in a module maybe not the best way to do it, but converting it into a class:

import re

class Re(object):
  def __init__(self):
    self.last_match = None
  def match(self,pattern,text):
    self.last_match = re.match(pattern,text)
    return self.last_match
  def search(self,pattern,text):
    self.last_match = re.search(pattern,text)
    return self.last_match

gre = Re()
if gre.match(r'foo',text):
  # do something with gre.last_match
elif gre.match(r'bar',text):
  # do something with gre.last_match
else:
  # do something else

1 Comment

Thanks for the link! I didn't find that topic in my search, but it's spot on for what I'm trying to do. I like the idea of using a class rather than a module, too.
2

I had the same problem as yours. Here´s my solution:

import re

regexp = {
    'key1': re.compile(r'regexp1'),
    'key2': re.compile(r'regexp2'),
    'key3': re.compile(r'regexp3'),
    # ...
}

def test_all_regexp(string):
    for key, pattern in regexp.items():
        m = pattern.match(string)
        if m:
            # do what you want
            break

It´s a slightly modified solution from the answer of Extracting info from large structured text files

1 Comment

Dictionaries don't guarantee ordering. You probably should use a sequence instead of a dict to get predictable behavior.
1

Hmm... you could use something with the with construct... um

class rewrapper()
    def __init__(self, pattern, target):
        something

    def __enter__(self):
        something

    def __exit__(self):
        something


 with rewrapper("regex1", string) as match:
    etc

 with rewrapper("regex2", string) as match: 
    and so forth

Comments

0

Are the manipulations for each regex similar? If so, try this:

for regex in ('regex1', 'regex2', 'regex3', 'regex4'):
    match = re.match(regex, string)
    if match:
        # Manipulate match.group(n)
        return result

1 Comment

Unfortunately the manipulations vary for the different regex; in retrospect, I should have specified that in the question.
0

Here your regexs and matches are not repeated twice:

match = re.match('regex1', string)
if match:
    # do stuff
    return

match = re.match('regex2', string)
if match:
    # do stuff
    return

Comments

0
class RegexStore(object):
   _searches = None

   def __init__(self, pat_list):
      # build RegEx searches
      self._searches = [(name,re.compile(pat, re.VERBOSE)) for
                        name,pat in pat_list]

   def match( self, text ):
      match_all = ((x,y.match(text)) for x,y in self._searches)
      try:
         return ifilter(op.itemgetter(1), match_all).next()
      except StopIteration, e:
         # instead of 'name', in first arg, return bad 'text' line
         return (text,None)

You can use this class like so:

rs = RegexStore( (('pat1', r'.*STRING1.*'),
                  ('pat2', r'.*STRING2.*')) )
name,match = rs.match( "MY SAMPLE STRING1" )

if name == 'pat1':
   print 'found pat1'
elif name == 'pat2':
   print 'found pat2'

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.