5

I get some input in Python that I need to parse using regexps.

At the moment I'm using something like this:

matchOK = re.compile(r'^OK\s+(\w+)\s+(\w+)$')
matchFailed = re.compile(r'^FAILED\s(\w+)$')
#.... a bunch more regexps

for l in big_input:
  match = matchOK.search(l)
  if match:
     #do something with match
     continue
  match = matchFailed.search(l)
  if match:
     #do something with match
     continue
  #.... a bunch more of these 
  # Then some error handling if nothing matches

Now usually I love Python because it's nice and succinct. But this feels verbose. I'd expect to be able to do something like this:

for l in big_input:      
  if match = matchOK.search(l):
     #do something with match     
  elif match = matchFailed.search(l):
     #do something with match 
  #.... a bunch more of these
  else:
    # error handling

Am I missing something, or is the first form as neat as I'm going to get?
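
For readers on Python 3.8 or later, assignment expressions (the `:=` operator from PEP 572) allow almost exactly the second form. The following is a minimal sketch, not part of the original question: it reuses the two patterns shown above, while the `classify` helper, the returned tuples, and the sample `big_input` list are made up for illustration.

```python
import re

pat_ok = re.compile(r'^OK\s+(\w+)\s+(\w+)$')
pat_failed = re.compile(r'^FAILED\s(\w+)$')

def classify(line):
    """Hypothetical helper: describe a line as a tuple."""
    # ':=' binds match and tests it in the same expression (Python 3.8+)
    if match := pat_ok.search(line):
        return ('ok', match.group(1), match.group(2))
    elif match := pat_failed.search(line):
        return ('failed', match.group(1))
    else:
        return ('error', line)

big_input = ['OK alpha beta', 'FAILED gamma', 'nonsense']  # sample data
for l in big_input:
    print(classify(l))
```

On older interpreters this is a syntax error, so the approaches in the answers below still apply there.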

  • Duplicate of stackoverflow.com/questions/2554185/match-groups-in-python and stackoverflow.com/questions/122277/… ? Commented Apr 1, 2011 at 8:47
  • I think your first approach is clear enough and will be easy to grok a year from now. Personally, I would change the name of matchOK and matchFailed to patOK and patFailed because they are pattern objects, not match objects. I suspect you are overusing regular expressions -- my approach would be to use if l.startswith('OK '): and if l.startswith('FAILED '):, etc. Commented Apr 1, 2011 at 8:48
  • @Curd Yep it seems that the first of those is almost equivalent and its answer seems like the best. Commented Apr 1, 2011 at 9:22
  • @Steven Rumbalski This is a simplification. The real regexps are significantly nastier. Commented Apr 1, 2011 at 9:26
  • This is a very small point, but you do not actually have to keep the patterns around; you can just do something at the top of your file like searchOK = re.compile(r'^OK\s+(\w+)\s+(\w+)$').search and then later say match = searchOK(string). Commented Apr 1, 2011 at 14:24
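
The last comment's suggestion can be sketched as follows. This is only an illustration under assumptions: the names `search_ok` and `search_failed` and the sample strings are invented here, and the patterns are the two from the question.

```python
import re

# Keep only the bound .search methods; each bound method holds a
# reference to its compiled pattern, so the pattern stays alive.
search_ok = re.compile(r'^OK\s+(\w+)\s+(\w+)$').search
search_failed = re.compile(r'^FAILED\s(\w+)$').search

match = search_ok('OK alpha beta')
print(match.groups())                   # the two captured words
print(search_failed('OK alpha beta'))   # None, since the line is not a FAILED line
```

This shortens each call site but does not remove the assign-then-test pattern the question asks about.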

4 Answers

3
class helper:
    def __call__(self, match):
        # remember the match so the caller can use it after the test
        self.match = match
        return bool(match)

h = helper()
for l in big_input:
    if h(matchOK.search(l)):
        pass  # do something with h.match
    elif h(matchFailed.search(l)):
        pass  # do something with h.match
    # ... a bunch more of these
    else:
        pass  # error handling

Or matchers as class methods:

class matcher:
    def __init__(self):
        # compile matchers; each attribute must be callable on a line,
        # e.g. the bound .search method of a compiled pattern
        self.ok = ...
        self.failed = ...
        # ... more matchers

    def matchOK(self, l):
        self.match = self.ok(l)
        return bool(self.match)

    def matchFailed(self, l):
        self.match = self.failed(l)
        return bool(self.match)

    # ... one such method per matcher

m = matcher()
for l in big_input:
    if m.matchOK(l):
        pass  # do something with m.match
    elif m.matchFailed(l):
        pass  # do something with m.match
    # ... a bunch more of these
    else:
        pass  # error handling

4 Comments

You need colons after your if and else clauses; and, you do not need to compare your match against None because, according to the docs, “Match Objects always have a boolean value of True, so that you can test whether e.g. match() resulted in a match with a simple if statement.” docs.python.org/library/re.html#match-objects
@Brandon: Clearly this is not an implementation but rather rough pseudocode to demonstrate the ideas! Yes, the actual implementation can be streamlined; here the None treatment is just emphasizing the point. Thanks
You could “emphasize the point” in one line rather than three by replacing the big if-else constructs with return bool(match) which still tells the Python programmer unambiguously that you are returning a value to be used as a true/false decision, but using much less code.
@Brandon: fair enough suggestion. Thanks
0

How about something like:

for l in big_input:
    for p in (matchOK, matchFailed):  # other patterns go in here
        match = p.search(l)
        if match:
            break
    else:
        p = None  # no patterns matched
    if p is matchOK:
        pass  # do something with match
    elif p is matchFailed:
        pass  # do something with match
    # .... a bunch more of these
    else:
        assert p is None
        # then some error handling if nothing matches
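
A related variant, sketched here under assumptions rather than taken from this answer, pairs each pattern with a handler function so the second if/elif chain disappears. The names `HANDLERS`, `handle_ok`, `handle_failed` and `process` are hypothetical; the patterns are the two from the question.

```python
import re

def handle_ok(match):
    return 'ok: %s, %s' % match.groups()

def handle_failed(match):
    return 'failed: %s' % match.group(1)

# (pattern, handler) pairs, tried in order
HANDLERS = [
    (re.compile(r'^OK\s+(\w+)\s+(\w+)$'), handle_ok),
    (re.compile(r'^FAILED\s(\w+)$'), handle_failed),
]

def process(line):
    for pattern, handler in HANDLERS:
        match = pattern.search(line)
        if match:
            return handler(match)
    # error handling when nothing matches
    return 'no match: %r' % line

for l in ['OK alpha beta', 'FAILED gamma', 'nonsense']:  # sample data
    print(process(l))
```

Adding a new line type then means adding one pair to the list instead of another branch.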


0

What about something like this?

import re


def f_OK(ch):
    print('BINGO ! : %s , %s' % re.match(r'OK\s+(\w+)\s+(\w+)', ch).groups())

def f_FAIL(ch):
    print('only one : ' + ch.split()[-1])

several_func = (f_OK, f_FAIL)


several_REs = (r'OK\s+\w+\s+\w+',
               r'FAILED\s+\w+')

# one capturing group per RE, so mat.lastindex tells which RE matched;
# the non-capturing (?:...) makes ^ and $ apply to every alternative
globpat = re.compile('^(?:(' + ')|('.join(several_REs) + '))$')


with open('big_input.txt') as handle:
    for i, line in enumerate(handle):
        print('line ' + str(i) + ' - ', end=' ')
        mat = globpat.search(line)
        if mat:
            several_func[mat.lastindex - 1](mat.group())
        else:
            print('## no match ## ' + repr(line))

I tried it on a file whose content is:

OK tiramisu sunny
FAILED overclocking
FAILED nuclear
E = mcXc
OK the end

the result is

line 0 -  BINGO ! : tiramisu , sunny
line 1 -  only one : overclocking
line 2 -  only one : nuclear
line 3 -  ## no match ## 'E = mcXc\n'
line 4 -  BINGO ! : the , end

This allows you to define any number of REs and functions separately, and to add or remove some as needed.
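
The same idea can also be keyed by name instead of position, using named groups and the match object's `lastgroup` attribute. This is a hedged sketch, not the answer's code: the group names `OK` and `FAILED`, the `HANDLERS` dict, and the `on_ok`, `on_failed` and `dispatch` functions are all invented for illustration.

```python
import re

def on_ok(text):
    _, first, second = text.split()
    return 'ok: %s, %s' % (first, second)

def on_failed(text):
    return 'failed: %s' % text.split()[-1]

# handler lookup by group name (names are illustrative)
HANDLERS = {'OK': on_ok, 'FAILED': on_failed}

# one named group per alternative, with no nested capturing groups
combined = re.compile(
    r'^(?:(?P<OK>OK\s+\w+\s+\w+)|(?P<FAILED>FAILED\s+\w+))$'
)

def dispatch(line):
    m = combined.search(line)
    if m is None:
        return 'no match: %r' % line
    # lastgroup is the name of the group that matched
    return HANDLERS[m.lastgroup](m.group(m.lastgroup))

for line in ['OK tiramisu sunny\n', 'FAILED nuclear\n', 'E = mcXc\n']:
    print(dispatch(line))
```

With names, reordering the alternatives cannot silently pair a pattern with the wrong function.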


-1

Even better, how about a slightly simpler version of eat's code using a nested function:

import re

matchOK = re.compile("ok")
matchFailed = re.compile("failed")
big_input = ["ok to begin with", "failed later", "then gave up"]

for l in big_input:
    match = None
    def matches(pattern):
        global match
        match = pattern.search(l)
        return match
    if matches(matchOK):
        print("matched ok:", l, match.start())
    elif matches(matchFailed):
        print("failed:", l, match.start())
    else:
        print("ignored:", l)

Note that this will work if the loop is part of the top level of the code, but is not easily converted into a function - the variable match still has to be a true global at the top level.
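
On Python 3, the `nonlocal` statement lifts that restriction, since a nested function can then rebind a variable of the enclosing function. A minimal sketch, assuming Python 3; the `process` wrapper, the `pat_ok`/`pat_failed` names and the result tuples are invented for illustration:

```python
import re

pat_ok = re.compile("ok")
pat_failed = re.compile("failed")

def process(lines):
    results = []
    match = None

    def matches(pattern, line):
        nonlocal match  # rebinds process()'s local, not a module global
        match = pattern.search(line)
        return match

    for l in lines:
        if matches(pat_ok, l):
            results.append(("ok", match.start()))
        elif matches(pat_failed, l):
            results.append(("failed", match.start()))
        else:
            results.append(("ignored", None))
    return results

print(process(["ok to begin with", "failed later", "then gave up"]))
```

Python 2 has no `nonlocal`, so there the global-based version above remains the only form of this trick.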

12 Comments

Using != to test object identity is generally considered bad form.
@Brandon: i'm not using it to test identity, i'm using it to test non-noneness, which is a rather specific kind of identity. Either way, i have never come across the idea that using != in this way is a problem - could you point me at something i could read about this?
“Comparisons to singletons like None should always be done with 'is' or 'is not', never the equality operators.” — python.org/dev/peps/pep-0008
And, anyway, you do not need to compare your match against None because, according to the docs, “Match Objects always have a boolean value of True, so that you can test whether e.g. match() resulted in a match with a simple if statement.” docs.python.org/library/re.html#match-objects
