Assign and Test Regex in Python?

Question

In many of my Python projects, I find myself having to go through a file, match lines against regexes, and then perform some computation based on the elements the regex extracted from the line.

In pseudo-C code, this is pretty easy:

while (read(line))
{
    if (m=matchregex(regex1,line))
    {
         /* munch on the components extracted in regex1 by accessing m */
    }
    else if (m=matchregex(regex2,line))
    {
         /* munch on the components extracted in regex2 by accessing m */
    }
    else if ...
    ...
    else
    {
         error("Unrecognized line format");
    }
}

However, because Python does not allow an assignment in the conditional of an if, this can't be done elegantly. One could first match against all the regexes and then do the if on the various match objects, but that is neither elegant nor efficient.

What I find myself doing instead is including code like this at the base level of every project:

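import re

im=None   # match object from the most recent imps() call
img=None  # groups of that match, or None if it failed
def imps(p,s):
    global im
    global img
    im=re.search(p,s)
    if im:
        img=im.groups()
        return True
    else:
        img=None
        return False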

Then I can work like this:

for line in open(file,'r').read().splitlines():
    if imps(regex1,line):
        pass  # munch on contents of img
    elif imps(regex2,line):
        pass  # munch on contents of img
    else:
        error('Unrecognised line: {}'.format(line))

That works, is reasonably compact, and easy to type. But it is hardly beautiful; it uses global variables and is not thread safe (which has not been an issue for me so far).
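One way to keep the same test-then-use shape without module-level globals is to hold the last match on an object. The following is only a sketch: the `Matcher` class, the `classify` function, and both patterns are hypothetical names made up for illustration, not part of the code above.

```python
import re


class Matcher:
    """Remembers the most recent match so it can be tested, then used."""

    def __init__(self):
        self.match = None  # last match object, or None

    def search(self, pattern, string):
        # Store the result on the instance instead of in module globals.
        self.match = re.search(pattern, string)
        return self.match is not None

    def groups(self):
        # Only meaningful after a successful search().
        return self.match.groups()


def classify(line):
    m = Matcher()  # one instance per call, so no state is shared
    if m.search(r"^add (\d+) (\d+)$", line):
        a, b = m.groups()
        return int(a) + int(b)
    elif m.search(r"^neg (\d+)$", line):
        (a,) = m.groups()
        return -int(a)
    else:
        raise ValueError("Unrecognised line: {}".format(line))
```

Because each caller creates its own `Matcher`, two threads working on different lines would not overwrite each other's match state.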

But I'm sure others have run across this problem before and come up with an equally compact, but more Pythonic and generally superior solution. What is it?

Sean Perry · Accepted Answer · 2014-01-28 00:54:28Z

2

Depends on the needs of the code.

A common choice I use is something like this:

import re

# Note: order is important here. The first pattern to match ends the processing.
parse_regexps = [
    (re.compile(r"^foo"), handle_foo),
    (re.compile(r"^bar"), handle_bar),
]

for regexp, handler in parse_regexps:
    m = regexp.match(line)
    if m:
        handler(line)  # possibly other data too, like m.groups()
        break
else:
    error("Unrecognized format....")

This has the advantage of moving the handling code into clear and obvious functions, which makes testing and changes easy.
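A self-contained version of the table-driven approach might look like the sketch below. The handler bodies, the capture groups in the patterns, and the `parse_line` wrapper are assumptions added for illustration. The wrapper returns on the first match instead of using `break` with a `for`/`else`, which has the same first-match-wins behaviour.

```python
import re


def handle_foo(line, m):
    # Hypothetical handler: report which rule fired and what it captured.
    return ("foo", m.group(1))


def handle_bar(line, m):
    return ("bar", m.group(1))


# Order matters: the first pattern that matches wins.
PARSE_REGEXPS = [
    (re.compile(r"^foo\s+(\w+)"), handle_foo),
    (re.compile(r"^bar\s+(\w+)"), handle_bar),
]


def parse_line(line):
    for regexp, handler in PARSE_REGEXPS:
        m = regexp.match(line)
        if m:
            return handler(line, m)
    # Reached only when no pattern matched.
    raise ValueError("Unrecognized format: {!r}".format(line))
```

Each handler can then be unit-tested on its own by passing it a line and a match object.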


georg · Accepted Answer · 2014-01-28 01:06:19Z

1

You can just use continue:

import re

for line in file:
    m = re.match(re1, line)
    if m:
        ...  # do stuff with m
        continue

    m = re.match(re2, line)
    if m:
        ...  # do stuff with m
        continue

    raise BadLine  # BadLine is an exception class you define yourself

Another, less obvious, option is to have a function like this:

def match_any(subject, *regexes):
    for n, regex in enumerate(regexes):
        m = re.match(regex, subject)
        if m:
            return n, m
    return -1, None

and then:

for line in file:
    n, m = match_any(line, re1, re2)
    if n == 0:
        ...  # handle a re1 match using m
    elif n == 1:
        ...  # handle a re2 match using m
    else:
        raise BadLine
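Put together as a runnable sketch, the `match_any` approach could look like this. The `BadLine` exception class, the two patterns, and the `process` function are hypothetical and only there to make the example complete.

```python
import re


class BadLine(Exception):
    """Raised for a line that matches none of the patterns (assumed name)."""


def match_any(subject, *regexes):
    # Return the index of the first matching regex and its match object.
    for n, regex in enumerate(regexes):
        m = re.match(regex, subject)
        if m:
            return n, m
    return -1, None


RE_ASSIGN = r"(\w+)\s*=\s*(\d+)$"   # e.g. "x = 1"
RE_COMMENT = r"#\s*(.*)$"           # e.g. "# note"


def process(lines):
    values, comments = {}, []
    for line in lines:
        n, m = match_any(line, RE_ASSIGN, RE_COMMENT)
        if n == 0:
            values[m.group(1)] = int(m.group(2))
        elif n == 1:
            comments.append(m.group(1))
        else:
            raise BadLine(line)
    return values, comments
```

The index `n` ties each branch to a position in the argument list, so reordering the regexes also requires reordering the branches.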


1 Comment

CarlEdman Over a year ago

I like the continue solution and would use it more often if it weren't for the common case where I need to do something with every valid line after matching.
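For readers on Python 3.8 or later, assignment expressions (PEP 572, the `:=` operator) were added after this thread was written and allow the if/elif chain from the question directly. Since no `continue` is needed, a shared step can follow the chain. This is a minimal sketch: the `parse` function and both patterns are made up for illustration.

```python
import re


def parse(lines):
    results = []
    for line in lines:
        # Requires Python 3.8+: ":=" assigns m and tests it in one step.
        if m := re.match(r"add (\d+) (\d+)$", line):
            value = int(m.group(1)) + int(m.group(2))
        elif m := re.match(r"neg (\d+)$", line):
            value = -int(m.group(1))
        else:
            raise ValueError("Unrecognised line: {}".format(line))
        # Shared step: runs once for every line that matched some pattern.
        results.append((line, value))
    return results
```

Only the regexes up to the first match are evaluated, so this keeps the short-circuit behaviour of the pseudo-C version.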
