Python comparing string against several regular expressions

Question

I'm pretty experienced with Perl and Ruby but new to Python, so I'm hoping someone can show me the Pythonic way to accomplish the following task. I want to compare several lines against multiple regular expressions and retrieve the matching group. In Ruby it would be something like this:

# Revised to show variance in regex and related action.
data, foo, bar = [], nil, nil
input_lines.each do |line|
  if line =~ /Foo(\d+)/
    foo = $1.to_i
  elsif line =~ /Bar=(.*)$/
    bar = $1
  elsif bar
    data.push(line.to_f)
  end
end

My attempts in Python are turning out pretty ugly, because the groups are only available from the match object returned by a call to match/search on a regular expression, and Python has no assignment in conditionals or switch statements. What's the Pythonic way to approach (or think about!) this problem?
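This thread dates from 2010, before assignment expressions (the := "walrus" operator from PEP 572) arrived in Python 3.8; they let a condition bind the match object directly. A minimal sketch of the Ruby loop rewritten that way, assuming Python 3.8+ and that input_lines is an iterable of strings; parse, FOO_RE and BAR_RE are hypothetical names. It uses re.search because Ruby's =~ is unanchored, and tests bar is not None because an empty string is truthy in Ruby but falsy in Python.

```python
import re

# Precompiled once; re.search mirrors Ruby's unanchored =~ operator.
FOO_RE = re.compile(r"Foo(\d+)")
BAR_RE = re.compile(r"Bar=(.*)$")


def parse(input_lines):
    """Hypothetical helper: returns (data, foo, bar) like the Ruby loop."""
    data, foo, bar = [], None, None
    for line in input_lines:
        if m := FOO_RE.search(line):      # bind and test in one step (3.8+)
            foo = int(m.group(1))
        elif m := BAR_RE.search(line):
            bar = m.group(1)
        elif bar is not None:             # Ruby treats "" as true, so test for None
            # Unlike Ruby's to_f, float() raises ValueError on non-numeric text.
            data.append(float(line))
    return data, foo, bar
```

For example, parse(["3.0", "Foo42", "Bar=baz", "1.5", "2.25"]) returns ([1.5, 2.25], 42, 'baz'); the "3.0" line is skipped because bar has not been set yet.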

See stackoverflow.com/questions/2554185/match-groups-in-python. – PaulMcG, Commented Apr 13, 2010 at 23:30

Ignacio Vazquez-Abrams · Accepted Answer · 2010-04-13 22:55:31Z

1

Something like this, but prettier:

import re

regexes = [re.compile('...'), ...]

for regex in regexes:
    m = regex.match(s)
    if m:
        print(m.groups())
        break
else:
    # Runs only if the loop finished without a break, i.e. nothing matched.
    print('No match')

answered Apr 13, 2010 at 22:55

Ignacio Vazquez-Abrams

1 Comment

maerics Over a year ago

I tried something similar, but I want to take different actions based on which regex matches, so I moved from a list to a dictionary mapping the regexes to lambdas to be called if a match is found. That makes for some confusing code...
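One way to keep per-regex actions readable, sketched here as a hypothetical alternative to the dict of lambdas described above: an ordered list of (compiled regex, handler) pairs with small named functions as handlers, plus Python's for/else for the no-match case. It assumes Python 3 (for nonlocal) and reuses the Foo/Bar rules from the question; parse, on_foo and on_bar are illustrative names. A list rather than a dict makes the "try in order, first match wins" behaviour explicit.

```python
import re


def parse(lines):
    """Hypothetical helper: same result as the question's Ruby loop."""
    data, foo, bar = [], None, None

    def on_foo(m):
        nonlocal foo
        foo = int(m.group(1))

    def on_bar(m):
        nonlocal bar
        bar = m.group(1)

    # Tried in order; the first regex that matches wins, like if/elsif.
    rules = [
        (re.compile(r"Foo(\d+)"), on_foo),
        (re.compile(r"Bar=(.*)$"), on_bar),
    ]

    for line in lines:
        for regex, handler in rules:
            m = regex.search(line)
            if m:
                handler(m)
                break
        else:
            # No rule matched: the question's final "elsif bar" branch.
            if bar is not None:
                data.append(float(line))
    return data, foo, bar
```

Adding a new case means adding one handler and one row to rules, without touching the loop.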

Alex Martelli · Accepted Answer · 2010-04-13 23:12:14Z

There are several ways to "bind a name on the fly" in Python, such as my old recipe for "assign and test"; in this case I'd probably choose another such way (assuming Python 2.6; it needs some slight changes if you're working with an older version of Python), something like:

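import re

pats_marks = (r'^A:(.*)$', 'FOO'), (r'^B:(.*)$', 'BAR')
for line in lines:
    # Bind mo to each pattern's match object via a one-item "loop";
    # next() takes the first hit, or (None, None) if no pattern matches.
    mo, m = next(((mo, m) for p, m in pats_marks for mo in [re.match(p, line)] if mo),
                 (None, None))
    if mo:
        print('%s: %s' % (m, mo.group(1)))
    else:
        print('NO MATCH: %s' % line)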

Many minor details can be adjusted, of course. For example, I chose (.*) rather than (.*?) as the matching group: they're equivalent given the immediately following $, so I chose the shorter form ;-). You could also precompile the REs, factor things out differently than the pats_marks tuple (e.g., with a dict indexed by RE patterns), etc.
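A sketch of two of those adjustments together, precompiled REs and a dict keyed by pattern, assuming Python 3.7+, where dicts preserve insertion order so the patterns are still tried in the order written. PATS_MARKS and first_match are hypothetical names; the one-item-list binding is the same as in the code above.

```python
import re

# Precompiled patterns mapped to marks; insertion order = matching order.
PATS_MARKS = {
    re.compile(r"^A:(.*)$"): "FOO",
    re.compile(r"^B:(.*)$"): "BAR",
}


def first_match(line):
    """Return (mark, match_object) for the first matching pattern, else (None, None)."""
    return next(
        ((mark, mo) for pat, mark in PATS_MARKS.items() for mo in [pat.match(line)] if mo),
        (None, None),
    )


for line in ["A:one", "B:two", "C:three"]:
    mark, mo = first_match(line)
    if mo:
        print("%s: %s" % (mark, mo.group(1)))
    else:
        print("NO MATCH: %s" % line)
# prints: FOO: one / BAR: two / NO MATCH: C:three (one per line)
```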

But the substantial ideas, I think, are to make the structure data-driven, and to bind the match object to a name on the fly with the subexpression for mo in [re.match(p, line)], a "loop" over a single-item list. Genexps bind names only by looping, not by assignment. Some consider relying on this part of the genexp spec "tricky", but I consider it a perfectly acceptable Python idiom, especially since it was already considered back when listcomps (genexps' "ancestors", in a sense) were being designed.
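To make the single-item-list idiom concrete, a small comparison under Python 3 (the pattern and sample lines are made up for illustration): the naive comprehension calls match() twice on every matching line, once to filter and once to extract the group, while the for mo in [...] form calls it once and binds the result to mo.

```python
import re

PATTERN = re.compile(r"^[AB]:(.*)$")
lines = ["A:apple", "C:cherry", "B:banana"]

# Naive: PATTERN.match runs twice for each line that matches.
twice = [PATTERN.match(line).group(1) for line in lines if PATTERN.match(line)]

# Idiom: "loop" over a one-item list so mo is bound once per line.
once = [mo.group(1) for line in lines for mo in [PATTERN.match(line)] if mo]
```

Both produce ['apple', 'banana']; the second simply avoids the repeated regex call.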

Community · Accepted Answer · 2017-05-23 10:26:56Z

0

Paul McGuire's solution, an intermediate class REMatcher that performs the match, stores the match group, and returns a boolean for success or failure, turned out to produce the most legible code for this purpose.
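A minimal sketch of what such a class might look like, written for Python 3 and applied to the question's Foo/Bar logic. This is an illustrative reconstruction of the idea described above, not Paul McGuire's original code; the method names search and group are assumptions chosen to mirror the re module, and parse is a hypothetical helper.

```python
import re


class REMatcher:
    """Wraps one string; search() stores the match object and reports success."""

    def __init__(self, text):
        self.text = text
        self.last = None

    def search(self, pattern):
        self.last = re.search(pattern, self.text)
        return self.last is not None

    def group(self, i=0):
        return self.last.group(i)


def parse(input_lines):
    data, foo, bar = [], None, None
    for line in input_lines:
        m = REMatcher(line)
        if m.search(r"Foo(\d+)"):         # test and remember the match in one call
            foo = int(m.group(1))
        elif m.search(r"Bar=(.*)$"):
            bar = m.group(1)
        elif bar is not None:
            data.append(float(line))
    return data, foo, bar
```

The if/elif chain now reads almost line for line like the Ruby original, which is the legibility gain described above.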

edited May 23, 2017 at 10:26

Community Bot

answered Apr 26, 2010 at 22:24

maerics

ghostdog74 · Accepted Answer · 2010-04-13 23:31:59Z

-1

Your regex simply captures everything from the 3rd character onwards, so plain string methods are enough:

with open("file") as f:
    for line in f:
        line = line.rstrip("\n")
        if line.startswith("A:"):
            print("FOO %s" % line[2:])
        elif line.startswith("B:"):
            print("BAR %s" % line[2:])
        else:
            print("No match")

answered Apr 13, 2010 at 23:31

ghostdog74

2 Comments

moshez Over a year ago

Nice way, but I'd use split and comparison:
begin, rest = line.split(':', 1)
if begin == "A": etc...
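A sketch of the split-and-compare idea as a function. It swaps in str.partition for split(':', 1), because partition never raises, whereas unpacking line.split(':', 1) into two names raises ValueError on a line with no colon. classify and the FOO/BAR labels are illustrative.

```python
def classify(line):
    """Hypothetical helper: route a line on the text before its first colon."""
    line = line.rstrip("\n")
    begin, sep, rest = line.partition(":")
    if sep and begin == "A":
        return "FOO", rest
    if sep and begin == "B":
        return "BAR", rest
    return None, line  # no colon, or an unrecognised prefix
```

Only the first colon splits the line, so classify("B:x:y") yields ("BAR", "x:y").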

maerics Over a year ago

This is good, but I'm looking for something more general; the simple regex is just for explanatory purposes, and the actual regexes would be fairly complex.
