7

I'm pretty experienced with Perl and Ruby but new to Python so I'm hoping someone can show me the Pythonic way to accomplish the following task. I want to compare several lines against multiple regular expressions and retrieve the matching group. In Ruby it would be something like this:

# Revised to show variance in regex and related action.
data, foo, bar = [], nil, nil
input_lines.each do |line|
  if line =~ /Foo(\d+)/
    foo = $1.to_i
  elsif line =~ /Bar=(.*)$/
    bar = $1
  elsif bar
    data.push(line.to_f)
  end
end

My attempts in Python are turning out pretty ugly: the matching group comes back from a call to match/search on a regular expression, and Python has no assignment in conditionals and no switch statement. What's the Pythonic way to approach (or think about!) this problem?

2

4 Answers

1

Something like this, but prettier:

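import re

# The list name must match the one used in the loop below.
regexes = [re.compile('...'), ...]

for regex in regexes:
  m = regex.match(s)
  if m:
    print(m.groups())
    break
else:
  # The for/else branch runs only when no regex matched (no break).
  print('No match')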

1 Comment

I tried something similar, but I want to take different actions depending on which regex matches. I moved from a list to a dictionary mapping the regexes to lambdas that are called when a match is found, but that makes for some confusing code...
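One way to keep the "different action per regex" idea readable is an ordered list of (pattern, handler) pairs with named functions instead of lambdas. This is only a sketch under assumptions: the parse function, the state dictionary and the handler names are made up for illustration, and the patterns are the ones from the Ruby code in the question. It uses search rather than match because Ruby's =~ is not anchored at the start of the line.

```python
import re


def parse(input_lines):
    """Mirror the Ruby loop from the question with a (pattern, handler) table."""
    state = {"foo": None, "bar": None, "data": []}

    def set_foo(match):
        state["foo"] = int(match.group(1))

    def set_bar(match):
        state["bar"] = match.group(1)

    # Order matters: the first pattern that matches wins, like if/elsif.
    handlers = [
        (re.compile(r"Foo(\d+)"), set_foo),
        (re.compile(r"Bar=(.*)$"), set_bar),
    ]

    for line in input_lines:
        for pattern, handler in handlers:
            match = pattern.search(line)
            if match:
                handler(match)
                break
        else:
            # No pattern matched: this is the Ruby "elsif bar" branch.
            # "is not None" because an empty string is truthy in Ruby.
            if state["bar"] is not None:
                state["data"].append(float(line))
    return state
```

Note that float(line) raises ValueError on non-numeric text, whereas Ruby's to_f returns 0.0, so real input may need a try/except around that call.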
1

There are several ways to "bind a name on the fly" in Python, such as my old recipe for "assign and test"; in this case I'd probably choose another such way (assuming Python 2.6, needs some slight changes if you're working with an old version of Python), something like:

import re
pats_mark = (r'^A:(.*)$', 'FOO'), (r'^B:(.*)$', 'BAR')
for line in lines:
    # "for mo in [re.match(p, line)]" binds the match object to mo inside the genexp
    mo, m = next(((mo, m) for p, m in pats_mark for mo in [re.match(p, line)] if mo),
                 (None, None))
    if mo: print('%s: %s' % (m, mo.group(1)))
    else: print('NO MATCH: %s' % line)

Many minor details can be adjusted, of course (for example, I just chose (.*) rather than (.*?) as the matching group -- they're equivalent given the immediately-following $ so I chose the shorter form;-) -- you could precompile the REs, factor things out differently than the pats_mark tuple (e.g., with a dict indexed by RE patterns), etc.

But the substantial ideas, I think, are to make the structure data-driven, and to bind the match object to a name on the fly with the subexpression for mo in [re.match(p, line)], a "loop" over a single-item list (genexps bind names only by loop, not by assignment -- some consider using this part of genexps' specs to be "tricky", but I consider it a perfectly acceptable Python idiom, esp. since it was considered back in the time when listcomps, genexps' "ancestors" in a sense, were being designed).
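To show the binding trick in isolation, here is a small sketch that wraps the lookup in a function. The names PATS_MARKS, first_match and first_match_walrus are hypothetical; the patterns are the ones from the snippet above. The second function is not part of the original recipe: it assumes Python 3.8 or later, where assignment expressions (:=) allow binding inside a condition directly.

```python
import re

PATS_MARKS = ((r"^A:(.*)$", "FOO"), (r"^B:(.*)$", "BAR"))


def first_match(line):
    """Return (match, mark) for the first matching pattern, else (None, None)."""
    return next(
        ((mo, mark)
         for pat, mark in PATS_MARKS
         for mo in [re.match(pat, line)]  # one-item "loop" binds mo
         if mo),
        (None, None),
    )


def first_match_walrus(line):
    """Same lookup with an assignment expression (assumes Python 3.8+)."""
    for pat, mark in PATS_MARKS:
        if (mo := re.match(pat, line)):
            return mo, mark
    return None, None
```

Both functions stop at the first pattern that matches, so the order of the pairs in the tuple decides which mark wins.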

0

Paul McGuire's solution of using an intermediate class REMatcher which performs the match, stores the match group, and returns a boolean for success/fail turned out to produce the most legible code for this purpose.
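For readers who have not seen that answer, a class along these lines might look as follows. This is my own minimal sketch, not Paul McGuire's actual code: only the name REMatcher comes from the text above, while the method names, the last_match attribute and the parse function are assumptions.

```python
import re


class REMatcher(object):
    """Hold a string, try patterns against it, and remember the last match."""

    def __init__(self, text):
        self.text = text
        self.last_match = None

    def search(self, pattern):
        # Store the match object so the caller can read groups afterwards.
        self.last_match = re.search(pattern, self.text)
        return self.last_match is not None

    def group(self, index):
        return self.last_match.group(index)


def parse(input_lines):
    """Python counterpart of the Ruby loop in the question."""
    data, foo, bar = [], None, None
    for line in input_lines:
        m = REMatcher(line)
        if m.search(r"Foo(\d+)"):
            foo = int(m.group(1))
        elif m.search(r"Bar=(.*)$"):
            bar = m.group(1)
        elif bar is not None:
            data.append(float(line))
    return data, foo, bar
```

The if/elif chain now reads much like the Ruby original, because each search call both tests and stores the match.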

-1

Your regex simply takes everything from the 3rd character onwards.

for line in open("file"):
    line = line.rstrip("\n")  # drop the newline so it does not end up inside the braces
    if line.startswith("A:"):
        print("FOO #{" + line[2:] + "}")
    elif line.startswith("B:"):
        print("BAR #{" + line[2:] + "}")
    else:
        print("No match")

2 Comments

Nice way, but I'd use split and comparison: begin, rest = line.split(':', 1), then if begin == "A": etc...
This is good, but I'm looking for something more general. The simple regex is just for explanatory purposes; the actual regexes would be fairly complex.
