Python regex to match multiple times

Question

I'm trying to match a pattern against strings that could have multiple instances of the pattern. I need every instance separately. re.findall() should do it but I don't know what I'm doing wrong.

pattern = re.compile('/review: (http://url.com/(\d+)\s?)+/', re.IGNORECASE)
match = pattern.findall('this is the message. review: http://url.com/123 http://url.com/456')

I need 'http://url.com/123', 'http://url.com/456' and the two numbers 123 and 456 to be separate elements of the match list.

I have also tried '/review: ((http://url.com/(\d+)\s?)+)/' as the pattern, but no luck.

Just remove the review: portion, as the second http won't have that before it.
– abc123, Commented Jul 1, 2013 at 15:07
Yes, but I need that there; it's part of the regex. I don't need ANY URL there, just the ones following the string 'review:'.
– mavili, Commented Jul 1, 2013 at 15:08

Narendra Yadala · Accepted Answer · 2013-07-01 15:15:50Z

26

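Use this. You need to place 'review: ' outside the capturing group to get the desired result. With the prefix in an optional non-capturing group, (?:review: )?, each URL becomes its own match, so findall returns one tuple per URL.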

import re
pattern = re.compile(r'(?:review: )?(http://url\.com/(\d+))\s?', re.IGNORECASE)

This gives the output:

>>> match = pattern.findall('this is the message. review: http://url.com/123 http://url.com/456')
>>> match
[('http://url.com/123', '123'), ('http://url.com/456', '456')]
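
If you'd rather work with match objects (to process each hit as you go, or to get positions), re.finditer is the usual alternative to findall. A minimal sketch reusing the sample message from the question; the variable names are only for illustration, and the dot in the pattern is escaped here so it matches a literal ".":

```python
import re

# The answer's pattern, with the dot escaped so it only matches a literal "."
pattern = re.compile(r'(?:review: )?(http://url\.com/(\d+))\s?', re.IGNORECASE)
msg = 'this is the message. review: http://url.com/123 http://url.com/456'

# finditer yields one match object per URL instead of a list of tuples.
results = []
for m in pattern.finditer(msg):
    url, number = m.group(1), m.group(2)
    results.append((url, number))
    print(url, number)
```

Each m also exposes m.span() if you need to know where the URL sits in the message.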


4 Comments

mavili Over a year ago

That does the job, thanks! The ? after (?:review: ) is also critical, as it didn't give me all matches without it. ;)

Rambatino Over a year ago

Don't forget to import re

user2340939 Over a year ago

What about doing it without findall? I.e. for cases when you need to match many cases, but this would be just a part of a match?

nogjam Over a year ago

regex101.com is a neat tool for testing this out.

John Montgomery · Answer · 2013-07-01 15:09:43Z

6

You've got extra /'s in the regex. In Python the pattern should just be a string, without the /.../ delimiters used in some other languages. For example, instead of this:

pattern = re.compile('/review: (http://url.com/(\d+)\s?)+/', re.IGNORECASE)

It should be:

pattern = re.compile('review: (http://url.com/(\d+)\s?)+', re.IGNORECASE)

Also, in Python you'd typically use a "raw" string, like this:

pattern = re.compile(r'review: (http://url.com/(\d+)\s?)+', re.IGNORECASE)

The r prefix on the string stops Python from interpreting backslashes itself, so sequences like \d reach the regex engine unchanged and you avoid a lot of backslash escaping.
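
One thing worth checking with the corrected pattern: a repeated capturing group only keeps its last repetition, so findall returns a single tuple for the whole "review: ..." match rather than one per URL. A small sketch, using the sample string from the question (the variable names are just for illustration):

```python
import re

msg = 'this is the message. review: http://url.com/123 http://url.com/456'

# A raw string and a manually escaped string hold the same pattern text.
assert r'(\d+)\s?' == '(\\d+)\\s?'

# The corrected pattern from this answer: no surrounding slashes, raw string.
pattern = re.compile(r'review: (http://url.com/(\d+)\s?)+', re.IGNORECASE)

# The whole "review: ..." run is one match, and a repeated group only
# remembers its final repetition, so only the last URL is reported.
last_only = pattern.findall(msg)
print(last_only)  # [('http://url.com/456', '456')]
```

So removing the slashes makes the pattern match, but getting every URL separately still needs one match per URL or a second pass over the matched text.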


til_b · Answer · 2013-07-01 15:15:17Z

2

Use a two-step approach: First get everything from "review:" to EOL, then tokenize that.

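import re

msg = 'this is the message. review: http://url.com/123 http://url.com/456'

# Step 1: grab everything after "review: " up to the end of the line.
review_pattern = re.compile(r'.*review: (.*)$')
urls = review_pattern.findall(msg)[0]

# Step 2: tokenize that remainder into (url, number) tuples.
url_pattern = re.compile(r'(http://url\.com/(\d+))')
url_pattern.findall(urls)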
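
The two steps can also be wrapped in a function that copes with messages that have no "review:" at all (indexing [0] on an empty findall result would raise IndexError). This is only a sketch: extract_review_urls and the constant names are hypothetical, and it assumes a single-line message.

```python
import re

# Hypothetical names; the patterns follow the two-step idea above.
REVIEW_RE = re.compile(r'review: (.*)$', re.IGNORECASE)
URL_RE = re.compile(r'(http://url\.com/(\d+))')

def extract_review_urls(msg):
    """Return (url, number) tuples for URLs that follow 'review: '."""
    found = REVIEW_RE.search(msg)
    if found is None:
        return []  # no "review:" marker, so there is nothing to tokenize
    return URL_RE.findall(found.group(1))

with_marker = extract_review_urls(
    'this is the message. review: http://url.com/123 http://url.com/456')
without_marker = extract_review_urls('no marker here: http://url.com/789')
print(with_marker)
print(without_marker)
```

Unlike a pattern with an optional prefix, this only returns URLs that come after "review:", which is what the question asks for.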

