2

I'm struggling to do multiline regex with multiple matches.

I have data separated by newlines, like below. My pattern matches each of these lines if I test it separately. How can I match all the occurrences (specifically the numbers)?

I've read that I could/should use DOTALL somehow (possibly with MULTILINE). DOTALL seems to make . match any character (newlines included), but I'm not sure about possible side effects. I don't want it to match a stray integer or something and give me malformed data in the end. Any info on this would be great.

What I really need, though, is some assistance in making this example code work. I only need to fetch the numbers from the data.

I used re.fullmatch when I only needed one specific match in a previous case, and I'm not entirely sure which function I should use now (finditer, findall, search, etc.).

Thank you for any and all help :)

import re

data = """http://store.steampowered.com/app/254060/
http://www.store.steampowered.com/app/254061/
https://www.store.steampowered.com/app/254062
store.steampowered.com/app/254063
254064"""

regPattern = '^\s*(?:https?:\/\/)?(?:www\.)?(?:store\.steampowered\.com\/app\/)?([0-9]+)\/?\s*$'

evaluateData = re.search(regPattern, data, re.DOTALL | re.MULTILINE)
if evaluateData is not None:
    print('do stuff')
else:
    print('found no match')

3 Answers

3
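import re

# A plain raw string works on Python 2 and 3; the ur'' prefix is Python 2 only.
p = re.compile(r'^\s*(?:https?:\/\/)?(?:www\.)?(?:store\.steampowered\.com\/app\/)?([0-9]+)\/?\s*$', re.MULTILINE)
test_str = u"http://store.steampowered.com/app/254060/\nhttp://www.store.steampowered.com/app/254061/\nhttps://www.store.steampowered.com/app/254062\nstore.steampowered.com/app/254063\n254064"

# findall returns the captured group of every match (Python 3 shows the strings without the u prefix).
print(p.findall(test_str))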

https://regex101.com/r/rC9rI0/1

This gives [u'254060', u'254061', u'254062', u'254063', u'254064'].

Are you trying to return those specific integers?
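On the DOTALL question: here is a small sketch, assuming Python 3 and the data from the question (the variable names `no_flags`, `multiline` and `both` are only illustrative). MULTILINE is what lets ^ and $ anchor at every line; DOTALL only changes what an unescaped . matches, and this pattern has none, so adding it should make no difference here.

```python
import re

data = """http://store.steampowered.com/app/254060/
http://www.store.steampowered.com/app/254061/
https://www.store.steampowered.com/app/254062
store.steampowered.com/app/254063
254064"""

pattern = r'^\s*(?:https?://)?(?:www\.)?(?:store\.steampowered\.com/app/)?([0-9]+)/?\s*$'

# No flags: ^ and $ only anchor at the start and end of the whole string,
# so the multi-line data does not match at all.
no_flags = re.findall(pattern, data)

# MULTILINE: ^ and $ also anchor at the start and end of each line.
multiline = re.findall(pattern, data, re.MULTILINE)

# DOTALL only affects an unescaped '.', which this pattern does not contain.
both = re.findall(pattern, data, re.MULTILINE | re.DOTALL)

print(no_flags)
print(multiline)
print(both == multiline)
```

Every . in the pattern is escaped (\.), so it always matches a literal dot regardless of DOTALL.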


3 Comments

Yes, I was just trying to return those integers. I can rebuild the URL later if I wish.
For reference: I just plugged your regex and data into regex101.com, added the g and m modifiers, then hit "generate code" and copy-pasted the Python part. It is a very useful tool for beginner regex programmers.
Oh man. I was actually using that site to build the regex pattern, and I was looking around for modifiers similar to those but never found them... >_< Nice feature to autogenerate the code as well. Definitely bookmarking, thanks :)
1

re.search stops at the first occurrence.

You should use this instead:

re.findall(regPattern, data, re.MULTILINE)

which returns ['254060', '254061', '254062', '254063', '254064'].

Note: re.search did not work for me (Python 2.7.9). It just returned the first line of the data.
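To compare the functions mentioned in the question, here is a minimal sketch, assuming Python 3 and the question's data (the names `first_id`, `all_ids` and `iter_ids` are made up for illustration). re.search returns a single match object, re.findall returns the captured group of every match, and re.finditer yields match objects one at a time.

```python
import re

data = """http://store.steampowered.com/app/254060/
http://www.store.steampowered.com/app/254061/
https://www.store.steampowered.com/app/254062
store.steampowered.com/app/254063
254064"""

pattern = r'^\s*(?:https?://)?(?:www\.)?(?:store\.steampowered\.com/app/)?([0-9]+)/?\s*$'

# search: only the first match, as a match object (or None).
first = re.search(pattern, data, re.MULTILINE)
first_id = first.group(1) if first else None

# findall: with one capturing group, a list of that group's text per match.
all_ids = re.findall(pattern, data, re.MULTILINE)

# finditer: match objects, useful when positions or several groups are needed.
iter_ids = [m.group(1) for m in re.finditer(pattern, data, re.MULTILINE)]

print(first_id)
print(all_ids)
print(iter_ids)
```

If only the numbers are needed, findall is probably the shortest option; finditer is handy when you also want m.start() or m.group(0).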

1 Comment

Strange, I was sure I had tried that but received some obscure error. Thank you :)
1

/ has no special meaning in a Python regex, so you do not have to escape it. (And in a non-raw string you would have to double every \, so use a raw string.)

Try this:

regPattern = r'^\s*(?:https?://)?(?:www\.)?(?:store\.steampowered\.com/app/)?([0-9]+)/?\s*$'
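A quick way to check that dropping the escapes changes nothing: the sketch below assumes Python 3 and the question's data, and the names `escaped` and `plain` are only illustrative. It runs both spellings of the pattern and compares the results.

```python
import re

data = """http://store.steampowered.com/app/254060/
http://www.store.steampowered.com/app/254061/
https://www.store.steampowered.com/app/254062
store.steampowered.com/app/254063
254064"""

# Original spelling with escaped slashes, written as a raw string.
escaped = r'^\s*(?:https?:\/\/)?(?:www\.)?(?:store\.steampowered\.com\/app\/)?([0-9]+)\/?\s*$'
# Same pattern without the unnecessary escapes.
plain = r'^\s*(?:https?://)?(?:www\.)?(?:store\.steampowered\.com/app/)?([0-9]+)/?\s*$'

escaped_ids = re.findall(escaped, data, re.MULTILINE)
plain_ids = re.findall(plain, data, re.MULTILINE)
print(escaped_ids == plain_ids)

# A raw string keeps the backslash as typed; a normal string needs it doubled.
print('\\s' == r'\s')
```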

1 Comment

Thank you, I will change my pattern :)
