Regex multiline syntax help (python)

Question

I'm struggling to write a multiline regex that returns multiple matches.

I have data separated by newlines, like below. My pattern matches each of these lines if I test them separately. How can I match all the occurrences (specifically the numbers)?

I've read that I could/should use DOTALL somehow (possibly with MULTILINE). DOTALL seems to match any character (newlines included), but I'm not sure about possible side effects. I don't want it to match some stray integer and give me malformed data in the end. Any info on this would be great.

What I really need, though, is some help making this example code work. I only need to fetch the numbers from the data.

In a previous case I used re.fullmatch when I only needed one specific match, and I'm not entirely sure which function I should use now (finditer, findall, search, etc.).

Thank you for any and all help :)

import re

data = """http://store.steampowered.com/app/254060/
http://www.store.steampowered.com/app/254061/
https://www.store.steampowered.com/app/254062
store.steampowered.com/app/254063
254064"""

regPattern = '^\s*(?:https?:\/\/)?(?:www\.)?(?:store\.steampowered\.com\/app\/)?([0-9]+)\/?\s*$'

evaluateData = re.search(regPattern, data, re.DOTALL | re.MULTILINE)
if evaluateData is not None:
    print('do stuff')
else:
    print('found no match')

Bryce Drew · Accepted Answer · 2016-07-28 15:54:57Z

3

import re
p = re.compile(r'^\s*(?:https?:\/\/)?(?:www\.)?(?:store\.steampowered\.com\/app\/)?([0-9]+)\/?\s*$', re.MULTILINE)
test_str = u"http://store.steampowered.com/app/254060/\nhttp://www.store.steampowered.com/app/254061/\nhttps://www.store.steampowered.com/app/254062\nstore.steampowered.com/app/254063\n254064"

re.findall(p, test_str)

https://regex101.com/r/rC9rI0/1

This gives [u'254060', u'254061', u'254062', u'254063', u'254064'] (Python 2 output; the u prefixes only mark the results as unicode strings).

Are you trying to return those specific integers?
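
To see what re.MULTILINE contributes, here is a minimal sketch that runs the same pattern with and without the flag. It assumes Python 3 and writes the pattern as a raw string; the names PATTERN and DATA are placeholders, not taken from the original post.

```python
import re

# Same pattern as in the answer, written as a raw string (assumes Python 3).
PATTERN = r'^\s*(?:https?:\/\/)?(?:www\.)?(?:store\.steampowered\.com\/app\/)?([0-9]+)\/?\s*$'

DATA = (
    "http://store.steampowered.com/app/254060/\n"
    "http://www.store.steampowered.com/app/254061/\n"
    "https://www.store.steampowered.com/app/254062\n"
    "store.steampowered.com/app/254063\n"
    "254064"
)

# Without MULTILINE, ^ and $ anchor only at the start and end of the whole
# string, so the five-line input would have to match as one unit.
without_flag = re.findall(PATTERN, DATA)

# With MULTILINE, ^ and $ also match at the start and end of every line,
# so each line is tested on its own.
with_flag = re.findall(PATTERN, DATA, re.MULTILINE)

print(without_flag)
print(with_flag)
```

Since the pattern has a single capture group, findall returns just the captured app IDs rather than the full matched lines.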

edited Jul 28, 2016 at 15:54

answered Jul 28, 2016 at 15:48

Bryce Drew


3 Comments

raecer Over a year ago

Yes, I was just trying to return those integers. I can rebuild the URL later if I wish.

Bryce Drew Over a year ago

For reference: I just plugged your regex and data into regex101.com, added the g and m modifiers, hit "generate code", and copy-pasted the Python part. It is a very useful tool for anyone starting out with regex.

raecer Over a year ago

Oh man. I was actually using that site to build the regex pattern and I was looking around for modifiers similar to those but never found them... >_< Nice feature to autogenerate the code as well. Definitely bookmarking, thanks :)

user2740652 · Accepted Answer · 2016-07-28 15:57:52Z

1

re.search stops at the first occurrence.

You should use this instead:

re.findall(regPattern, data, re.MULTILINE)

which returns ['254060', '254061', '254062', '254063', '254064'].

Note: re.search was not working for me (Python 2.7.9). It just returns the first line of data.
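
A short sketch comparing the functions, assuming Python 3 and the question's data. re.search returns only the first match, re.findall returns the captured group of every match, and re.finditer yields match objects for cases where positions are needed too. Because this pattern contains no `.`, re.DOTALL has nothing to act on and should not change the result. The variable names below are illustrative.

```python
import re

reg_pattern = r'^\s*(?:https?:\/\/)?(?:www\.)?(?:store\.steampowered\.com\/app\/)?([0-9]+)\/?\s*$'
data = """http://store.steampowered.com/app/254060/
http://www.store.steampowered.com/app/254061/
https://www.store.steampowered.com/app/254062
store.steampowered.com/app/254063
254064"""

# search: stops at the first line that matches.
first = re.search(reg_pattern, data, re.MULTILINE)
first_id = first.group(1) if first else None

# findall: one captured app ID per matching line.
all_ids = re.findall(reg_pattern, data, re.MULTILINE)

# DOTALL only changes what "." matches; the pattern has no ".",
# so adding the flag is expected to make no difference here.
all_ids_dotall = re.findall(reg_pattern, data, re.MULTILINE | re.DOTALL)

# finditer: match objects, giving access to the ID and where it starts.
id_positions = [(m.group(1), m.start(1))
                for m in re.finditer(reg_pattern, data, re.MULTILINE)]

print(first_id)
print(all_ids)
print(id_positions)
```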

answered Jul 28, 2016 at 15:57

user2740652


1 Comment

raecer Over a year ago

Strange, I was sure I had tried that but received some obscure error. Thank you :)

janbrohl · Accepted Answer · 2016-07-28 15:48:21Z

1

/ has no special meaning in Python regular expressions, so you do not have to escape it (and in non-raw strings you would have to escape every \).

Try this:

regPattern = r'^\s*(?:https?://)?(?:www\.)?(?:store\.steampowered\.com/app/)?([0-9]+)/?\s*$'
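
A quick check of both points, as a sketch assuming Python 3: the escaped and unescaped slash forms should behave identically, and a raw string avoids doubling each backslash. The variable names and the two-line sample are illustrative, not from the question.

```python
import re

# The question's pattern (slashes escaped) and the simplified one, both raw strings.
escaped = r'^\s*(?:https?:\/\/)?(?:www\.)?(?:store\.steampowered\.com\/app\/)?([0-9]+)\/?\s*$'
unescaped = r'^\s*(?:https?://)?(?:www\.)?(?:store\.steampowered\.com/app/)?([0-9]+)/?\s*$'

sample = "http://store.steampowered.com/app/254060/\n254064"

ids_escaped = re.findall(escaped, sample, re.MULTILINE)
ids_unescaped = re.findall(unescaped, sample, re.MULTILINE)

# A raw string keeps the backslash as-is; a normal string needs it doubled.
same_literal = (r'www\.' == 'www\\.')

print(ids_escaped, ids_unescaped, same_literal)
```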

answered Jul 28, 2016 at 15:48

janbrohl


1 Comment

raecer Over a year ago

Thank you, I will change my pattern :)
