How to extract multiple values from the same String with Regex in Python?

Question

I am currently trying to scrape some data from a webpage. The data I need is inside a <meta> tag of the HTML source. Scraping the data and saving it to a String with BeautifulSoup is no problem.

The String contains two numbers I want to extract. Each of those numbers (review scores from 1 to 100) should be assigned to a separate variable for further processing.

test_str = "<meta content=\"Overall Rating: 79/100 ... Some Info ... Score: 86/100 \"/>"

The first value is 79/100 and the second is 86/100, but I only need 79 and 86. So far I have written a regex search to find those values and then used .replace("/100", "") to strip the suffix.

But with my code I only get the value of the first match, which is 79. I tried getting the second value with m.group(1), but that doesn't work.

What am I missing?

import re

test_str = "<meta content=\"Overall Rating: 79/100 ... Some Info ... Score: 86/100 \"/>"

m = re.search("../100", test_str)
if m:
    found = m.group(0).replace("/100", "")
    print(found)

    # output -> 79

Thanks for your help.

Best regards!

Are you scraping the web page and then applying a regex to the entire HTML source? I'm asking because your code sample shows no BeautifulSoup-related code.
– Tomalak, Commented May 21, 2017 at 10:35
@Tomalak Thanks! No, I just save the data in a String using meta_description = soup.find("meta", {"name": "rating-data"}). I left out the BeautifulSoup part to keep things simple.
– Alexander Scherer, Commented May 21, 2017 at 10:49

Ludisposed · Accepted Answer · 2017-05-21 10:50:38Z

3

import re

test_str = "<meta content=\"Overall Rating: 79/100 ... Some Info ... Score: 86/100 \"/>"
m = re.findall(r'\d+(?=/100)', test_str)
# m = ['79', '86']

I replaced .. with \d+, so the pattern matches one or more digits (7, 79 or 100) instead of exactly two arbitrary characters.
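To see why this matters, here is a small sketch comparing the question's ".." pattern with \d+ on a few sample strings. The strings are hypothetical, made up only to cover 1-, 2- and 3-digit scores, and the \d+ version already includes the answer's lookahead.

```python
import re

# Hypothetical sample strings with 1-, 2- and 3-digit scores
samples = ["Rating: 7/100", "Rating: 79/100", "Rating: 100/100"]

results = []
for s in samples:
    # ".." grabs exactly two characters of any kind right before "/100"
    two_chars = re.search(r"../100", s).group(0).replace("/100", "")
    # "\d+" grabs one or more digits; the lookahead keeps "/100" out of the match
    digits = re.search(r"\d+(?=/100)", s).group(0)
    results.append((two_chars, digits))
    print(repr(two_chars), repr(digits))
```

With "..", the single-digit score picks up a leading space (" 7") and 100 is truncated to "00", while \d+ returns "7", "79" and "100".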

I also used a positive lookahead, (?=/100), which requires "/100" to follow the digits without making it part of the match, so the .replace call becomes unnecessary.
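A lookahead is not the only way to leave "/100" out of the result. When a pattern contains exactly one capturing group, re.findall returns only that group's text, so a plain group gives the same list. A sketch comparing both, using the test_str from the question:

```python
import re

test_str = "<meta content=\"Overall Rating: 79/100 ... Some Info ... Score: 86/100 \"/>"

# Lookahead: "/100" must follow the digits but is not part of the match
with_lookahead = re.findall(r"\d+(?=/100)", test_str)

# Capturing group: "/100" is consumed, but findall returns only group 1
with_group = re.findall(r"(\d+)/100", test_str)

print(with_lookahead)  # ['79', '86']
print(with_group)      # ['79', '86']
```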

Example at Regex101
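As for why m.group(1) failed in the question: re.search stops at the first match, and group(1) refers to the first parenthesized capture group of the pattern, not to a second match. Because "../100" contains no groups, m.group(1) raises IndexError. The sketch below shows that, then uses re.finditer (an alternative to the findall above) and unpacks the two scores into separate int variables; the names rating and score are only illustrative.

```python
import re

test_str = "<meta content=\"Overall Rating: 79/100 ... Some Info ... Score: 86/100 \"/>"

# The question's pattern has no parentheses, so it defines no capture groups
m = re.search(r"../100", test_str)
group1_error = None
try:
    m.group(1)
except IndexError as exc:
    group1_error = exc  # group(1) means "first capture group", not "second match"

# finditer yields one match object per occurrence, left to right
scores = [int(match.group(1)) for match in re.finditer(r"(\d+)/100", test_str)]

# Unpacking assumes exactly two scores; it raises ValueError otherwise
rating, score = scores
print(group1_error)
print(rating, score)  # 79 86
```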

edited May 21, 2017 at 10:50

answered May 21, 2017 at 10:35

Ludisposed

Ludisposed Over a year ago

Np glad I could help :)

Shankara Narayana · Accepted Answer · 2020-11-23 02:31:37Z

3

I don't know why more people aren't suggesting named capture groups.

You can do something like the following (the syntax might not be perfect):

import re

test_str = "<meta content=\"Overall Rating: 79/100 ... Some Info ... Score: 86/100 \"/>"

# Named groups capture only the digits; the lazy .*? skips the text in between
pattern = r'^<meta content="Overall Rating: (?P<rating>\d+)/100 .*? Score: (?P<score>\d+)/100'

match = re.match(pattern, test_str)
if match:
    print(match.group('rating'))  # 79
    print(match.group('score'))   # 86
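Named groups also let you pull every field out at once: match.groupdict() returns a dict keyed by group name. A sketch of that, where the \d+ inside each group and the int conversion are additions of this example (not part of the answer's original pattern) so that only the numbers are kept:

```python
import re

test_str = "<meta content=\"Overall Rating: 79/100 ... Some Info ... Score: 86/100 \"/>"

# Each named group captures only the digits in front of "/100"
pattern = r"Overall Rating: (?P<rating>\d+)/100.*?Score: (?P<score>\d+)/100"

match = re.search(pattern, test_str)
if match is None:
    raise ValueError("scores not found")

# groupdict() maps each group name to its captured text
scores = {name: int(value) for name, value in match.groupdict().items()}
print(scores)  # {'rating': 79, 'score': 86}
```

Adding a third field later only means adding another (?P<name>...) group; code that reads the dict by key does not need to change.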

answered Nov 23, 2020 at 2:31

Shankara Narayana
