It's been a while since I've used regex, and I feel like this should be simple to figure out.

I have a web page full of links that look like string_to_match in the code below. I want to grab just the number in each link, such as the "58" in string_to_match. For the life of me, I can't figure it out.

import re
string_to_match = '<a href="/ncf/teams/roster?teamId=58">Roster</a>'
re.findall('<a href="/ncf/teams/roster?teamId=(/d+)">Roster</a>',string_to_match)
2 Answers

Instead of matching the whole tag with a regular expression, you can parse the HTML (here with BeautifulSoup) to locate the desired link and read its href attribute, then extract the number from that URL. For that last step, a small regular expression is enough:

import re
from bs4 import BeautifulSoup

data = """
<body>
    <a href="/ncf/teams/roster?teamId=58">Roster</a>
</body>
"""

soup = BeautifulSoup(data, "html.parser")
link = soup.find("a", string="Roster")["href"]  # string= replaces the deprecated text= argument

print(re.search(r"teamId=(\d+)", link).group(1))

Prints 58.
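
If you would rather avoid regular expressions for the URL part as well, the standard library's urllib.parse can read the query string directly. This is only a sketch: the href value is hard-coded here to stand in for the link extracted above, and it assumes every link carries a teamId parameter.

```python
from urllib.parse import urlparse, parse_qs

# Stand-in for the href value pulled out of the <a> tag above.
href = "/ncf/teams/roster?teamId=58"

# urlparse() isolates the query string; parse_qs() turns it into
# a dict that maps each parameter name to a list of values.
query = parse_qs(urlparse(href).query)
team_id = query["teamId"][0]

print(team_id)  # 58
```

Note that team_id is a string; wrap it in int() if you need a number.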


I would recommend using BeautifulSoup or lxml; it's worth the learning curve.

But if you still want to use a regex:

re.findall(r'href="[^"]*teamId=(\d+)', string_to_match)
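
For reference, the pattern in the question fails for two reasons: it uses /d instead of \d, and the ? before teamId is not escaped, so it is treated as a quantifier rather than a literal question mark. A minimal corrected version of that original pattern, assuming every link has exactly the format shown in the question, might look like this:

```python
import re

string_to_match = '<a href="/ncf/teams/roster?teamId=58">Roster</a>'

# Two fixes to the pattern from the question:
#   \?  matches a literal question mark (a bare ? makes the preceding "r" optional)
#   \d+ matches one or more digits (/d+ matches a slash followed by one or more "d"s)
pattern = r'<a href="/ncf/teams/roster\?teamId=(\d+)">Roster</a>'

team_ids = re.findall(pattern, string_to_match)
print(team_ids)  # ['58']
```

The raw string prefix (r'...') keeps Python from interpreting the backslashes before the regex engine sees them.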
