I am trying to extract only the numeric part (25709 in the example below) and store it in a variable, say athleteID, which I can later insert into a dynamic URL so that I can iterate over the IDs and send a search request for each:

'<a href="../athletehistory/?athleteNumber=25709" target="_top">Zola Budd</a>'

I have these URLs (or partial URLs) stored in a column of a dataframe. I have iterated over them twice using split(), first on '=', and got as far as the code below.

 id_list = []
 for url in df2['athleteURL']:
     # Split each anchor string on '=' (no backslash is needed)
     parts = url.split('=')
     id_list.append(parts)
 print(id_list)

This produces a list of lists; one row is shown below as an example:

 '<a href', '"../athletehistory/?athleteNumber', '25709" target', '"_top">Zola Budd</a>'

I then did a second iteration, this time splitting on '"', and got to the following:

 id_list2=[]


 # id_list[2] is the third row only, so this loops over that row's parts
 for id2 in id_list[2]:
     j = id2.split('"')
     id_list2.append(j)

 #print(id_list2[2])

 athleteIDnumber = id_list2[2]
 print(athleteIDnumber)

 ['2967288', ' target']

However, this is where I am now stuck: the ID is still one element inside a list. I am also not sure this is the most efficient way to extract it, as I struggled with the regex functions as well.

Any advice or support would be appreciated. Thanks, Chris

1 Answer

from urllib.parse import urlparse, parse_qs
from bs4 import BeautifulSoup

spam = '<a href="../athletehistory/?athleteNumber=25709" target="_top">Zola Budd</a>'

def get_athlete_number(html):
    soup = BeautifulSoup(html, 'html.parser')
    href = soup.find('a').get('href')
    return parse_qs(urlparse(href).query).get('athleteNumber', [None])[0]

print(get_athlete_number(spam))

Output:

25709

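Use bs4 to parse the HTML and get the URL from the href attribute, then use urllib.parse from the standard library to parse the URL's query string. Define a function and apply it to the column holding the HTML values. Note that the function returns a str, or None if athleteNumber is missing from the query string.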
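
Since the question also mentions regex and building a dynamic URL, here is a minimal standard-library sketch of that route, offered as an alternative rather than a replacement for the bs4 approach. It assumes the ID always appears as athleteNumber=<digits> in the string. The names extract_athlete_id, build_search_url and BASE_URL are hypothetical, and the base URL is a placeholder, not the real site.

```python
import re
from urllib.parse import urlencode

# Assumption: the ID always appears as "athleteNumber=<digits>".
ATHLETE_RE = re.compile(r'athleteNumber=(\d+)')

# Hypothetical placeholder; replace with the real search endpoint.
BASE_URL = 'https://example.com/athletehistory/'


def extract_athlete_id(html):
    """Return the athlete number as an int, or None if it is not found."""
    match = ATHLETE_RE.search(html)
    return int(match.group(1)) if match else None


def build_search_url(athlete_id):
    """Build a search URL carrying the athlete number as a query parameter."""
    query = urlencode({'athleteNumber': athlete_id})
    return BASE_URL + '?' + query


# A plain list stands in for the df2['athleteURL'] column here.
rows = [
    '<a href="../athletehistory/?athleteNumber=25709" target="_top">Zola Budd</a>',
    '<a href="../athletehistory/" target="_top">No ID here</a>',
]

athlete_ids = [extract_athlete_id(row) for row in rows]
# Skip rows where no ID was found before building URLs.
urls = [build_search_url(a) for a in athlete_ids if a is not None]

print(athlete_ids)
print(urls)
```

With a dataframe, the same function could be applied to the column, for example df2['athleteURL'].apply(extract_athlete_id). A regex is shorter but more brittle than parsing the HTML, so the bs4 version is the safer choice if the markup varies.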

1 Comment

Thank you soooo much, saved me pulling out the last bits of my hair! :)
