I am trying to extract only the numeric part (25709 in the example below) and store it in a variable, let's call it athleteID, that I can later insert into a dynamic URL so I can iterate through the IDs and send a search request for each:
'<a href="../athletehistory/?athleteNumber=25709" target="_top">Zola Budd</a>'
I have these URLs (or partial URLs) stored in a column of a dataframe. I have iterated over them twice using the split() function, first on '=', and managed to get to the point below.
i = []
id_list = []
for id in df2['athleteURL']:
    # split each anchor tag on '=' so the ID starts its own piece
    i = id.split('=')
    id_list.append(i)
print(id_list)
This produces a list of lists, one row of which is shown as an example below:
['<a href', '"../athletehistory/?athleteNumber', '25709" target', '"_top">Zola Budd</a>']
I then did a second iteration using split('"') and got to the below:
id_list2 = []
for id2 in id_list[2]:
    # split each piece of the third row on the double quote
    j = id2.split('"')
    id_list2.append(j)
#print(id_list2[2])
athleteIDnumber = id_list2[2]
print(athleteIDnumber)
['2967288', ' target']
However, this is where I am now stuck: the ID is still one element within a list rather than a standalone value. I am also not sure this is the most efficient way to extract it, as I struggled when I tried using regex functions instead.
Any advice or support would be appreciated. Thanks, Chris
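For reference, a minimal sketch of the regex route, under some assumptions: the ID always follows the literal text athleteNumber= and consists only of digits. The names links, ATHLETE_ID_PATTERN and extract_athlete_id are made up for illustration, and the plain list stands in for df2['athleteURL'] (iterating over that column should work the same way). The second link is an invented example that reuses the ID 2967288 from the output above.

```python
import re
from typing import Optional

# Hypothetical stand-in for df2['athleteURL']; the second entry is invented.
links = [
    '<a href="../athletehistory/?athleteNumber=25709" target="_top">Zola Budd</a>',
    '<a href="../athletehistory/?athleteNumber=2967288" target="_top">Another Runner</a>',
]

# Capture the run of digits that directly follows 'athleteNumber='.
ATHLETE_ID_PATTERN = re.compile(r"athleteNumber=(\d+)")


def extract_athlete_id(link: str) -> Optional[str]:
    """Return the athlete ID as a string, or None if the link has no ID."""
    match = ATHLETE_ID_PATTERN.search(link)
    return match.group(1) if match else None


# Keep only the links that actually contain an ID.
athlete_ids = []
for link in links:
    athlete_id = extract_athlete_id(link)
    if athlete_id is not None:
        athlete_ids.append(athlete_id)

print(athlete_ids)
```

Each entry in athlete_ids is then a plain string rather than a nested list, so it can be dropped straight into a URL template. If the values are already in a pandas column, Series.str.extract with the same pattern is a possible vectorised alternative.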