I am scraping a site with Python and get this markup from the page:

<a href="javascript:document.frmMain.action.value='display_physician_info';document.frmMain.PhysicianID.value=1234567;document.frmMain.submit();" title="For more information, click here.">JOHN, DOE</a>

I want to extract one specific value from the href: the PhysicianID, which is 1234567 in "document.frmMain.PhysicianID.value".

Currently I only get the whole link, using something like this:

for i in soup.select('.data'):
    name = i.find('a', attrs={'title': 'For more information, click here.'})

Any idea? Thanks in advance.

2 Answers

Getting the href itself is easy with BeautifulSoup once you've got the link element:

href = name['href']

Then you can use regex with the re module:

import re
# Escape the dot so it matches literally, and capture the digits directly
value = re.search(r'PhysicianID\.value=(\d+)', href).group(1)
print(value)  # prints 1234567

Putting it all together with your code:

import re
for i in soup.select('.data'):
    name = i.find('a', attrs={'title': 'For more information, click here.'})
    href = name['href']  # the href has to be read from the link inside the loop
    # Escape the dot so it matches literally, and capture the digits directly
    value = re.search(r'PhysicianID\.value=(\d+)', href).group(1)
    print(value)  # prints 1234567
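
If some links on the page might not carry a PhysicianID, calling .group() on a failed search raises AttributeError. A minimal sketch that returns None instead, assuming the href always has the form shown in the question (the helper name extract_physician_id and the sample_href variable are made up for illustration):

```python
import re

# Hypothetical helper; the pattern assumes "PhysicianID.value=<digits>"
# appears in the href exactly as in the question's markup.
PHYSICIAN_ID_RE = re.compile(r"PhysicianID\.value=(\d+)")


def extract_physician_id(href):
    """Return the PhysicianID digits as a string, or None if not found."""
    match = PHYSICIAN_ID_RE.search(href or "")
    return match.group(1) if match else None


# Sample href copied from the question, split across lines for readability
sample_href = (
    "javascript:document.frmMain.action.value='display_physician_info';"
    "document.frmMain.PhysicianID.value=1234567;"
    "document.frmMain.submit();"
)

print(extract_physician_id(sample_href))           # 1234567
print(extract_physician_id("javascript:void(0)"))  # None
```

Inside the loop you would then call extract_physician_id(name['href']) and skip the link when it returns None.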

Or without regex:

from bs4 import BeautifulSoup

content = """
<a href="javascript:document.frmMain.action.value='display_physician_info';document.frmMain.PhysicianID.value=1234567;document.frmMain.submit();" title="For more information, click here.">JOHN, DOE</a>
"""
soup = BeautifulSoup(content,"lxml")
item = soup.select_one("a")['href'].split("PhysicianID.value=")[1].split(";")[0]
print(item)

Output:

1234567
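
One caveat with chained split calls: if "PhysicianID.value=" is missing from the href, indexing [1] raises IndexError. A rough sketch of the same idea with str.partition, which never raises on a missing separator (the function name physician_id_from_href and its marker parameter are hypothetical):

```python
def physician_id_from_href(href, marker="PhysicianID.value="):
    """Return the text between marker and the next ';', or None."""
    # partition returns (before, separator, after); the separator
    # element is an empty string when the marker is not present.
    _, found, rest = href.partition(marker)
    if not found:
        return None
    # Keep everything up to the first ';' (or all of rest if no ';')
    value = rest.partition(";")[0]
    return value or None


# Sample href copied from the question
href = (
    "javascript:document.frmMain.action.value='display_physician_info';"
    "document.frmMain.PhysicianID.value=1234567;"
    "document.frmMain.submit();"
)

print(physician_id_from_href(href))                  # 1234567
print(physician_id_from_href("javascript:void(0)"))  # None
```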
