I want to scrape information off this page: https://www.jobsbank.gov.sg/ICMSPortal/portlets/JobBankHandler/SearchDetail.do?id=JOB-2015-0321370
However, I am having trouble parsing it with Python. I am not sure what the issue is, as I am not familiar with HTML. Could it have something to do with the shadow root I see in the HTML? If so, how do I get around it?
import time
import urllib2  # Python 2 module

from lxml import html

url = 'https://www.jobsbank.gov.sg/ICMSPortal/portlets/JobBankHandler/SearchDetail.do?id=JOB-2015-0321370'
hdr = {'User-Agent': 'Mozilla/5.0'}

# Keep retrying until the request succeeds.
while True:
    req = urllib2.Request(url, headers=hdr)
    try:
        page = urllib2.urlopen(req)
    except urllib2.URLError:
        print("Exception ConnectionError was caught, retrying requests...")
        time.sleep(5)
    else:
        break

# Parse the response body and look up the job title.
content = page.read()
tree = html.fromstring(content)
jobTitle = tree.xpath('//div[@class="jobDes"]/h3/text()')
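To narrow this down, here is a minimal sketch of a check on the path expression itself, separate from the real page. Both HTML snippets and the `job_titles` helper are made up for illustration, and I am using the standard library's `xml.etree.ElementTree` in place of lxml so it runs without extra installs. I do not know whether the real page behaves like the second snippet; this only shows that the lookup comes back empty when the `jobDes` element is missing from the markup that was downloaded.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a page whose job details are in the static markup.
STATIC_PAGE = '<html><body><div class="jobDes"><h3>Data Analyst</h3></div></body></html>'

# Hypothetical stand-in for a page that ships only an empty container,
# with the details filled in later by a script in the browser.
SCRIPTED_PAGE = '<html><body><div id="app"></div></body></html>'


def job_titles(markup):
    """Return the text of every h3 directly inside a div with class jobDes."""
    root = ET.fromstring(markup)
    return [h3.text for h3 in root.findall('.//div[@class="jobDes"]/h3')]


static_titles = job_titles(STATIC_PAGE)
scripted_titles = job_titles(SCRIPTED_PAGE)
print(static_titles)
print(scripted_titles)
```

If the first list has a title and the second is empty, then the path expression is fine and an empty result on the real page would point at the downloaded content rather than at the query.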
Thanks.