
I want to scrape information off this page: https://www.jobsbank.gov.sg/ICMSPortal/portlets/JobBankHandler/SearchDetail.do?id=JOB-2015-0321370

However, I have trouble parsing it using Python. I am not sure what the issue is, as I am not familiar with HTML. Could it have something to do with the shadow root I see in the HTML? If so, how do I get around it?

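```python
import time
import urllib2
from lxml import html

url = 'https://www.jobsbank.gov.sg/ICMSPortal/portlets/JobBankHandler/SearchDetail.do?id=JOB-2015-0321370'
hdr = {'User-Agent': 'Mozilla/5.0'}

# Keep retrying until the request succeeds, waiting 5 seconds between attempts
while True:
    req = urllib2.Request(url, headers=hdr)
    try:
        page = urllib2.urlopen(req)
    except urllib2.URLError as e:  # also catches urllib2.HTTPError
        print("Request failed (%s), retrying..." % e)
        time.sleep(5)
    else:
        break

content = page.read()
tree = html.fromstring(content)

# Extract the job title from the job description block
jobTitle = tree.xpath('//div[@class="jobDes"]/h3/text()')
```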
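To see what the downloaded HTML actually contains, a quick diagnostic can help. This is a Python 3 sketch using only the standard library's html.parser (not lxml): it lists the src of every <iframe> and reports whether any div whose class list contains jobDes is present. The names IframeFinder and diagnose are hypothetical. In Python 3, page.read() returns bytes, so decode it first (e.g. content.decode('utf-8'), assuming the page is UTF-8).

```python
from html.parser import HTMLParser


class IframeFinder(HTMLParser):
    """Collects <iframe> src values and notes whether a div.jobDes is present."""

    def __init__(self):
        super().__init__()
        self.iframe_srcs = []
        self.has_job_des = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "iframe":
            # src is None when the tag has no src attribute (e.g. set later by JavaScript)
            self.iframe_srcs.append(attrs.get("src"))
        elif tag == "div" and "jobDes" in (attrs.get("class") or "").split():
            self.has_job_des = True


def diagnose(content):
    """Return (list of iframe srcs, whether a jobDes div was found) for an HTML string."""
    finder = IframeFinder()
    finder.feed(content)
    finder.close()
    return finder.iframe_srcs, finder.has_job_des


sample = '<html><body><iframe id="desc"></iframe></body></html>'
print(diagnose(sample))  # ([None], False)
```

If the result shows an iframe but no jobDes div, the description is being loaded separately from the page you downloaded.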

Thanks.

  • What problem are you getting? Commented Sep 4, 2015 at 8:58
  • Are you getting the correct HTML, or is it blocking you for using a scraper? I tried it, and after a couple of attempts it started to return a page saying "Hello, I am a java script test analytics page". Commented Sep 4, 2015 at 9:04
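One way to tell the bot-check page described in the comment above apart from a real job page is to look for that marker text before parsing. This is a minimal Python 3 sketch; the pattern is an assumption based only on the quoted message, and the site may word it differently or change it. The name is_challenge_page is hypothetical.

```python
import re

# Hypothetical marker based on the text quoted in the comment; may not be stable.
CHALLENGE_PATTERN = re.compile(r"java\s*script test analytics page", re.IGNORECASE)


def is_challenge_page(content):
    """Return True if the HTML looks like the bot-check page rather than the job page."""
    return bool(CHALLENGE_PATTERN.search(content))
```

If this returns True, the scraper has been flagged, and retrying immediately is unlikely to help.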

1 Answer


You can't scrape the job description because it isn't in the HTML your request downloads: it sits inside an <iframe>. The iframe's content is filled in by JavaScript just after the page loads, so it is not part of the response returned by page = urllib2.urlopen(req), and lxml never sees it. To scrape content from an iframe like this, you will need a browser automation tool such as Selenium: http://docs.seleniumhq.org/docs/03_webdriver.jsp
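A rough sketch of what that looks like with Selenium's Python bindings, assuming Selenium 4 and a local Chrome install. It loads the page in a real browser, waits for the first <iframe> and switches into it, then evaluates the question's XPath inside the frame. That the description lives in the first iframe is an assumption, and scrape_job_title is a hypothetical name. Selenium is imported inside the function so the module can be loaded without it.

```python
URL = ("https://www.jobsbank.gov.sg/ICMSPortal/portlets/JobBankHandler/"
       "SearchDetail.do?id=JOB-2015-0321370")


def scrape_job_title(url, timeout=15):
    """Load url in Chrome, switch into the first iframe, and return the job title text."""
    # Lazy imports: Selenium is only needed when the function actually runs.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes Selenium 4 can locate Chrome and its driver
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        # Assumption: the job description is rendered inside the first <iframe>.
        wait.until(EC.frame_to_be_available_and_switch_to_it((By.TAG_NAME, "iframe")))
        # Same XPath as the question, but evaluated inside the frame's document.
        heading = wait.until(EC.presence_of_element_located(
            (By.XPATH, '//div[@class="jobDes"]/h3')))
        return heading.text
    finally:
        driver.quit()  # always close the browser, even on timeout
```

Usage: print(scrape_job_title(URL)). The explicit waits matter because the frame's content only appears after the page's JavaScript has run.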


2 Comments

I was afraid I would need to use Selenium (I'm not familiar with it), but thanks for your answer.
Selenium takes a little learning, but it's fine once you are up and running. The next problem will then be 'headless browsing', so that you can automate a browser without it actually appearing on your screen.
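For the headless part, a minimal sketch assuming Selenium 4 and Chrome: passing the --headless flag through ChromeOptions starts a browser with no visible window. The function name make_headless_chrome is hypothetical, and the window size is an arbitrary choice so pages lay out as they would on a normal screen.

```python
def make_headless_chrome():
    """Start Chrome without a visible window (assumes Selenium 4 and Chrome installed)."""
    from selenium import webdriver  # lazy import: optional dependency

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run without opening a window
    options.add_argument("--window-size=1920,1080")  # give pages a normal-sized viewport
    return webdriver.Chrome(options=options)
```

Usage: driver = make_headless_chrome(), then driver.get(url) and the usual Selenium calls, finishing with driver.quit().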
