Scraping a react.js webpage with dryscrape

Question

I am having trouble scraping the homepage http://www.jobs.ch, which is built with react.js. I want to enter the term Business in the search box and run the search. Dryscrape worked for another example that was not a react.js page.

How can I write the term Business in this search field?

The error message when my script is executed:

ubuntu@ubuntu:~/scripts$ python jobs.py
Traceback (most recent call last):
  File "jobs.py", line 30, in <module>
    name.set("Business")
AttributeError: 'NoneType' object has no attribute 'set'

Here is my script:

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# We will write a Python script to visit a webpage, fill in the form and submit the form.

import dryscrape

# make sure you have xvfb installed
dryscrape.start_xvfb()

root_url = 'http://www.jobs.ch/en/vacancies/'

if __name__ == '__main__':
    # set up a web scraping session
    session = dryscrape.Session(base_url = root_url)

    # we don't need images
    session.set_attribute('auto_load_images', False)

    session.set_header('User-agent', 'Google Chrome')

    # visit exact webpage which is the form in this example
    session.visit('http://www.jobs.ch/en/vacancies/')

    # fill in the form by taking ID of field from webdev tool
    #name = session.at_xpath('//*[@data-reactid="107]')
    name = session.at_xpath('//*[@data-reactid="107"]//*[@class="search-input col-sm-4 col-md-5"]')

    name.set("Business")

    # submit form
    name.form().submit()

    # save a screenshot of the web page
    session.render("jobs.png")
    print("Session rendered successfully!")

Gaurav Ojha · Accepted Answer · 2017-03-27 11:10:46Z


I think your XPath has an issue, but apart from that, the session itself is configured incorrectly.

This line

session = dryscrape.Session(base_url = root_url)

sets the base URL of the session to your root_url, so when you call session.visit('http://www.jobs.ch/en/vacancies/') you are in fact visiting the concatenation of your root_url and the URL passed to session.visit.

If you print session.url() right after the visit, you will see that the URL you actually visited was http://www.jobs.ch/en/vacancies/http://www.jobs.ch/en/vacancies/
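One way to avoid the doubled URL is to give the session only the site root as base_url and pass the path to session.visit. The following is a minimal sketch under that assumption; split_base_and_path and open_page are hypothetical helper names, not part of dryscrape, and dryscrape is imported lazily so the URL helper can be tried on its own.

```python
from urllib.parse import urlsplit


def split_base_and_path(full_url):
    """Split a full URL into (site root, path). Any query string is dropped."""
    parts = urlsplit(full_url)
    base_url = "{}://{}".format(parts.scheme, parts.netloc)
    path = parts.path or "/"
    return base_url, path


def open_page(full_url):
    """Visit full_url through a session whose base_url is only the site root."""
    # Imported here so split_base_and_path works without dryscrape installed.
    # On a headless machine, call dryscrape.start_xvfb() first, as in the question.
    import dryscrape

    base_url, path = split_base_and_path(full_url)
    session = dryscrape.Session(base_url=base_url)
    session.visit(path)
    # Print the URL that was really loaded, to confirm it is not doubled.
    print(session.url())
    return session
```

With the URL from the question, split_base_and_path returns the site root http://www.jobs.ch and the path /en/vacancies/. Alternatively, create the session without base_url and keep passing the full URL to session.visit.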

The XPath of the input field, which I got from Chrome (Inspect -> right click -> Copy XPath), is //*[@id="react-root"]/div/div[1]/div/div[2]/div/div[3]/div[2]/div/div/div/div/div[2]/div/div[1]/div/input

Please verify that you are using the correct xpath.
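The AttributeError in the question means that session.at_xpath returned None, i.e. the expression matched nothing on the page that was loaded. A small guard makes that failure explicit instead of surfacing later as a NoneType error. This is only a sketch; require_node is a hypothetical helper name, and it assumes nothing about the session beyond the at_xpath and url methods already used above.

```python
def require_node(session, xpath):
    """Return the first node matching xpath, or raise a descriptive error."""
    node = session.at_xpath(xpath)
    if node is None:
        # Include the URL that was actually loaded; a wrong page is a common cause.
        raise LookupError(
            "No node matched XPath {!r} on {}".format(xpath, session.url())
        )
    return node
```

In the script from the question, name = require_node(session, '...') would then replace the direct session.at_xpath call, and a failed lookup would report both the expression and the visited URL.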
