I am having trouble scraping http://www.jobs.ch, which is built with react.js. I want to enter the term "Business" in the search box and run the search. dryscrape worked for me on another site that was not a react.js page.

How can I enter the term "Business" into this search field?

Running my script produces this error:

ubuntu@ubuntu:~/scripts$ python jobs.py
Traceback (most recent call last):
  File "jobs.py", line 30, in <module>
    name.set("Business")
AttributeError: 'NoneType' object has no attribute 'set'

Here is my script:

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Visit a webpage, fill in the form and submit it.

import dryscrape

# make sure you have xvfb installed
dryscrape.start_xvfb()

root_url = 'http://www.jobs.ch/en/vacancies/'

if __name__ == '__main__':
    # set up a web scraping session
    session = dryscrape.Session(base_url = root_url)

    # we don't need images
    session.set_attribute('auto_load_images', False)

    session.set_header('User-agent', 'Google Chrome')

    # visit exact webpage which is the form in this example
    session.visit('http://www.jobs.ch/en/vacancies/')

    # fill in the form by taking ID of field from webdev tool
    #name = session.at_xpath('//*[@data-reactid="107]')
    name = session.at_xpath('//*[@data-reactid="107"]//*[@class="search-input col-sm-4 col-md-5"]')

    name.set("Business")

    # submit form
    name.form().submit()

    # save a screenshot of the web page
    session.render("jobs.png")
    print("Session rendered successfully!")

1 Answer

I think your XPath has an issue, but apart from that, the session itself is configured incorrectly.

This line

session = dryscrape.Session(base_url = root_url)

sets the base URL to root_url, so when you call session.visit('http://www.jobs.ch/en/vacancies/') you are in fact visiting the concatenation of root_url and the URL passed to session.visit.

If you print session.url(), you will see that the URL actually visited was http://www.jobs.ch/en/vacancies/http://www.jobs.ch/en/vacancies/. If nothing on the loaded page matches your XPath, at_xpath returns None, which would explain the AttributeError on name.set.

The XPath of the search input, which I got from Chrome via Inspect -> Right Click -> Copy XPath, is //*[@id="react-root"]/div/div[1]/div/div[2]/div/div[3]/div[2]/div/div/div/div/div[2]/div/div[1]/div/input

Please verify that you are using the correct XPath.
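
A minimal sketch of both points, using only the dryscrape calls already in the question (Session, set_attribute, visit, at_xpath, set, form().submit(), render) plus session.url() mentioned above. The helper name find_or_fail is hypothetical and introduced here for illustration; it turns a non-matching XPath into a descriptive error instead of the bare AttributeError. The XPath is the one copied from Chrome above and may no longer match the page's markup, so treat this as a starting point rather than a verified fix.

```python
def find_or_fail(session, xpath):
    """Return the first node matching xpath, or raise a descriptive error.

    Hypothetical helper: at_xpath gives back None when nothing matches,
    so check for that before calling methods on the result.
    """
    node = session.at_xpath(xpath)
    if node is None:
        raise LookupError("no node matches %r at %s" % (xpath, session.url()))
    return node


def main():
    # Imported here so find_or_fail can be used without dryscrape installed.
    import dryscrape

    # make sure you have xvfb installed
    dryscrape.start_xvfb()

    # No base_url here, so visit() receives exactly the URL passed to it.
    session = dryscrape.Session()
    session.set_attribute('auto_load_images', False)
    session.visit('http://www.jobs.ch/en/vacancies/')

    # Confirm which page was actually loaded before querying it.
    print(session.url())

    # XPath taken from the answer above; assumed, not re-verified.
    search_box = find_or_fail(
        session,
        '//*[@id="react-root"]/div/div[1]/div/div[2]/div/div[3]/div[2]'
        '/div/div/div/div/div[2]/div/div[1]/div/input')
    search_box.set("Business")
    search_box.form().submit()

    session.render("jobs.png")
```

Call main() to run it. If find_or_fail raises, the message shows both the XPath and the URL that was really visited, which makes it easier to tell a wrong page from a wrong XPath.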
