Python lxml xpath returns no output

Question

I am trying to scrape a specific element from a website using lxml in Python. My code is below, but it produces no output.

    import requests
    from lxml import html

    webpage = 'http://www.funda.nl/koop/heel-nederland/'
    page = requests.get(webpage)
    tree = html.fromstring(page.content)

    content = '//*[@id="content"]/form/div[2]/div[5]/div/a[8]/text()'
    content = str(tree.xpath(content))
    print(content)

Mirek Długosz · Accepted Answer · 2017-04-29 21:05:19Z

It looks like the website you are trying to scrape does not want to be scraped. It uses various techniques to detect whether a request comes from a legitimate user or from a bot, and it blocks access when it thinks the request comes from a bot. That is why your XPath does not find anything, and it is also why you should reconsider what you are doing.
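To see why a blocked response gives you an empty result rather than an error, here is a minimal sketch. The two HTML snippets and the shortened selector are made-up stand-ins for illustration, not the real pages served by the site: when the expected structure is missing, `xpath()` simply returns an empty list.

```python
from lxml import html

# Hypothetical stand-ins for the two kinds of response; the real pages differ.
BLOCKED_PAGE = b"<html><body><div id='captcha'>Please verify you are human</div></body></html>"
NORMAL_PAGE = b"<html><body><div id='content'><a>1</a><a>2</a></div></body></html>"


def extract(page_bytes, selector):
    """Parse the page and return the list of XPath matches (possibly empty)."""
    tree = html.fromstring(page_bytes)
    return tree.xpath(selector)


# Simplified selector for the stand-in pages above.
selector = '//*[@id="content"]/a[2]/text()'

blocked_result = extract(BLOCKED_PAGE, selector)
normal_result = extract(NORMAL_PAGE, selector)

print(blocked_result)  # nothing matched, so the list is empty
print(normal_result)   # the text of the second link
```

If your own script prints an empty list, it is worth inspecting `page.status_code` and the start of `page.text` to see what the server actually sent back.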

If you decide that you want to continue, then the simplest way of fooling this particular website seems to be adding cookies to your requests.

First, obtain the cookie string using your real browser:

1. Open a new tab.
2. Open the developer tools.
3. Go to the "Network" tab in the developer tools.
4. If the Network tab is empty, refresh the page.
5. Find the request to heel-nederland/ and click it.
6. In Request Headers, you will find the cookie string. It is quite long and contains many seemingly random characters. Copy it.

Then, modify your program to use these cookies:

    import requests
    from lxml import html

    webpage = 'http://www.funda.nl/koop/heel-nederland/'
    # Send the cookies copied from the browser with the request
    headers = {
        'Cookie': '<string copied from browser>'
    }
    page = requests.get(webpage, headers=headers)
    tree = html.fromstring(page.content)

    selector = '//*[@id="content"]/form/div[2]/div[5]/div/a[8]/text()'
    content = str(tree.xpath(selector))
    print(content)
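As an alternative to sending a raw `Cookie` header, you can split the copied string into a dictionary and pass it through the `cookies` parameter of `requests.get`. This is only a sketch: `parse_cookie_header` is a hypothetical helper, and the cookie names and values below are invented for the example.

```python
def parse_cookie_header(raw):
    """Split a 'name=value; name2=value2' string into a dict.

    Hypothetical helper; it assumes pairs are separated by ';' and
    splits each pair at the first '=' only, so values may contain '='.
    """
    cookies = {}
    for pair in raw.split(';'):
        name, sep, value = pair.strip().partition('=')
        if sep and name:
            cookies[name] = value
    return cookies


# Invented example string; use the one copied from your browser instead.
example = 'sessionid=abc123; consent=yes; token=a=b'
cookies = parse_cookie_header(example)
print(cookies)
```

You would then call `requests.get(webpage, cookies=cookies)` in place of the `headers=headers` variant. Either way, the cookies will eventually expire and have to be copied from the browser again.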
