Getting empty list when scraping web page content using xpath

Question

When I try to retrieve some data with XPath from the URL in the following code, I get an empty list:

from lxml import html
import requests

if __name__ == '__main__':
    url = 'https://www.leagueofgraphs.com/champions/stats/aatrox'

    page = requests.get(url)
    tree = html.fromstring(page.content)

    # XPath to get the XP
    print(tree.xpath('//*[@id="graphDD1"]/text()'))

>>> []

What I expect is a string value like this one:

>>> ['
        5.0%    ']

Dan-di-Lion · Accepted Answer · 2021-11-10 13:46:58Z

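This happens because the element your XPath selects depends on JavaScript that runs in the browser. requests does not execute JavaScript, so the element is missing from the HTML it receives.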
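A quick way to confirm this, as a rough sketch: check whether the id appears anywhere in the raw response text. The helper name `has_element_id` is made up for illustration, and the regex is only a crude attribute match, not an HTML parser.

```python
import re


def has_element_id(html_text, element_id):
    """Return True if an id="..." attribute with this value appears in the text."""
    # Accept single or double quotes around the attribute value
    pattern = r'''\bid\s*=\s*["']''' + re.escape(element_id) + r'''["']'''
    return re.search(pattern, html_text) is not None


# Small inline samples standing in for a real response body
print(has_element_id('<div id="graphDD1"> 5.0% </div>', 'graphDD1'))
print(has_element_id('<script>var x = 1;</script>', 'graphDD1'))
```

Called on `page.text` from the question's code, a False result would suggest that the server did not send the element at all, rather than the XPath being wrong.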

You need to find the cookie that is set after the JavaScript has run, so that you can send the same request to the URL as the browser does.

1. Go to the 'Network' tab of the browser's developer console.
2. Find the difference in the request headers after abg_lite.js has run (mine was cookie: __cf_bm=TtnYbPlIA0J_GOhNj2muKa1pi8pU38iqA3Yglaua7q8-1636535361-0-AQcpStbhEdH3oPnKSuPIRLHVBXaqVwo+zf6d3YI/rhmk/RvN5B7OaIcfwtvVyR0IolwcoCk4ClrSvbBP4DVJ70I=).
3. Add the cookie to your request.

from lxml import html
import requests

if __name__ == '__main__':
    url = 'https://www.leagueofgraphs.com/champions/stats/aatrox'

    # Create a session to add cookies and headers to
    s = requests.Session()

    # After finding the correct cookie, add it to the session's cookie jar
    # under its real name (__cf_bm); replace the value with your own cookie
    s.cookies.set(
        '__cf_bm',
        'TtnYbPlIA0J_GOhNj2muKa1pi8pU38iqA3Yglaua7q8-1636535361-0-'
        'AQcpStbhEdH3oPnKSuPIRLHVBXaqVwo+zf6d3YI/rhmk/RvN5B7OaIcfwtvVyR0IolwcoCk4ClrSvbBP4DVJ70I=',
    )

    # Update headers to spoof a regular browser; this may not be necessary
    # but can help get past basic bot detection
    s.headers.update({
        'user-agent': (
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
            ' AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36'
        ),
    })

    page = s.get(url)
    tree = html.fromstring(page.content)

    # XPath to get the XP
    print(tree.xpath('//*[@id="graphDD1"]/text()'))

This produces the following output:

['\r\n 5.0% ']
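If the Cookie header copied from the Network tab holds several cookies, it can be split into name/value pairs before being added to the request. This is a minimal sketch: `parse_cookie_header` is a hypothetical helper, not part of requests, and the header value below is a placeholder.

```python
def parse_cookie_header(header_value):
    """Split a raw 'Cookie' header value into a {name: value} dict."""
    cookies = {}
    for part in header_value.split(';'):
        # Split on the first '=' only, since cookie values may contain '='
        name, sep, value = part.strip().partition('=')
        if sep and name:
            cookies[name] = value
    return cookies


# Placeholder header value; paste your own from the Network tab
raw = '__cf_bm=PLACEHOLDER-VALUE=; other_cookie=1'
print(parse_cookie_header(raw))
```

The resulting dict can be passed to requests through the `cookies=` argument of `requests.get`, or merged into the session with `s.cookies.update(...)`.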
