Python Web-scraping, How to click 'Next' using Requests-HTML library

Question

I'm trying to get the data from "https://fortune.com/global500/2019/search/" using the Python requests-html module. The page is rendered with JavaScript, so after rendering I can get the first 100 items (from the first page). To load the second page you have to click "Next", so currently I only get the first 100 items.

When I click "Next" in the browser, the URL in the address bar does not change, so I'm clueless about how to get the next pages using requests-html.

from requests_html import HTMLSession

def get_fortune500():
    companies = []
    url = 'https://fortune.com/global500/2019/search/'
    session = HTMLSession()
    r = session.get(url)
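    # Run the page's JavaScript in headless Chromium so the table is in the DOM
    r.html.render(wait=1, retries=2)
    # Only the page of results currently displayed (100 rows) is present
    table = r.html.find('div.rt-tbody', first=True)
    rows = table.find('div.rt-tr-group')
    for row in rows:
        row_data = []
        cells = row.find('div.rt-td')
        for cell in cells:
            # Drop a leading '$' and the thousands separators
            celldata = cell.text.lstrip('$').replace(',', '')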
            row_data.append(celldata)
        companies.append(row_data)
    return companies

fortune_list = get_fortune500()
print(fortune_list)
print(len(fortune_list))

I really appreciate your time.
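The loop in the question keeps every cell as a string after stripping the "$" and the commas. If numeric columns are wanted, a small converter can be applied per cell. This is a minimal sketch: to_number is a hypothetical helper (not part of requests-html), and it assumes numeric cells look like 1234, $1,234 or 2.5, returning anything else unchanged.

```python
import re

# Hypothetical helper: matches plain integers/decimals such as 1234, -56 or 2.5.
_NUMERIC = re.compile(r"-?\d+(?:\.\d+)?")


def to_number(text):
    """Convert a table cell like '$1,234' or '2.5' to int/float, else return it."""
    # Same cleaning as the question: drop a leading '$' and thousands separators.
    cleaned = text.strip().lstrip("$").replace(",", "")
    if not _NUMERIC.fullmatch(cleaned):
        return text  # names, sectors, '-' placeholders, etc. stay as text
    return float(cleaned) if "." in cleaned else int(cleaned)


# Illustrative row, not real Fortune data.
print([to_number(cell) for cell in ["Example Co", "$1,234", "2.5", "-"]])
# → ['Example Co', 1234, 2.5, '-']
```

In the question's loop, row_data.append(to_number(cell.text)) could replace the two cleaning lines. The regular expression is used instead of a bare try/float so that text such as "Infinity" or "NaN" is not silently turned into a number.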

requests is meant more for AJAX-style requests than for 'web scraping' or interacting with HTML. To programmatically click buttons and the like on websites using Python, your best bet is something like Selenium, combined with Beautiful Soup for parsing the HTML.
– Matt Oestreich, Commented Dec 24, 2019 at 19:44
I just confirmed that the site uses server-side rendering rather than some API to grab that data, which means you will have to use Beautiful Soup or Selenium to extract the data from the HTML; unfortunately, as far as I can tell, you cannot use requests in this scenario. FYI, these appear to be all of the parameters you can use in your queries: https://fortune.com/global500/2019/search/?name=walmart&sector=&industry=&hqcountry=&hqcity=&hqstate=
– Matt Oestreich, Commented Dec 24, 2019 at 19:48
@MattOestreich Thank you. If it's not too much to ask, do you have any examples?
– rpeter, Commented Dec 24, 2019 at 19:50
Give me a minute so I can try to put something together for you.
– Matt Oestreich, Commented Dec 24, 2019 at 19:52
It looks like @JugrajSingh did some more due diligence on this and did in fact find the API they're using, which means you should be able to use requests.
– Matt Oestreich, Commented Dec 24, 2019 at 20:05
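The search parameters listed in Matt Oestreich's second comment can be assembled programmatically. A minimal sketch using only the standard library; build_search_url, SEARCH_BASE and SEARCH_PARAMS are hypothetical names, and the parameter list is assumed to be exactly the one visible in that example URL. Building the URL only selects the filters; the results still have to be obtained from the page or from the API in the answer below.

```python
from urllib.parse import urlencode

# Base URL and parameter names copied from the example search URL in the comment.
SEARCH_BASE = "https://fortune.com/global500/2019/search/"
SEARCH_PARAMS = ("name", "sector", "industry", "hqcountry", "hqcity", "hqstate")


def build_search_url(**filters):
    """Build a search URL; any parameter not supplied is sent as an empty value."""
    unknown = set(filters) - set(SEARCH_PARAMS)
    if unknown:
        raise ValueError(f"unknown filter(s): {sorted(unknown)}")
    # Keep the parameter order of the example URL.
    query = {key: filters.get(key, "") for key in SEARCH_PARAMS}
    return f"{SEARCH_BASE}?{urlencode(query)}"


print(build_search_url(name="walmart"))
# → https://fortune.com/global500/2019/search/?name=walmart&sector=&industry=&hqcountry=&hqcity=&hqstate=
```

urlencode also takes care of escaping, so a filter value containing spaces or "&" is encoded safely.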

Jugraj Singh · Accepted Answer · 2019-12-24 20:02:06Z

4

Here is the full list of all 500 companies:

https://content.fortune.com/wp-json/irving/v1/data/franchise-search-results?list_id=2666483

The website stores this API's response in the browser's IndexedDB, and only after that does the front end take over.

You can work out how to read that response directly from this first request.
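Fetching that endpoint needs no browser at all. A minimal sketch with the standard library (urllib and json), assuming the endpoint still responds as it did in 2019 and returns JSON; build_api_url and fetch_json are hypothetical helper names, the browser-like User-Agent header is an assumption, and the layout of the returned JSON is not described in the answer, so it has to be inspected before extracting fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint quoted in the answer (2019); it may have changed since.
API_BASE = ("https://content.fortune.com/wp-json/irving/v1/data/"
            "franchise-search-results")


def build_api_url(list_id):
    """Return the search-results URL for a Fortune list ID (2666483 above)."""
    query = urlencode({"list_id": list_id})
    return f"{API_BASE}?{query}"


def fetch_json(url, timeout=30):
    """GET `url` and decode the body as JSON; no JavaScript is executed."""
    # Browser-like User-Agent: an assumption, in case the default is rejected.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, data = fetch_json(build_api_url(2666483)) would download the whole list in one request; printing type(data) (and, for a dict, data.keys()) is a reasonable first step to find where the company records sit.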

rpeter Over a year ago

That was my first post on Stack Overflow, and I'm pleasantly surprised by how quickly these smart people were willing to help a stranger. I thought my question was stupid and that no one would care about it. WOW... this changed my mind. Within 30 minutes I had a better solution than I expected. Now I feel stupid about the hours I wasted last night :D. Thank you so much.

AlixaProDev · Accepted Answer · 2021-08-13 10:19:05Z

0

Although you can do it just by fetching the JSON mentioned by @Jugraj, if you want to learn more about requests-html you can always consult its official documentation.
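On the original question of clicking "Next" with requests-html itself: HTML.render() accepts a script argument, a JavaScript function that is evaluated in the page after it loads, and render() then waits sleep seconds before capturing the HTML. The sketch below uses that to click the pager a given number of times. It is a sketch under assumptions, not a tested scraper for this site: the ".-next button" selector assumes react-table v6's default pager markup (suggested by the rt-tbody and rt-tr-group classes in the question), the pause lengths are guesses, and click_next_script and scrape_page are hypothetical names.

```python
import json

# Assumed react-table v6 pager markup: <div class="-next"><button>Next</button></div>
NEXT_BUTTON_SELECTOR = ".-next button"
SEARCH_URL = "https://fortune.com/global500/2019/search/"


def click_next_script(times, selector=NEXT_BUTTON_SELECTOR, pause_ms=500):
    """Return an async JS arrow function that clicks the pager's Next button."""
    if times < 0:
        raise ValueError("times must be non-negative")
    return f"""async () => {{
    const sel = {json.dumps(selector)};
    const pause = ms => new Promise(done => setTimeout(done, ms));
    await pause({int(pause_ms)});  // give the app time to finish loading
    for (let i = 0; i < {int(times)}; i++) {{
        const btn = document.querySelector(sel);
        if (!btn || btn.disabled) break;  // stop at the last page
        btn.click();
        await pause({int(pause_ms)});  // let React draw the next page
    }}
}}"""


def scrape_page(page_index, url=SEARCH_URL):
    """Render `url`, click Next `page_index` times, return rows of cell text."""
    # Imported here so the helper above works without requests-html installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        r = session.get(url)
        # render() evaluates `script` once the page has loaded, then sleeps
        # `sleep` seconds before reading back the page's HTML.
        r.html.render(script=click_next_script(page_index), sleep=1,
                      retries=2, timeout=30)
        table = r.html.find("div.rt-tbody", first=True)
        if table is None:
            return []
        return [[cell.text for cell in row.find("div.rt-td")]
                for row in table.find("div.rt-tr-group")]
    finally:
        session.close()
```

For instance, scrape_page(1) would return the rows of the second page (page indexes start at 0). Each call launches a fresh headless Chromium, so for all 500 rows the JSON endpoint from the accepted answer is the lighter option.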
