2

I'm trying to get the data from "https://fortune.com/global500/2019/search/" using the Python requests-html module. I can get the first 100 items (the first page) after rendering the page's JavaScript. To load the second page you have to click "next", so currently I only get the first 100 items.

When I click "next" in the browser, the URL in the address bar does not change, so I have no idea how to get the following pages with requests-html.

from requests_html import HTMLSession

def get_fortune500():
    companies = []
    url = 'https://fortune.com/global500/2019/search/'
    session = HTMLSession()
    r = session.get(url)
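    # Run the page's JavaScript so the table exists in the DOM.
    r.html.render(wait=1, retries=2)
    # Only the currently displayed page of the table (100 rows) is found here.
    table = r.html.find('div.rt-tbody', first=True)
    rows = table.find('div.rt-tr-group')
    for row in rows:
        row_data = []
        cells = row.find('div.rt-td')
        for cell in cells:
            # Drop the leading '$' and the thousands separators from each cell.
            celldata = cell.text.lstrip('$').replace(',', '')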
            row_data.append(celldata)
        companies.append(row_data)
    return companies

fortune_list = get_fortune500()
print(fortune_list)
print(len(fortune_list))

I really appreciate your time.

5
  • requests is more for AJAX-type requests, not web scraping or interacting with HTML. To programmatically click buttons and the like on websites using Python, your best bet is something like Selenium or Beautiful Soup. Commented Dec 24, 2019 at 19:44
  • I just confirmed that the site is using server-side rendering, and not some API, to grab that data, which means you will have to use Beautiful Soup or Selenium to extract data from the HTML. Unfortunately, you cannot use requests in this scenario, as far as I can tell. These appear to be all of the parameters you can use in your queries, FYI: https://fortune.com/global500/2019/search/?name=walmart&sector=&industry=&hqcountry=&hqcity=&hqstate= Commented Dec 24, 2019 at 19:48
  • @MattOestreich Thank you. If it's not too much to ask, do you have any examples? Commented Dec 24, 2019 at 19:50
  • Give me a minute so I can try to put something together for you. Commented Dec 24, 2019 at 19:52
  • It looks like @JugrajSingh did some more due diligence on this and did in fact find the API they're using, which means you should be able to use requests. Commented Dec 24, 2019 at 20:05

2 Answers

4

Here is the full list of all 500 companies:

https://content.fortune.com/wp-json/irving/v1/data/franchise-search-results?list_id=2666483

The website stores the response of this API in the browser's IndexedDB, and after that the frontend alone takes control, so paging through the table happens client-side.

You can figure out how to read that response from the first request.
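As a rough sketch of reading that response in Python: the URL below is the one quoted in this answer, but the layout of the JSON it returns is not described anywhere in this thread. The helpers `fetch_json`, `find_record_lists` and `largest_record_list` are therefore hypothetical names that assume no particular keys; they only walk whatever comes back and look for lists of objects, which is presumably where the company rows live. The standard library's `urllib` is used so the snippet has no extra dependencies; `requests.get(url).json()` would do the same download and parse.

```python
import json
from urllib.request import Request, urlopen

# URL quoted in the answer above. The structure of its JSON is an unknown
# here, so none of the code below relies on specific key names.
API_URL = ("https://content.fortune.com/wp-json/irving/v1/data/"
           "franchise-search-results?list_id=2666483")


def fetch_json(url, timeout=30):
    """Download a URL and parse the body as JSON."""
    # Some servers reject urllib's default user agent, so send a generic one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def find_record_lists(node, path="$"):
    """Yield (path, list) for every non-empty list of dicts in a JSON tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_record_lists(value, f"{path}.{key}")
    elif isinstance(node, list):
        if node and all(isinstance(item, dict) for item in node):
            yield path, node
        # Keep descending: record lists can be nested inside other lists.
        for index, item in enumerate(node):
            yield from find_record_lists(item, f"{path}[{index}]")


def largest_record_list(payload):
    """Return (path, list) for the longest list of dicts, or None if absent."""
    candidates = list(find_record_lists(payload))
    if not candidates:
        return None
    return max(candidates, key=lambda pair: len(pair[1]))
```

With network access, `largest_record_list(fetch_json(API_URL))` should point at the most likely candidate for the company list; print the returned path and the first record to see which fields are actually available before writing any extraction code.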


1 Comment

That was my first post on Stack Overflow, and I'm pleasantly surprised by how quickly these smart people were willing to help a stranger. I thought my question was stupid and that no one would care about it. WOW... this changed my mind. Within 30 minutes I had a better solution than I had expected. Now I feel stupid about the hours I wasted last night :D. Thank you so much.
0

Although you can do it just by navigating to the JSON mentioned by @Jugraj, if you want to learn more about requests-html you can always consult its official documentation.
