
Here's the link for scraping: http://5000best.com/websites/Games/

I've tried almost everything I can. I'm a beginner at web scraping.

My code:

from urllib.request import urlopen
from urllib.error import HTTPError, URLError
from bs4 import BeautifulSoup
import csv


try:
    html = urlopen("http://5000best.com/websites/Games/")
except HTTPError as e:
    print(e)
except URLError as u:
    print(u)
else:
    soup = BeautifulSoup(html, "html.parser")
    # the ranking table sits inside the div with id="content"
    table = soup.find_all("div", {"id": "content"})[0]
    rows = table.find_all("tr")
    with open("games.csv", "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        for row in rows:
            # header cells (th) and data cells (td) go on the same CSV line
            th_data = [th.text.strip("\n") for th in row.find_all("th")]
            td_data = [td.text.replace("\n", "") for td in row.find_all("td")]
            writer.writerow(th_data + td_data)

This code only scrapes the first page of the table... I want all the pages. I inspected the web page but I didn't see any URL changes while toggling the page numbers, so it's completely dynamic.

3 Answers


You can read it directly using the pandas.read_html() function, which parses every HTML table on the page into a DataFrame. The [1] index below picks the second table read_html() finds on the page, which is the ranking table, and looping over the page numbers in the URL handles the pagination for you.

import pandas as pd


def main(url):
    for item in range(1, 4):  # pages 1-3; widen the range to cover more pages
        df = pd.read_html(url.format(item))[1]  # [1] = the ranking table
        print(df)


main("http://5000best.com/websites/Games/{}/")

Sample of output: (screenshot of the printed DataFrames omitted)

Edit - saving each page to a separate CSV file:

import pandas as pd


def main(url):
    for item in range(1, 4):
        df = pd.read_html(url.format(item))[1]
        print(f"Saving Page {item}")
        df.to_csv(f"page{item}.csv", index=False)


main("http://5000best.com/websites/Games/{}/")

Code updated to combine the pages into a single DataFrame:

import pandas as pd


def main(url):
    goal = []
    for item in range(1, 4):
        df = pd.read_html(url.format(item))[1]
        goal.append(df)
    final = pd.concat(goal)
    print(final)


main("http://5000best.com/websites/Games/{}/")

10 Comments

It's an easy way, though... but it will scrape only the first page of the table. As the table is paginated, I'm looking for a way to scrape all of the paginated data.
@HemantSah It seems you didn't try to run the code yet. Be informed that it will paginate all the tables; that's why I made a loop!
@αԋɱҽԃ αмєяιcαη The more I see your solution, the more fascinated I am.
@HumayunAhmadRajib glad to help :)
@HemantSah I've updated the answer for you to save the tables to CSV files.

Looking at the network inspector for that page reveals the requests it makes when you change pages. You may want to just scrape those URLs instead.

1 Comment

Is there any way to store all these links in a list or dictionary automatically, without inspecting every table?

Let me try to help you understand.

Have you used the developer tools in your browser? Open them (press F12, or right click > Inspect Element) and select the Network tab. Now, while keeping the tab open, click on the next-page link. A request shows up in the Network tab.

This is what you are looking for. Everything dynamic on a web page can be viewed here.

Hope this helps you learn something. Cheers!
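
Once you've spotted the per-page request in the Network tab, you can replay it in code. A minimal sketch, assuming the request URL follows the /Games/{page}/ pattern shown in the first answer (confirm the exact URL in the Network tab yourself):

import requests
from bs4 import BeautifulSoup

# assumed per-page URL, inferred from the first answer's pattern
resp = requests.get("http://5000best.com/websites/Games/2/")
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")
content = soup.find_all("div", {"id": "content"})[0]
print(len(content.find_all("tr")), "table rows on page 2")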

3 Comments

Is there any way to automate this and store all these links in a list or dictionary? I want to scrape the table for all categories, i.e. Games, Commerce, Music, etc. Or do I have to collect them manually?
You can run two loops: one over the categories, and inside it an iterative loop over the pages. If the page you hit has no links at all, you may break the inner loop, as this means you have reached the end of that category (see the sketch after this thread).
I didn't get it... I want to make a list of all these dynamic links from every category. Is this possible, or do I have to collect all these links by visiting every table myself?
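
A minimal sketch of that two-loop idea, building the dictionary of links the commenter asked for. The category list and the /{category}/{page}/ URL pattern for categories other than Games are assumptions, and the stop conditions are the same guesses as in the first answer:

import pandas as pd
from urllib.error import HTTPError

# illustrative category names; the real list would come from the site's menu
categories = ["Games", "Commerce", "Music"]


def page_urls(category):
    urls = []
    page = 1
    while True:
        url = f"http://5000best.com/websites/{category}/{page}/"
        try:
            df = pd.read_html(url)[1]  # assumed: ranking table is the second table
        except (ValueError, IndexError, HTTPError):
            break  # request failed or no table found: end of this category
        if df.empty:
            break
        urls.append(url)
        page += 1
    return urls


links = {cat: page_urls(cat) for cat in categories}
print(links)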
