
I'm new to web scraping and I'm trying to scrape the table from this website: https://www.eloratings.net/2016_European_Championship

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.eloratings.net/2016_European_Championship'
r = requests.get(url).text
soup = BeautifulSoup(r, "html.parser")
df = pd.read_html(str(soup.find_all('table')))

I get the "No tables found" error.

If I try to use an index to find the table:

df = pd.read_html(str(soup.find_all('table')[0]))

I get "List index out of range".

I have also tried using the json package and the Helium/Selenium webdrivers, but I cannot make anything work.

  • It's a JS table. requests only gets the HTML response without running the JS; it's not a browser, so your table is not loaded. You need to use something like Scrapy or tkinter to fetch the HTML after the JS code runs. Commented May 18, 2021 at 14:34
  • @MohammedJanatiIdrissi No, you don't need tkinter or Selenium. Commented May 18, 2021 at 14:36
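The comment above is easy to verify: on a JS-rendered page, the HTML that requests receives contains no <table> element at all, because the table markup is injected later by JavaScript. A minimal sketch of that situation (the skeleton HTML below is hypothetical, mimicking what the static response looks like):

```python
from bs4 import BeautifulSoup

# Hypothetical static HTML skeleton, as a JS-driven site would serve it:
# an empty container div plus a script tag, and no <table> anywhere.
static_html = """
<html>
  <head><script src="app.js"></script></head>
  <body><div id="maintable_wrapper"></div></body>
</html>
"""

soup = BeautifulSoup(static_html, "html.parser")
tables = soup.find_all("table")
print(len(tables))  # 0
```

An empty list here is exactly why `pd.read_html` raises "No tables found" and why indexing into the list raises "List index out of range".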

1 Answer


The page builds its table with JavaScript, so the data is not in the static HTML. Instead, use the site's .tsv endpoint to grab the raw data directly. Dump that to a file and then read it with pandas.

Here's how:

import time

import pandas as pd
import requests

# The timestamp query string is a cache-buster, matching what the page's
# own JavaScript appends when it fetches the data
url = f"https://www.eloratings.net/2016_European_Championship.tsv?={int(time.time())}"
table = requests.get(url).content

# Dump the raw TSV bytes to disk, then parse the file with pandas
with open("table_data.tsv", "wb") as f:
    f.write(table)

df = pd.read_csv("table_data.tsv", sep="\t")
print(df)

Output:

     1   3  DE  2016  1.1  2223   8  ...  532  201  185  2053  1090   −1   −19
0    2   4  FR  1983    1  2137  17  ...  389  253  168  1426  1171   +3    +8
1    3   6  PT  1959    2  2020  20  ...  269  171  133   935   685   +6   +50
2    4   8  IT  1950    1  2132   8  ...  412  157  216  1348   790   +7   +75
3    5   9  ES  1940    1  2165   7  ...  385  129  148  1295   605   −4   −53
4    6  11  EN  1913    1  2212   4  ...  595  191  234  2403  1012   −3   −58
5    7  14  BE  1891    4  1959  24  ...  311  282  160  1293  1250   −4   −32
6    8  16  HR  1849    5  2006  12  ...  148   55   76   493   272   +2   +31
7    9  18  PL  1824    2  2082  30  ...  348  262  200  1376  1105   +6   +58
8   10  19  TR  1816   10  1900  42  ...  209  215  125   739   802   −2   −13
9   11  20  CH  1803    9  1917  28  ...  263  341  168  1113  1336   +2   +31
10  12  23  WA  1779    3  1906  22  ...  196  302  134   790  1067  +24  +124
11  13  26  SK  1759   17  1774  39  ...  105   95   64   375   345   −2    −7
12  13  26  IE  1759    4  1918  22  ...  227  250  161   862  1050   +4    +1
13  15  31  IS  1742   27  1754  83  ...  123  208   77   485   695  +14   +76
14  16  32  UA  1739   15  1847  36  ...  103   64   65   319   228  −16   −98
15  17  33  SE  1735    2  2014  16  ...  492  294  217  2039  1341   −7   −29
16  18  37  HU  1723    1  2231  18  ...  445  286  200  1949  1397   +5   +27
17  19  38  RO  1719    5  1945  26  ...  310  212  172  1143   889   −9   −40
18  20  39  CZ  1718    1  2038  12  ...  371  226  172  1432   958   −8   −37
19  21  40  AT  1713    1  2067  20  ...  311  282  163  1365  1209  −19   −64
20  22  43  RU  1694    1  2080  22  ...  358  147  178  1203   661  −15   −66
21  23  51  EI  1642   14  1850  38  ...  138  249  131   536   812   +2   +18
22  24  53  AL  1634   33  1634  75  ...   76  165   66   274   485   +1   +17

[23 rows x 33 columns]
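The intermediate file is not strictly necessary: pandas can parse the TSV text straight from memory via io.StringIO. The pattern, shown on a small inline sample (the rows are taken from the output above, truncated to four columns for illustration):

```python
import io

import pandas as pd

# A few rows in the same shape as the endpoint's response
tsv_text = "1\t3\tDE\t2016\n2\t4\tFR\t1983\n3\t6\tPT\t1959\n"

# header=None keeps the first data row as data instead of column labels
df = pd.read_csv(io.StringIO(tsv_text), sep="\t", header=None)
print(df.shape)  # (3, 4)
```

With the live endpoint, the same idea is `pd.read_csv(io.StringIO(requests.get(url).text), sep="\t", header=None)`.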

1 Comment

Thanks, this worked, except it needed header=None in the read_csv().
