
I am very new to Python and am trying to do my own data analysis.

I am trying to parse data from this website: https://www.tsn.ca/nhl/statistics

I want to get the table into a pandas DataFrame.

I tried this:

```
import pandas as pd

players_list_unclean = pd.read_html('https://www.sportsnet.ca/hockey/nhl/players/?season=2021&?seasonType=reg&tab=Skaters')
```

I get the following error:

```
raise ValueError("No tables found")
ValueError: No tables found
```

I can see a table on the page, but for some reason it is not being read.

I found another Stack Overflow answer recommending Selenium:

pandas read_html ValueError: No tables found

However, when I tried to adapt this code, I could not find a table ID in the HTML page source. Does anyone know another way to do this? I have tried other websites, but I ultimately run into the same issue. For reference, here is the code from that answer:

```
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("https://www.wunderground.com/personal-weather-station/dashboard?ID=KMAHADLE7#history/tdata/s20170201/e20170201/mcustom.html")

# Locate the <table> by its id attribute (Selenium 4 locator syntax)
elem = driver.find_element(By.ID, "history_table")

head = elem.find_element(By.TAG_NAME, "thead")
body = elem.find_element(By.TAG_NAME, "tbody")

list_rows = []

# find_elements (plural) returns every <tr>; find_element returns only the first
for row in body.find_elements(By.TAG_NAME, "tr"):
    list_cells = []
    for cell in row.find_elements(By.TAG_NAME, "td"):
        list_cells.append(cell.text)
    list_rows.append(list_cells)

driver.close()
```
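Since the goal is a DataFrame, the `list_rows` collected by a loop like the one above can be handed straight to pandas. This is a minimal sketch, assuming the header text is read from `head` the same way (the `.text` of each `<th>`) and that every real data row has as many cells as the header; `rows_to_dataframe` is a hypothetical helper and the sample values are made up for illustration:

```python
import pandas as pd


def rows_to_dataframe(header, rows):
    """Build a DataFrame from scraped header text and row cell text.

    Rows whose length does not match the header (for example header or
    spacer rows with no <td> cells) are skipped so the constructor
    does not raise.
    """
    clean_rows = [row for row in rows if len(row) == len(header)]
    return pd.DataFrame(clean_rows, columns=header)


# Hypothetical data standing in for the scraped text
header = ["Time", "Temperature", "Humidity"]
list_rows = [
    ["12:00 AM", "30.2", "80"],
    [],  # e.g. a <tr> that contained no <td> cells
    ["12:05 AM", "30.1", "81"],
]

df = rows_to_dataframe(header, list_rows)
print(df.shape)  # (2, 3)
```

All scraped values arrive as strings, so numeric columns may need `pd.to_numeric` afterwards.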




If you right-click the table and choose Inspect, you will see that the "table" on that page is not actually built with the HTML <table> element.

From the pandas documentation:

This function searches for <table> elements and only for <tr> and <th> rows and <td> elements within each <tr> or <th> element in the table.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_html.html

I don't think read_html will work on this page; you probably need to find another data source.
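One way to confirm this before calling `read_html` is to count the `<table>` tags in the HTML you actually download. Keep in mind that the browser's Inspect view shows the page after JavaScript has run, while `read_html` only sees the HTML it fetches. A small sketch using only the standard library's `html.parser`; `TableCounter` and `count_tables` are hypothetical helpers, not part of pandas, and the markup below is made up:

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case
        if tag == "table":
            self.tables += 1


def count_tables(html: str) -> int:
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.tables


# A div-based grid has no <table> tags for read_html to find
div_grid = '<div class="row"><div class="cell">Player A</div></div>'
real_table = "<table><tr><td>Player A</td></tr></table>"
print(count_tables(div_grid))    # 0
print(count_tables(real_table))  # 1
```

If the count is 0 for the downloaded page, there is nothing for `read_html` to parse.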



There's no <table> element, but you're in luck: the data comes from a fetch request to this JSON endpoint:

https://datacrunch.9c9media.ca/statsapi/sports/hockey/leagues/nhl/sortablePlayerSeasonStats/skater?brand=tsn&type=json&seasonType=regularSeason&season=2021
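The season and season type appear as ordinary query-string parameters in that URL, so one option is to build it with `urllib.parse.urlencode` and swap in other values. This is a sketch: the parameter names and order are copied from the URL above, but whether the endpoint accepts other seasons or season types is an untested assumption, and `skater_stats_url` is a hypothetical helper:

```python
from urllib.parse import urlencode

BASE = ("https://datacrunch.9c9media.ca/statsapi/sports/hockey/leagues/nhl/"
        "sortablePlayerSeasonStats/skater")


def skater_stats_url(season: int, season_type: str = "regularSeason") -> str:
    # Parameter names copied from the observed fetch URL; values other
    # than season=2021 / regularSeason are assumptions.
    params = {
        "brand": "tsn",
        "type": "json",
        "seasonType": season_type,
        "season": season,
    }
    return f"{BASE}?{urlencode(params)}"


print(skater_stats_url(2021))
```

For 2021 this reproduces the URL above exactly.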

Comments:

Does that mean I just have to manually clean the data from the HTML file?
No, that's JSON data; you parse it with json.loads.
OK, and how did you get this fetch data from the link you sent above?
The same way: with requests, or however else you're getting the HTML.
Sorry, I am not quite following. I was just using read_html. I am guessing you can't use that to get the fetch data?
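To answer the last question: `read_html` only understands HTML tables, so for the fetch endpoint you download the response body yourself and parse it with `json.loads`. A minimal sketch using `urllib.request` plus pandas; the layout of the JSON (assumed here to be a list of player records, or a dict wrapping such a list) is an assumption, so inspect the real response and adjust `find_records`, a hypothetical helper, accordingly:

```python
import json
from urllib.request import Request, urlopen

import pandas as pd

URL = ("https://datacrunch.9c9media.ca/statsapi/sports/hockey/leagues/nhl/"
       "sortablePlayerSeasonStats/skater"
       "?brand=tsn&type=json&seasonType=regularSeason&season=2021")


def fetch_json(url: str):
    # Some servers reject Python's default User-Agent, so send a browser-like one
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def find_records(payload):
    """Return the first list of dicts in the payload (assumed layout)."""
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict):
        for value in payload.values():
            if isinstance(value, list) and value and isinstance(value[0], dict):
                return value
    raise ValueError("no list of records found; inspect the JSON by hand")


def to_dataframe(payload) -> pd.DataFrame:
    # json_normalize flattens nested dicts into dotted column names
    return pd.json_normalize(find_records(payload))


# Usage (requires network access):
# df = to_dataframe(fetch_json(URL))
```

With requests installed, `requests.get(URL, timeout=30).json()` returns the same parsed object as `fetch_json`.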
