An example URL is 'http://www.hockey-reference.com/players/c/crosbsi01/gamelog/2016'

The table I am trying to grab is named Regular Season.

What I used to do in previous instances was something like this:

import requests
from bs4 import *
from bs4 import NavigableString
import pandas as pd


url = 'http://www.hockey-reference.com/players/o/ovechal01/gamelog/2016'
resultsPage = requests.get(url)
soup = BeautifulSoup(resultsPage.text, "html5lib")
comment = soup.find(text=lambda x: isinstance(x, NavigableString) and "Regular Season  Table" in x)
df = pd.read_html(comment)

That's the approach I took with a site similar to this one. However, I'm unable to locate the table on this page, and I'm not sure what I'm missing.

1 Answer

There is one table which you can get using the id:

import requests
from bs4 import BeautifulSoup


url = 'http://www.hockey-reference.com/players/o/ovechal01/gamelog/2016'
resultsPage = requests.get(url)
soup = BeautifulSoup(resultsPage.text, "html5lib")
table = soup.select_one("#gamelog")
print(table)

Or, using just pandas:

import pandas as pd

df = pd.read_html(url, attrs={'id': 'gamelog'})[0]  # read_html returns a list of DataFrames
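Keep in mind that pd.read_html returns a list of DataFrames, one per matching table, so the frame itself has to be taken out of that list. Here is a minimal offline sketch of that behaviour. It assumes a made-up HTML snippet (the column names and values are illustrative, not taken from the real page) and that a parser backend for read_html, such as lxml or bs4 with html5lib, is installed:

```python
import io

import pandas as pd

# Hypothetical stand-in for the real page: one table with id="gamelog"
# and a second, unrelated table that the attrs filter should skip.
html = """
<table id="other">
  <thead><tr><th>X</th></tr></thead>
  <tbody><tr><td>0</td></tr></tbody>
</table>
<table id="gamelog">
  <thead><tr><th>Date</th><th>G</th><th>A</th></tr></thead>
  <tbody>
    <tr><td>2015-10-10</td><td>1</td><td>0</td></tr>
    <tr><td>2015-10-13</td><td>0</td><td>2</td></tr>
  </tbody>
</table>
"""

# attrs narrows the match to tables carrying the given HTML attributes.
tables = pd.read_html(io.StringIO(html), attrs={"id": "gamelog"})

# The result is always a list, even when only one table matches.
df = tables[0]
print(df)
```

Passing the URL instead of the StringIO object should work the same way, provided the table is present in the raw HTML that the server returns.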

Your code could never work because you are looking for a NavigableString that sits inside a caption tag, <caption>Regular Season Table</caption>, not for the table itself. You would need to call .find_previous() on it to get the table:

comment = soup.find(text=lambda x: isinstance(x, NavigableString) and "Regular Season  Table" in x)
table = comment.find_previous("table")

You could also use table = comment.parent.parent, but find_previous is the better approach because it does not depend on how deeply the caption is nested inside the table.
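To see both routes side by side without hitting the network, here is a small sketch on an inline HTML snippet. The snippet is a hypothetical stand-in for the real page, and it uses the built-in "html.parser" and the string= keyword (the newer spelling of text=) purely for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking a table whose caption holds the text we search for.
html = """
<table id="gamelog">
  <caption>Regular Season Table</caption>
  <tr><th>Date</th><th>G</th></tr>
  <tr><td>2015-10-10</td><td>1</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# This finds the text node inside <caption>, not the <table> element.
caption_text = soup.find(string=lambda s: bool(s) and "Regular Season Table" in s)

# Walk backwards through the document until the enclosing <table> is reached.
table = caption_text.find_previous("table")

# Equivalent here, but tied to the exact nesting: text -> <caption> -> <table>.
same_table = caption_text.parent.parent

print(caption_text.parent.name)
print(table.get("id"))
print(table is same_table)
```

If the caption text were wrapped in one more tag, parent.parent would stop at the caption, while find_previous("table") would still land on the table.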
