I am having trouble converting an HTML table to a pandas DataFrame. I scraped the page with BeautifulSoup, and now I want to convert the table to a DataFrame with the read_html function, but I get an error.

import pandas as pd
from bs4 import BeautifulSoup
import requests


headers = {'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}

response = requests.get('https://en.wikipedia.org/wiki/Official_World_Golf_Ranking', headers = headers)
soup = BeautifulSoup(response.text, 'html.parser')


html_table = soup.find_all("table")[0]
print(html_table)
print(type(html_table))


df = pd.read_html(html_table)
print(df[0])

The error that I get is:

TypeError: 'NoneType' object is not callable

But the type of html_table is <class 'bs4.element.Tag'>.

2 Answers

You are currently passing a bs4 object (a Tag) to pandas, but read_html expects HTML text, so convert the tag to a string first.

Replace that line with the following:

df = pd.read_html(str(html_table))

This should work for you!
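
For reference, here is a minimal offline sketch of the same fix. The small inline table and the "Player A"/"Player B" names are made up so the snippet runs without a network request. It also wraps the string in io.StringIO, on the assumption that you are on a recent pandas release, where passing literal HTML text directly is deprecated; on older releases, plain str(html_table) should behave the same way. It assumes an HTML parser backend for pandas (lxml, or bs4 with html5lib) is installed.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
html = """
<table>
  <tr><th>Rank</th><th>Player</th></tr>
  <tr><td>1</td><td>Player A</td></tr>
  <tr><td>2</td><td>Player B</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
html_table = soup.find_all("table")[0]  # a bs4.element.Tag, as in the question

# read_html wants a URL, a file-like object, or HTML text, not a Tag.
# str() serialises the Tag back to markup; StringIO makes it file-like.
tables = pd.read_html(StringIO(str(html_table)))

# read_html always returns a list of DataFrames, one per table found.
df = tables[0]
print(df)
```

Note that read_html returns a list even when there is only one table, which is why the question's code indexes the result with [0].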

You can simplify this even further, since pandas' read_html() accepts a URL directly:

import pandas as pd

url = 'https://en.wikipedia.org/wiki/Official_World_Golf_Ranking'
df = pd.read_html(url)[0]
