Converting scraped HTML table to Pandas dataframe

Question

I have a problem converting an HTML table to a pandas DataFrame. I used BeautifulSoup for scraping, and now I want to convert that table to a DataFrame with the read_html function, but I get an error.

import pandas as pd
from bs4 import BeautifulSoup
import requests


headers = {'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}

response = requests.get('https://en.wikipedia.org/wiki/Official_World_Golf_Ranking', headers = headers)
soup = BeautifulSoup(response.text, 'html.parser')


html_table = soup.find_all("table")[0]
print(html_table)
print(type(html_table))


df = pd.read_html(html_table)
print(df[0])

The error that I get is:

TypeError: 'NoneType' object is not callable

even though html_table is of type <class 'bs4.element.Tag'>.

Saurav Panda · Accepted Answer · 2020-06-04 08:12:10Z


Currently you are passing a bs4 Tag object to pandas, but read_html expects a string of HTML, a URL, or a file-like object, so you should pass the table's markup as a string.
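Where the odd NoneType message comes from: on a bs4 Tag, an unknown attribute such as `.read` is treated as a search for a child tag with that name (`tag.read` behaves like `tag.find("read")`), so it returns None instead of raising AttributeError. The pandas version used in the question appears to treat any object with a `read` attribute as a file-like object and call it, which amounts to calling None. A small sketch, using a made-up one-cell table, shows the bs4 side of this:

```python
from bs4 import BeautifulSoup

# A tiny hypothetical table, just to get a bs4 Tag object
soup = BeautifulSoup("<table><tr><td>1</td></tr></table>", "html.parser")
tag = soup.table

# The Tag *looks* file-like because the attribute lookup succeeds...
print(hasattr(tag, "read"))
# ...but the value is the result of find("read"), i.e. None, not a method
print(tag.read)
```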

Update that line as follows:

df = pd.read_html(str(html_table))

This should work for you!
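A minimal offline sketch of the fix, assuming a small hypothetical table in place of the Wikipedia page so no network request is needed. It also wraps the string in `io.StringIO`: pandas 2.1 and later deprecate passing literal HTML strings to `read_html`, and a StringIO object is accepted by older versions too. `read_html` needs a parser backend installed (lxml, or BeautifulSoup plus html5lib).

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the scraped page
html = """
<table>
  <tr><th>Rank</th><th>Player</th></tr>
  <tr><td>1</td><td>Player A</td></tr>
  <tr><td>2</td><td>Player B</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
html_table = soup.find_all("table")[0]  # a bs4.element.Tag, as in the question

# str() serializes the Tag back to HTML markup; StringIO makes it file-like.
# read_html returns a list of DataFrames, one per <table> found.
df = pd.read_html(StringIO(str(html_table)))[0]
print(df)
```

The all-`<th>` first row is used as the column header, so `df` has the columns `Rank` and `Player`.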


chitown88 · Accepted Answer · 2020-06-04 10:35:57Z


You can simplify this even further, since pandas' read_html() can take the URL directly:

import pandas as pd

url = 'https://en.wikipedia.org/wiki/Official_World_Golf_Ranking'
df = pd.read_html(url)[0]
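`read_html` returns one DataFrame per `<table>` on the page, which is why `[0]` is needed. If the page has several tables, its `match` parameter (a string or regex) keeps only tables containing matching text. The sketch below runs offline against a hypothetical two-table page rather than the Wikipedia URL; the table contents and the `"Player"` pattern are illustrative assumptions.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the second mentions "Player"
page = """
<table><tr><th>Note</th></tr><tr><td>intro</td></tr></table>
<table>
  <tr><th>Rank</th><th>Player</th></tr>
  <tr><td>1</td><td>Player A</td></tr>
</table>
"""

# Without match, every table on the page is returned
all_tables = pd.read_html(StringIO(page))
print(len(all_tables))

# With match, only tables whose text matches the pattern are kept
ranking = pd.read_html(StringIO(page), match="Player")
print(ranking[0])
```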
