Why is read_html from Python pandas not working?

Question

I would like to use the pandas read_html() function to scrape the Yahoo Finance table shown in the screenshot (bordered in red).

However, I received an HTTPError: HTTP Error 404: Not Found

Here is my code:

!pip install pandas
!pip install requests
!pip install bs4
!pip install requests_html
!pip install pytest-astropy
!pip install nest_asyncio
!pip install plotly

import pandas as pd
from bs4 import BeautifulSoup
import requests
import requests_html
import nest_asyncio
import lxml
import html5lib
nest_asyncio.apply()

url_link = "https://finance.yahoo.com/quote/NFLX/history?p=NFLX%27"
# This call fails with HTTPError: HTTP Error 404: Not Found
read_html_pandas_data = pd.read_html(url_link)
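A plausible explanation for the 404, consistent with both answers: when given a URL, read_html downloads the page through Python's urllib (as far as I know, via `urllib.request.urlopen`), and urllib identifies itself with a `Python-urllib/<version>` User-Agent rather than a browser string. Some sites, apparently including Yahoo Finance, reject such clients. This small sketch only inspects that default header; it makes no network request:

```python
import urllib.request

# build_opener() returns the same kind of OpenerDirector that urlopen() uses
# internally; its addheaders list holds the headers added to every request.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# Prints a dict whose 'User-agent' value starts with 'Python-urllib/'
print(default_headers)
```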

Md. Fazlul Hoque · Accepted Answer · 2022-09-18 22:55:45Z · Score: 8

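Try the following: fetch the page with requests while sending a browser-like User-Agent header, then pass the HTML to pandas: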

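from io import StringIO

import pandas as pd
import requests

url_link = 'https://finance.yahoo.com/quote/NFLX/history?p=NFLX%27'
# Yahoo Finance appears to reject clients without a browser-like User-Agent, so send one explicitly
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'}
r = requests.get(url_link, headers=headers)
r.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
# Wrap the HTML in StringIO: passing literal HTML strings to read_html is deprecated since pandas 2.1
read_html_pandas_data = pd.read_html(StringIO(r.text))[0]
print(read_html_pandas_data)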
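To see the mechanism this answer relies on without touching Yahoo's servers, here is a self-contained sketch using only the standard library. It starts a hypothetical local server (not Yahoo's actual logic) that returns 404 unless the User-Agent starts with `Mozilla/`, then requests it once with urllib's default User-Agent and once with a browser-like one. `BrowserOnlyHandler`, `fetch`, and the dummy table are made up for illustration.

```python
import http.server
import threading
import urllib.error
import urllib.request

# Dummy HTML table; the values are placeholders, not real NFLX prices.
PAGE = (
    b"<html><body><table>"
    b"<tr><th>Date</th><th>Close</th></tr>"
    b"<tr><td>Jul 02, 2021</td><td>1.23</td></tr>"
    b"</table></body></html>"
)


class BrowserOnlyHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical server that only answers clients whose User-Agent looks like a browser."""

    def do_GET(self):
        user_agent = self.headers.get("User-Agent", "")
        if not user_agent.startswith("Mozilla/"):
            # Non-browser clients get a 404, mirroring the error in the question
            self.send_error(404, "Not Found")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def fetch(url, headers=None):
    """GET url and return (status_code, body_text) instead of raising on HTTP errors."""
    # An empty ProxyHandler bypasses any proxy settings from the environment
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with opener.open(request, timeout=5) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return err.code, ""


# Port 0 lets the OS pick a free port on localhost
server = http.server.HTTPServer(("127.0.0.1", 0), BrowserOnlyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/quote/NFLX/history"

try:
    # urllib's default "Python-urllib/3.x" User-Agent is rejected...
    status_default, _ = fetch(url)
    # ...while a browser-like one is accepted
    status_browser, html = fetch(url, headers={"User-Agent": "Mozilla/5.0"})
finally:
    server.shutdown()
    server.server_close()

print(status_default, status_browser)  # 404 200
```

The same `headers={"User-Agent": ...}` idea works with urllib against a real URL if you prefer not to install requests; the resulting HTML text can then be handed to `pd.read_html` wrapped in `io.StringIO`.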

edited Sep 18, 2022 at 22:55

answered Jul 5, 2021 at 1:45


5 Comments

TropicalMagic Over a year ago

Hi, thanks for the response! However, I received this output: [ 0 0 Will be right back... Thank you for your pati...]

Md. Fazlul Hoque Over a year ago

Just add the User-Agent header.

Md. Fazlul Hoque Over a year ago

Now you will get the data.

Md. Fazlul Hoque Over a year ago

To access the site's data, the site requires a browser-like identification; that's why you have to send a User-Agent header. Thanks

TropicalMagic Over a year ago

Brilliant, thanks! I got the output, but how do I transpose the rows into columns?

QHarr · Accepted Answer · 2021-07-05 01:40:08Z · Score: 2

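Because a User-Agent header is needed, and read_html could not send custom headers at the time of this answer (pandas 2.1 and later accept them through read_html's storage_options argument when reading from a URL). You can fetch the table first with requests, specifying the appropriate header, and then hand it over to pandas: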

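from io import StringIO

from bs4 import BeautifulSoup as bs
from pandas import read_html as rh
import requests

# A minimal browser-like User-Agent was enough to avoid the 404 at the time of this answer
r = requests.get('https://finance.yahoo.com/quote/NFLX/history?p=NFLX%27', headers={'User-Agent': 'Mozilla/5.0'})
r.raise_for_status()
soup = bs(r.content, 'lxml')
# Select only the historical-prices table, then let pandas parse that fragment
# (StringIO avoids the pandas >= 2.1 deprecation of literal HTML strings)
table = rh(StringIO(str(soup.select_one('[data-test="historical-prices"]'))))[0]
print(table)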

answered Jul 5, 2021 at 1:40


2 Comments

TropicalMagic Over a year ago

Hi, thanks for the response! Is there a way to transpose the rows to columns and place them in a DataFrame? Here is the current output: Date \ 0 Jul 02, 2021 1 Jul 01, 2021 2 Jun 30, 2021

QHarr Over a year ago

table is already a DataFrame, which has a transpose() method (also available as .T): pandas.pydata.org/pandas-docs/stable/reference/api/…
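A minimal sketch of the transpose suggestion, assuming the scraped table has a `Date` column like the output quoted above. The `Open`/`Close` columns and their numbers are hypothetical placeholders, not real NFLX prices. Setting `Date` as the index first makes the dates become the column labels after transposing; otherwise the columns would just be 0, 1, 2.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table: one row per date, dummy values.
table = pd.DataFrame(
    {
        "Date": ["Jul 02, 2021", "Jul 01, 2021", "Jun 30, 2021"],
        "Open": [1.0, 2.0, 3.0],
        "Close": [1.5, 2.5, 3.5],
    }
)

# Move Date into the index, then transpose: rows become Open/Close,
# columns become the dates.
transposed = table.set_index("Date").transpose()  # equivalent to .T
print(transposed)
```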
