
I would like to use the pandas read_html() function to scrape the Yahoo Finance table shown in the screenshot, bordered in red.

[Screenshot: the Yahoo Finance table, bordered in red]

However, I get an HTTPError: HTTP Error 404: Not Found

Here is my code:

!pip install pandas
!pip install requests
!pip install bs4
!pip install requests_html
!pip install nest_asyncio
!pip install lxml
!pip install html5lib

import pandas as pd
from bs4 import BeautifulSoup
import requests
import requests_html
import nest_asyncio
import lxml
import html5lib
nest_asyncio.apply()

url_link = "https://finance.yahoo.com/quote/NFLX/history?p=NFLX%27"
# This line raises HTTPError: HTTP Error 404: Not Found
read_html_pandas_data = pd.read_html(url_link)

2 Answers

Answer 1

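Try the following: fetch the page with requests, sending a browser-like User-Agent header, then pass the HTML to read_html: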

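import pandas as pd
import requests
from io import StringIO
url_link = 'https://finance.yahoo.com/quote/NFLX/history?p=NFLX%27'
# Yahoo appears to reject requests without a browser-like User-Agent, so send one explicitly
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'}
r = requests.get(url_link, headers=headers)
r.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
# Wrap the HTML in StringIO: newer pandas versions deprecate passing literal HTML strings
read_html_pandas_data = pd.read_html(StringIO(r.text))[0]
print(read_html_pandas_data)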

5 Comments

Hi, thanks for the response! However, I received this output: [ 0 0 Will be right back... Thank you for your pati...]
Just add a User-Agent header.
Now you will get the data.
If you want to access the site's data, the site requires your request to identify itself like a real browser, which is why you have to send a User-Agent header. Thanks
Brilliant, thanks! I got output, but how do I transpose the rows to columns?
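Regarding the User-Agent comments above, here is a small offline sketch (nothing is downloaded) of what each client sends by default. That pandas fetches a URL through Python's urllib when read_html is given one is my reading of the pandas source, not something stated in this thread; the 'Mozilla/5.0' value is just a placeholder string.

```python
import urllib.request

import requests

# urllib (which pandas uses when read_html receives a URL) identifies itself
# as "Python-urllib/X.Y" by default, not as a browser
opener = urllib.request.build_opener()
print(opener.addheaders)

# requests identifies itself as "python-requests/<version>" unless overridden
print(requests.utils.default_user_agent())

# A Session applies its default headers to every request made through it,
# handy when scraping several tickers (placeholder User-Agent value)
session = requests.Session()
session.headers.update({'User-Agent': 'Mozilla/5.0'})
print(session.headers['User-Agent'])
```

Using a Session rather than passing headers= on each call keeps the header in one place and reuses the underlying connection.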
Answer 2

Because a User-Agent header is needed, and read_html doesn't let you specify one. You can fetch the page first with requests, setting the appropriate header, then hand the table over to pandas:

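from io import StringIO
from pandas import read_html as rh
import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://finance.yahoo.com/quote/NFLX/history?p=NFLX%27', headers={'User-Agent': 'Mozilla/5.0'})
r.raise_for_status()
soup = bs(r.content, 'lxml')  # the 'lxml' parser requires the lxml package
# Select the historical-prices element and pass only its HTML to pandas (wrapped in StringIO)
table = rh(StringIO(str(soup.select_one('[data-test="historical-prices"]'))))[0]
print(table)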
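As an alternative to selecting the element with BeautifulSoup, read_html's attrs argument filters tables by their HTML attributes directly. A minimal sketch on made-up HTML (no request is made, and the cell values are placeholders); whether the table element itself on Yahoo's page carries data-test="historical-prices" is an assumption, since the selector above could also be matching a wrapper element around the table.

```python
from io import StringIO

import pandas as pd

# Made-up HTML standing in for the downloaded page; only the data-test
# attribute matters here, the cell values are placeholders
html = """
<table><tr><th>Other</th></tr><tr><td>ignore me</td></tr></table>
<table data-test="historical-prices">
  <thead><tr><th>Date</th><th>Close</th></tr></thead>
  <tbody>
    <tr><td>Jul 02, 2021</td><td>1.0</td></tr>
    <tr><td>Jul 01, 2021</td><td>2.0</td></tr>
  </tbody>
</table>
"""

# attrs keeps only tables whose attributes match, so the first table is skipped
tables = pd.read_html(StringIO(html), attrs={"data-test": "historical-prices"})
table = tables[0]
print(len(tables))
print(table)
```

With real data you would pass StringIO(r.text) from the response instead of the sample string.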

2 Comments

Hi, thanks for the response! Is there a way to transpose the rows to columns and place them in a DataFrame? Here is the current output: Date \ 0 Jul 02, 2021 1 Jul 01, 2021 2 Jun 30, 2021
table is already a DataFrame, and DataFrame has a transpose() method (also available as .T). pandas.pydata.org/pandas-docs/stable/reference/api/…
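For the transpose question, a minimal sketch on a placeholder frame shaped like the scraped table: the dates come from the output quoted in the comment above, while the Close values are made up purely for illustration. Setting Date as the index first makes the dates become the column labels after transposing.

```python
import pandas as pd

# Placeholder frame (dates from the quoted output, Close values invented)
table = pd.DataFrame({
    "Date": ["Jul 02, 2021", "Jul 01, 2021", "Jun 30, 2021"],
    "Close": [1.0, 2.0, 3.0],
})

# Move Date into the index, then swap rows and columns (.T is equivalent)
transposed = table.set_index("Date").transpose()
print(transposed)
```

Without set_index("Date"), the column labels after transposing would be the integer row positions 0, 1, 2 and the dates would end up as an ordinary row.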
