Getting timeout errors when downloading CSVs using the requests API

Question

I previously wrote a program to analyze stock info, and I used NASDAQ to get historical data. For example, in the past, if I wanted to pull a year's worth of price quotes for CMG, all I needed to do was make a request to the following link, h_url= https://www.nasdaq.com/api/v1/historical/CMG/stocks/2020-06-30/2019-06-30, to download a CSV of the historical quotes. However, now when I make the request, my connection times out and I cannot get any response. If I just enter the URL into a web browser, it still downloads the file just fine. Some example code is below:


import os
import requests as rq
from bs4 import BeautifulSoup as bs

h_url = 'https://www.nasdaq.com/api/v1/historical/CMG/stocks/2020-06-30/2019-06-30'
page_response = rq.get(h_url, timeout=30)
page = bs(page_response.content, 'html.parser')

dwnld_fl = os.path.join(os.path.dirname(__file__), 'Temp_data', 'hist_CMG_data.txt')
# Use a context manager so the file is closed after writing
with open(dwnld_fl, 'w') as fl:
    fl.write(page.text)

Can someone please let me know if this works for them, or if there is something I should do differently to get it to work again? This is only an example, not the actual code, so if I accidentally made a simple syntax error, you can assume the actual file is correct, since it has worked without issue in the past.

gallen · Accepted Answer · 2020-07-02 00:28:31Z

You are missing the headers and making a request to an invalid URL (the start and end dates are reversed, and the file downloaded in a browser is empty).

import requests


h_url= 'https://www.nasdaq.com/api/v1/historical/CMG/stocks/2019-06-30/2020-06-30'
headers = {
    'authority': 'www.nasdaq.com',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'none',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'accept-language': 'en-US,en;q=0.9',
}

page_response = requests.get(h_url, timeout=30, allow_redirects=True, headers=headers)

# Open in binary mode and write the raw bytes exactly as received
with open("dump.txt", "wb") as out:
    out.write(page_response.content)

This writes the raw bytes of the received data to the file "dump.txt". You do not need to use BeautifulSoup to parse HTML, as the response is a text file, not HTML.
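
Since the original problem was partly a reversed date range, it may help to build the URL from date objects and check the order before sending anything. The following is only a minimal sketch of that idea, not a confirmed client for this endpoint. The helper names `build_history_url` and `download_history` and the output file name are hypothetical, the URL pattern is copied from the request above, and the shortened header set is an assumption (the site may require more of the headers shown earlier).

```python
from datetime import date

# URL pattern copied from the request above; earlier date first, later date second.
BASE_URL = "https://www.nasdaq.com/api/v1/historical/{symbol}/stocks/{start}/{end}"

# Assumption: a reduced set of the browser-like headers is enough.
# If the request still hangs, fall back to the full header set shown earlier.
HEADERS = {
    "user-agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"
    ),
    "accept-language": "en-US,en;q=0.9",
}


def build_history_url(symbol: str, start: date, end: date) -> str:
    """Return the download URL, rejecting a reversed date range."""
    if start > end:
        raise ValueError(f"start date {start} is after end date {end}")
    return BASE_URL.format(
        symbol=symbol, start=start.isoformat(), end=end.isoformat()
    )


def download_history(symbol, start, end, out_path, timeout=30):
    """Fetch the data and write the raw response bytes to out_path."""
    # Imported here so the URL helper works even without requests installed.
    import requests

    url = build_history_url(symbol, start, end)
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    # Raise an exception on 4xx/5xx instead of silently saving an error page.
    response.raise_for_status()
    with open(out_path, "wb") as out:
        out.write(response.content)
    return out_path


if __name__ == "__main__":
    # Hypothetical output file name.
    download_history("CMG", date(2019, 6, 30), date(2020, 6, 30), "hist_CMG_data.csv")
```

With this in place, passing the dates in the wrong order raises a ValueError immediately instead of quietly producing an empty file.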
