
I am working on a personal project to analyze COVID-19 data. I am downloading the Excel sheet provided by ourworldindata.org, available at this URL: https://github.com/owid/covid-19-data/blob/master/public/data/owid-covid-data.xlsx

However, when I try to read it with the pandas code below, I get a traceback. What could be the root cause?

url = 'https://github.com/owid/covid-19-data/blob/master/public/data/owid-covid-data.xlsx'
df = pd.read_excel(url, sheet_name='Sheet1')

Error

    Traceback (most recent call last):
      File "<input>", line 1, in <module>
      File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\pandas\io\excel\_base.py", line 304, in read_excel
        io = ExcelFile(io, engine=engine)
      File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\pandas\io\excel\_base.py", line 824, in __init__
        self._reader = self._engines[engine](self._io)
      File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\pandas\io\excel\_xlrd.py", line 21, in __init__
        super().__init__(filepath_or_buffer)
      File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\pandas\io\excel\_base.py", line 351, in __init__
        self.book = self.load_workbook(filepath_or_buffer)
      File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\pandas\io\excel\_xlrd.py", line 34, in load_workbook
        return open_workbook(file_contents=data)
      File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\xlrd\__init__.py", line 157, in open_workbook
        ragged_rows=ragged_rows,
      File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\xlrd\book.py", line 92, in open_workbook_xls
        biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
      File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\xlrd\book.py", line 1278, in getbof
        bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
      File "C:\Users\masoom.kumar\PycharmProjects\ReadingINCA_Data\venv\lib\site-packages\xlrd\book.py", line 1272, in bof_error
        raise XLRDError('Unsupported format, or corrupt file: ' + msg)
    xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\n\n\n\n\n<!D'

Please note that pandas can read the Excel file if I first download it to my computer.
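The bytes quoted at the end of the traceback, b'\n\n\n\n\n<!D', look like the start of an HTML document (<!DOCTYPE html>), which suggests pandas was handed GitHub's web page for the file rather than the workbook itself. One way to confirm this is to look at the first bytes of whatever was downloaded: an .xlsx file is a ZIP archive and therefore starts with the signature PK\x03\x04. The helper sniff_format below is hypothetical, a minimal sketch for illustration rather than part of pandas or xlrd.

```python
# Hypothetical helper: guess whether downloaded bytes are an .xlsx workbook or an HTML page.
ZIP_MAGIC = b"PK\x03\x04"  # .xlsx files are ZIP archives and start with this signature


def sniff_format(data: bytes) -> str:
    """Return 'xlsx', 'html', or 'unknown' based on the leading bytes of data."""
    if data.startswith(ZIP_MAGIC):
        return "xlsx"
    # HTML pages may begin with whitespace/newlines before <!DOCTYPE or <html
    head = data.lstrip()[:15].lower()
    if head.startswith(b"<!doctype") or head.startswith(b"<html"):
        return "html"
    return "unknown"


# A response that begins like the one in the XLRDError above:
print(sniff_format(b"\n\n\n\n\n<!DOCTYPE html>"))  # prints "html"
# A response that begins like a real .xlsx file:
print(sniff_format(b"PK\x03\x04\x14\x00"))  # prints "xlsx"
```

If the check reports "html", the URL points at a viewer page, not at the raw file.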

  • You have to do it with requests.get().content. (Commented Jun 4, 2020 at 12:30)
  • Download the raw file: url = url.replace('blob', 'raw'). (Commented Jun 4, 2020 at 12:43)

2 Answers


Try the link to the raw Excel file:

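import pandas as pd

# '?raw=true' makes GitHub serve the file itself (via a redirect) instead of its HTML viewer page
url = 'https://github.com/owid/covid-19-data/blob/master/public/data/owid-covid-data.xlsx?raw=true'
df = pd.read_excel(url, sheet_name='Sheet1')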
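Both the ?raw=true suffix in this answer and the url.replace('blob', 'raw') suggestion in the comments point pandas at the file contents instead of GitHub's HTML viewer. One caveat with a bare str.replace is that it rewrites every occurrence of 'blob', including any inside the repository or file name. A sketch that rewrites only the path segment could look like this; it assumes the usual https://github.com/<owner>/<repo>/blob/<ref>/<path> layout, and the function name to_raw_url is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_raw_url(blob_url: str) -> str:
    """Hypothetical helper: turn a GitHub 'blob' page URL into the matching 'raw' URL.

    Assumes the layout https://github.com/<owner>/<repo>/blob/<ref>/<path>;
    only the 'blob' segment right after <owner>/<repo> is replaced.
    """
    parts = urlsplit(blob_url)
    segments = parts.path.split("/")  # ['', owner, repo, 'blob', ref, ...path parts]
    if parts.netloc != "github.com" or len(segments) < 5 or segments[3] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    segments[3] = "raw"
    # Keep any query string or fragment unchanged
    return urlunsplit((parts.scheme, parts.netloc, "/".join(segments), parts.query, parts.fragment))


url = "https://github.com/owid/covid-19-data/blob/master/public/data/owid-covid-data.xlsx"
print(to_raw_url(url))
# prints https://github.com/owid/covid-19-data/raw/master/public/data/owid-covid-data.xlsx
```

The resulting URL can be passed to pd.read_excel in place of the ?raw=true form.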

You can do it with requests:

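import io

import pandas as pd
import requests

# Use the raw-file URL; the plain 'blob' URL returns GitHub's HTML page, not the workbook
url = 'https://github.com/owid/covid-19-data/blob/master/public/data/owid-covid-data.xlsx?raw=true'

response = requests.get(url)
response.raise_for_status()  # stop on HTTP errors instead of parsing an error page

# .xlsx is a binary (ZIP) format: wrap the bytes in BytesIO and use read_excel, not read_csv
df = pd.read_excel(io.BytesIO(response.content))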

I do this to avoid saving the file to a local drive or Google Drive, which also saves connection time.

  • Thanks for the response. This presented some errors; however, the method suggested by @luigigi worked fine.
