I spent a few hours trying to read an Excel file with pd.read_excel where the path is a web address. I found that the link doesn't point directly to the file but only triggers a download. Is there an easy way to solve this?

Part of code:

link_energy = 'http://unstats.un.org/unsd/environment/excel_file_tables/2013/Energy%20Indicators.xls'
df_energy = pd.read_excel(link_energy)

Error message:

XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\n\n\n<!DOC'

This is probably not a problem with pandas but my own lack of skill in how to do it.

5 Comments

  • Can you post the full URL to the xlsx file? Commented Feb 14, 2018 at 17:35
  • Also, please check if the above post resolves your problem. Commented Feb 14, 2018 at 17:42
  • OK, I added the full URL, but it comes from the Coursera platform. Doesn't that explain anything? Commented Feb 14, 2018 at 18:40
  • I saw it and checked it, @jp_data_analysis; it's not the case here. Commented Feb 14, 2018 at 18:41
  • Now it should be possible to check the file. Commented Feb 14, 2018 at 19:44

2 Answers

Everything works as expected for me with the following code:

import pandas as pd
link_energy = 'http://unstats.un.org/unsd/environment/excel_file_tables/2013/Energy%20Indicators.xls'
df_energy = pd.read_excel(link_energy)
df_energy

It runs without errors in the following environment:

The version of the notebook server is: 5.2.2
The server is running on this version of Python:

Python 3.6.3 | packaged by conda-forge | (default, Nov 4 2017, 10:10:56) [GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]

Current Kernel Information:

Python 3.6.3 | packaged by conda-forge | (default, Nov 4 2017, 10:10:56)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

1 Comment

I can confirm: after updating pandas to the latest version (I don't know if that was needed) and installing xlrd (no need to import it), pd.read_excel works as expected.
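
Since the fix above came down to what was installed, a quick check of the environment can save time. The following is a minimal sketch, assuming Python 3.8 or newer (for importlib.metadata); the helper name installed_version is hypothetical and not part of pandas. pandas uses xlrd as its reader for legacy .xls files, so a missing xlrd is worth ruling out first.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing.

    Hypothetical helper for illustration only.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the two packages mentioned in the comment above.
for name in ("pandas", "xlrd"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If xlrd shows up as not installed, installing it and restarting the kernel is probably the first thing to try.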
0

I cannot access the URL you posted.

However, when a file with an .xls extension is actually plain delimited text, pd.read_excel won't work and you need to use pd.read_csv, for example:

import pandas as pd

df = pd.read_csv('https://cib.societegenerale.com/fileadmin/indices_feeds/CTA_Historical.xls')

Next, inspect what the file actually contains: which separator it uses, and whether there are extra rows or values that need to be skipped in order to load only the useful data.
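
One way to inspect the content without opening the file by hand is to look at its first bytes. This is a minimal sketch, not a definitive implementation: the helper names sniff_format and load_table are hypothetical, and the checks rely on the well-known file signatures for legacy .xls (OLE2 container) and .xlsx (zip archive). It also recognizes the case from the question, where the server answered with an HTML page (the b'\n\n\n<!DOC' in the error message) instead of a spreadsheet.

```python
import io
import urllib.request

OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # signature of legacy .xls files
ZIP_MAGIC = b"PK\x03\x04"                       # .xlsx files are zip archives


def sniff_format(head: bytes) -> str:
    """Guess the payload type from its first bytes (hypothetical helper)."""
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    # Ignore leading whitespace and case before looking for HTML markers.
    stripped = head.lstrip().lower()
    if stripped.startswith((b"<!doc", b"<html")):
        return "html"
    return "text"


def load_table(url: str):
    """Download a URL and pick a pandas reader based on the actual content."""
    import pandas as pd  # imported here so sniff_format works without pandas

    with urllib.request.urlopen(url) as response:
        raw = response.read()

    kind = sniff_format(raw[:64])
    if kind in ("xls", "xlsx"):
        return pd.read_excel(io.BytesIO(raw))
    if kind == "html":
        raise ValueError("server returned an HTML page, not a spreadsheet")
    # Assume delimited text; sep=None lets the python engine guess the separator.
    return pd.read_csv(io.BytesIO(raw), sep=None, engine="python")
```

Calling load_table(link_energy) would then either return a DataFrame or fail with a message saying that the link served a web page rather than the file.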

1 Comment

I added a URL that you can access. As I understand it, your answer would solve the problem if my file were really a .csv file.
