3

I download an XLS file from the web using Selenium.

I tried many options I found on Stack Overflow and other websites to read the XLS file:

import pandas as pd
df = pd.read_excel('test.xls') # Read XLS file
Expected "little-endian" marker, found b'\xff\xfe'

And

df = pd.ExcelFile('test.xls').parse('Sheet1') # Read XLS file via ExcelFile
Expected "little-endian" marker, found b'\xff\xfe'

And again

from xlrd import open_workbook
book = open_workbook('test.xls') 
CompDocError: Expected "little-endian" marker, found b'\xff\xfe'

I have tried different encodings: utf-8, ANSI, utf_16_be, utf16. I have even tried to get the encoding of the file from Notepad and other applications.

Type of file: Microsoft Excel 97-2003 Worksheet (.xls). I can open the file with Excel without any issue. What's frustrating is that if I open the file with Excel and just press Save, I can then read the file with any of the previous Python commands.

I would be really grateful if someone could suggest other ideas I could try. I need to open this file with a Python script only.

Thanks, Max

Solution (somewhat messy but simple) that could potentially work for any type of Excel file:

I drive Excel from Python through COM automation (win32com) to open the file and save it again. Excel "cleans up" the file, and Python can then read it with any of the usual Excel-reading functions.

Solution inspired by comments from @Serge Ballesta and @John Y.

# Open the file in Excel and save it again to work around the read error
import win32com.client
import pandas

downloadpath = "c:\\firefox_downloads\\"
filename = "myfile.xls"

xl = win32com.client.Dispatch("Excel.Application")
xl.Application.DisplayAlerts = False  # suppress Excel's overwrite prompt while saving
wb = xl.Workbooks.Open(Filename=downloadpath + filename)
wb.SaveAs(downloadpath + filename)
wb.Close()  # Close is a method, so it needs the parentheses to be called
xl.Application.DisplayAlerts = True  # restore Excel's prompts

df = pandas.ExcelFile(downloadpath + filename).parse('Sheet1')  # Read the re-saved XLS file

Thank you all!

  • The file that you downloaded is probably neither in XLS format nor in UTF-8 CSV format. But there are still tons of possible formats, and without knowing more about that file I really cannot guess... Commented Mar 9, 2018 at 16:17
  • I can open the file with Excel with no problem. It has a .xls extension when I download it. Type of file: Microsoft Excel 97-2003 Worksheet (.xls). The website I download the file from accesses a MySQL database to generate this Excel file. I think the source code they used to do this is C#. It has about 10 columns and 80 rows. I could try to send a sample. Commented Mar 9, 2018 at 16:28
  • Excel can detect and load many formats. Do you get a message about changing the format when you open the original file and then save it back? Commented Mar 9, 2018 at 16:33
  • There isn't any message when I Save, Save As, or Save As a .xlsx file. It just saves it, and then I can open it using pandas.read_excel, for example. Commented Mar 9, 2018 at 16:44
  • As I have already said, without an example of the file (not its content), I cannot guess the format... Commented Mar 9, 2018 at 16:53
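One way to narrow down the format question raised in these comments is to look at the first bytes of the downloaded file. The sketch below assumes the file is expected to be an OLE2 compound document (the container used by Excel 97-2003 .xls files), which starts with an 8-byte signature and carries a byte-order marker at offset 28. The function name `describe_header` is hypothetical and used for illustration only.

```python
# Signature that opens every OLE2 compound file (the .xls container).
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def describe_header(data: bytes) -> str:
    """Return a short description of what the leading bytes look like."""
    if data[:8] != OLE2_SIGNATURE:
        if data[:4] == b"PK\x03\x04":
            # Zip local-file header: the file may really be an .xlsx.
            return "zip container (possibly .xlsx)"
        return "not an OLE2 compound file"
    marker = data[28:30]
    if marker == b"\xfe\xff":
        return "OLE2 compound file, byte-order marker OK"
    return f"OLE2 compound file, unexpected byte-order marker {marker!r}"
```

Reading the file with `open('test.xls', 'rb').read(64)` and passing the result to this function would show whether the file is a genuine compound document whose marker bytes are merely swapped, which is what the error message in the question suggests.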

2 Answers

2

I got around this one by using the following:

import pandas as pd

# Read the content as bytes from whatever your source is, then overwrite
# bytes 28-29 (the byte-order marker) with the value xlrd expects:
content = content[:28]+b'\xFE\xFF'+content[30:]
with open('file.xls', 'wb') as file:
    file.write(content)

# This works now
df = pd.read_excel('file.xls')

Based on this: https://github.com/python-excel/xlrd/blob/master/xlrd/compdoc.py#L90
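A slightly more defensive variant of the same patch is sketched below. It assumes the download is already in memory as bytes, only touches the two marker bytes, and refuses to patch anything that does not start with the OLE2 signature. The helper name `patch_byte_order_marker` is hypothetical, not part of xlrd or pandas.

```python
# Signature that opens every OLE2 compound file (the .xls container).
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
# Byte sequence that xlrd's compdoc check expects at offset 28.
EXPECTED_MARKER = b"\xfe\xff"


def patch_byte_order_marker(content: bytes) -> bytes:
    """Return content with bytes 28-29 set to the expected marker."""
    if content[:8] != OLE2_SIGNATURE:
        # Patching an HTML or CSV file would only corrupt it further.
        raise ValueError("not an OLE2 compound file; refusing to patch")
    if content[28:30] == EXPECTED_MARKER:
        return content  # header is already fine, nothing to do
    return content[:28] + EXPECTED_MARKER + content[30:]
```

The patched bytes can then be written to disk as in the answer above, or, assuming pandas and xlrd are installed, wrapped in `io.BytesIO` and passed to `pd.read_excel` without creating a second file.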

-2

What does pd mean?

pandas is made for data science. In my opinion, you should use openpyxl (reads and writes xlsx only) or xlrd/xlwt (xlrd reads xls, xlwt writes xls only).

from xlrd import open_workbook
book = open_workbook(<path to file>)
sheet =.... 

There are several examples of this on the Internet...

2 Comments

Thanks for the reply. I already tried this, though: from xlrd import open_workbook; book = open_workbook('test.xls') gives CompDocError: Expected "little-endian" marker, found b'\xff\xfe'
CompDocError means your file is corrupted or something like that. Maybe first check whether the file opens correctly in Excel. Otherwise, post the file here and I will try on my side...
