Read XLSB File in Pandas Python

Question

There are many questions on this, but there has been no simple answer on how to read an xlsb file into pandas. Is there an easy way to do this?

No, I don't believe so. Look at this: github.com/pandas-dev/pandas/issues/8540. It's an open issue. You should look at converting the file to another format first.
– cs95, Commented Jul 10, 2017 at 19:07
That looks like a pretty old answer there. I was wondering whether anything has been added to the pandas package recently.
– Gayatri, Commented Jul 11, 2017 at 15:29
Yeah. The issue is still open. For now, I guess I will need to convert it manually to an xlsx file and then read that.
– Gayatri, Commented Jul 11, 2017 at 21:07

Glen Thompson · Accepted Answer · 2021-01-14 14:29:56Z

89

pandas 1.0.0, released January 29, 2020, added support for binary Excel files.

import pandas as pd
df = pd.read_excel('path_to_file.xlsb', engine='pyxlsb')

Notes:

You will need to upgrade pandas - pip install pandas --upgrade
You will need to install pyxlsb - pip install pyxlsb

edited Jan 14, 2021 at 14:29

answered Feb 1, 2020 at 17:48

Glen Thompson



5 Comments

BossRoyce Over a year ago

I'm getting ValueError: Unknown engine: pyxlsb. Is this engine now built into pandas, or do I have to install and import pyxlsb separately?

Glen Thompson Over a year ago

You need to install it with pip3 install pyxlsb. It's not built in, just supported. Look at the notes in the answer.

BossRoyce Over a year ago

I installed and imported pyxlsb and am still getting ValueError: Unknown engine: pyxlsb. Is there a trick to importing it?

Glen Thompson Over a year ago

What version of pandas do you have? You don't need to import pyxlsb. My guess is that there is a mismatch between what you installed and what you are running, e.g. did you install it for Python 2 and run Python 3, or vice versa? Running pd.show_versions() prints the versions you are actually executing.
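One way to check for the install/run mismatch described in this comment is to ask the running interpreter which package versions it can see. This is a minimal sketch using only the standard library; `describe_environment` is a hypothetical helper name, not part of pandas or pyxlsb.

```python
import sys
from importlib import metadata


def describe_environment(packages=("pandas", "pyxlsb")):
    """Return the running interpreter path and the installed version of each package.

    A value of None means the package is not installed for this interpreter.
    """
    report = {"python": sys.executable}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the reported pandas version is below 1.0.0, or pyxlsb shows as None, the interpreter you are running is probably not the one you installed the packages into.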

Erik Johnsson Over a year ago

Why can the pandas in my Anaconda Python 3 only be updated to 0.25.1?

Finrod Felagund · Accepted Answer · 2020-02-05 20:25:32Z

37

Actually, there is a way: use the pyxlsb library.

import pandas as pd
from pyxlsb import open_workbook as open_xlsb

df = []

with open_xlsb('some.xlsb') as wb:
    with wb.get_sheet(1) as sheet:
        for row in sheet.rows():
            df.append([item.v for item in row])

df = pd.DataFrame(df[1:], columns=df[0])

UPDATE: as of pandas version 1.0 read_excel() now can read binary Excel (.xlsb) files by passing engine='pyxlsb'

Source: https://pandas.pydata.org/pandas-docs/version/1.0.0/whatsnew/v1.0.0.html

edited Feb 5, 2020 at 20:25

answered Mar 21, 2018 at 7:55

Finrod Felagund


8 Comments

Gayatri Over a year ago

I was looking for some function builtin within pandas which could do this.

Finrod Felagund Over a year ago

No such function exists for now.

user8436761 Over a year ago

I tried this, but instead of the normal-looking dates in the Excel file ("Feb-20"), I am getting float numbers in Python such as 32874.0. Any ideas on how to fix this?

Finrod Felagund Over a year ago

Yes, Excel stores dates as floats. Use the pandas built-in function pd.to_datetime().

Alexander Chervov Over a year ago

Thank you! About the date conversion: Excel seems to number dates as day counts starting two days before 1900-01-01, so a plain to_datetime call does not seem to work.
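The offset described in this comment can be sketched with the standard library alone. This assumes the workbook uses Excel's default 1900 date system (workbooks saved with the 1904 date system need a different epoch); `excel_serial_to_datetime` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# Day zero of Excel's default (1900) date system for serials from 61 upward
# (1900-03-01 and later). Earlier serials are off by one because Excel treats
# 1900 as a leap year.
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial):
    """Convert an Excel serial date (days, with a fractional time part) to a datetime."""
    return EXCEL_EPOCH + timedelta(days=float(serial))


print(excel_serial_to_datetime(32874.0))  # 1990-01-01 00:00:00
```

For a whole column, the same epoch can be passed to pandas, for example pd.to_datetime(df['date_col'], unit='D', origin='1899-12-30'), where 'date_col' is a placeholder column name.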


gmar · Accepted Answer · 2019-05-15 11:08:10Z

7

pyxlsb is indeed an option for reading xlsb files, but it is rather limited.

I suggest using the xlwings package, which can read and write xlsb files without losing sheet formatting, formulas, etc. Extensive documentation is available.

import pandas as pd
import xlwings as xw

app = xw.App()
book = xw.Book('file.xlsb')
sheet = book.sheets('sheet_name')
df = sheet.range('A1').options(pd.DataFrame, expand='table').value
book.close()
app.kill()

'A1' in this case is the starting position of the Excel table. To write to an xlsb file, simply write:

sheet.range('A1').value = df

edited May 15, 2019 at 11:08

answered May 15, 2019 at 9:41

gmar


1 Comment

Zev Over a year ago

This adds a major requirement: you have to have a running instance of Excel. This won't work on Linux machines.

Friedrich · Accepted Answer · 2024-04-25 11:02:16Z

1

In my trial, the accepted answer retrieved only one sheet from the workbook. Working hierarchically with a specific workbook and worksheet, as in Finrod Felagund's answer, or requesting a specific sheet, is more accurate.
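If the goal is every sheet rather than one, read_excel accepts sheet_name=None and then returns a dict of DataFrames keyed by sheet name. A minimal sketch, assuming pandas 1.0 or later with pyxlsb installed; `read_all_sheets` is a hypothetical helper, and its `reader` argument exists only so the helper can be exercised without a real workbook.

```python
def read_all_sheets(path, reader=None):
    """Read every sheet of an .xlsb workbook into a dict of {sheet name: DataFrame}."""
    if reader is None:
        import pandas as pd  # imported lazily so the helper can be tested without pandas
        reader = pd.read_excel
    # sheet_name=None asks read_excel for all sheets instead of only the first one
    return reader(path, sheet_name=None, engine="pyxlsb")
```

Usage would look like sheets = read_all_sheets('path_to_file.xlsb'), followed by a loop over sheets.items().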

If you simply want to convert xlsb to xlsx, I found the aspose-cells-python package easy to use. A Python 3.11 environment is currently sufficient. See the source website.

import aspose.cells 
from aspose.cells import Workbook
workbook = Workbook("input.xlsb")
workbook.save("Output.xlsx")

edited Apr 25, 2024 at 11:02

Friedrich


answered Apr 22, 2024 at 15:22

FIRE Araştırma Eğitim Ltd. Şti


Comments

Rishabh Kaushik · Accepted Answer · 2019-10-10 01:04:54Z

0

If you want to read a large binary file, or any Excel file, and split it into two DataFrames at a given row, you can use this code directly:

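import pandas as pd
from pyxlsb import open_workbook as open_xlsb

split_row = your_index_number  # placeholder: row index at which to split the sheet
first_rows = []
second_rows = []
with open_xlsb('Test.xlsb') as wb:
    with wb.get_sheet('Sheet1') as sheet:
        for i, row in enumerate(sheet.rows()):
            values = [item.v for item in row]
            if i < split_row:
                first_rows.append(values)   # rows before the split point
            else:
                second_rows.append(values)  # the split row and everything after it

# The first row of the sheet holds the column names.
first_dataframe = pd.DataFrame(first_rows[1:], columns=first_rows[0])
second_dataframe = pd.DataFrame(second_rows, columns=first_dataframe.columns)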

answered Oct 10, 2019 at 1:04

Rishabh Kaushik


1 Comment

MGM Over a year ago

What package needs to be imported for "open_xlsb"?

GERMAN RODRIGUEZ · Accepted Answer · 2021-05-17 02:03:13Z

To be able to read xlsb files, it is necessary to have pyxlsb installed.

As per https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html#pandas.read_excel

engine: str, default None

If io is not a buffer or path, this must be set to identify io. Supported engines: “xlrd”, “openpyxl”, “odf”, “pyxlsb”. Engine compatibility :

“xlrd” supports old-style Excel files (.xls).

“openpyxl” supports newer Excel file formats.

“odf” supports OpenDocument file formats (.odf, .ods, .odt).

“pyxlsb” supports Binary Excel files.

Changed in version 1.2.0: The engine xlrd now only supports old-style .xls files. When engine=None, the following logic will be used to determine the engine:

If path_or_buffer is an OpenDocument format (.odf, .ods, .odt), then odf will be used.

Otherwise if path_or_buffer is an xls format, xlrd will be used.

Otherwise if openpyxl is installed, then openpyxl will be used.

Otherwise if xlrd >= 2.0 is installed, a ValueError will be raised.

Otherwise xlrd will be used and a FutureWarning will be raised. This case will raise a ValueError in a future version of pandas.
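The compatibility list quoted above can be turned into a small lookup when you prefer to pass the engine explicitly. This is only a sketch: `pick_engine` and `ENGINE_BY_SUFFIX` are hypothetical names, the choice is made from the file extension alone, and it is not a reproduction of pandas' internal logic.

```python
from pathlib import Path

# Extension-to-engine table built from the compatibility list quoted above.
ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",
    ".xlsb": "pyxlsb",
    ".odf": "odf",
    ".ods": "odf",
    ".odt": "odf",
}


def pick_engine(path, default="openpyxl"):
    """Return an engine name for read_excel based on the file extension."""
    return ENGINE_BY_SUFFIX.get(Path(path).suffix.lower(), default)
```

It could then be used as pd.read_excel(path, engine=pick_engine(path)), provided the matching engine package is installed.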

Reading an xlsb file, using the first column as the index (index_col=0):

import pandas as pd

dfcluster = pd.read_excel('c:/xml/baseline/distribucion.xlsb', sheet_name='Cluster', index_col=0, engine='pyxlsb')
