Reading excel files from "input" blob storage container and exporting to csv in "output" container with python

Question

I'm trying to write a Python script that reads an .xlsx file from a blob storage container called "source", converts it to .csv, and stores it in a new container. (I'm testing the script locally; once it works, I plan to include it in an ADF pipeline.) So far I can access the blob storage, but I'm having trouble reading the file content.

from azure.storage.blob import BlobServiceClient, ContainerClient, BlobClient
import pandas as pd

conn_str = "DefaultEndpointsProtocol=https;AccountName=XXXXXX;AccountKey=XXXXXX;EndpointSuffix=core.windows.net"
container = "source"
blob_name = "prova.xlsx"

container_client = ContainerClient.from_connection_string(
    conn_str=conn_str,
    container_name=container,
)
# Download blob as StorageStreamDownloader object (stored in memory)
downloaded_blob = container_client.download_blob(blob_name)

df = pd.read_excel(downloaded_blob)

print(df)

I get the following error:

ValueError: Invalid file path or buffer object type: <class 'azure.storage.blob._download.StorageStreamDownloader'>

I tried a .csv file as input, with the parsing code written as follows:

df = pd.read_csv(StringIO(downloaded_blob.content_as_text()))

and it works.

Any suggestions on how to modify the code so that the Excel file can be read?

Please try df = pd.read_excel(downloaded_blob.content_as_bytes())
– Jim Xu, Commented May 13, 2020 at 7:27
Hi @JimXu, I had just come back to this question to write that I found the solution on the StorageStreamDownloader class page (learn.microsoft.com/en-us/python/api/azure-storage-blob/…) when I saw your comment. I can confirm that it works with .content_as_bytes(). Thanks anyway!
– Greenfox, Commented May 13, 2020 at 11:22

Jim Xu · Accepted Answer · 2020-05-17 01:18:37Z

2

To summarize the solution:

pd.read_excel() in pandas expects input such as a file path, bytes, or a file-like object. download_blob, however, returns an azure.storage.blob.StorageStreamDownloader, which pd.read_excel() rejects with the ValueError shown in the question. Call readall() or content_as_bytes() on the downloader to get the blob's content as bytes first. For more details, see the pandas read_excel documentation and the StorageStreamDownloader documentation.
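As a minimal sketch of the approach described above (not necessarily the exact code the asker ended up with), the downloaded bytes can be wrapped in an io.BytesIO before being handed to pandas. As far as I know, newer releases of both libraries point in this direction: the Azure SDK documents readall() as the replacement for content_as_bytes(), and recent pandas versions warn when raw bytes are passed to read_excel(). The helper names to_excel_buffer and read_excel_blob are hypothetical; the connection string, container, and blob name are the placeholders from the question.

```python
import io


def to_excel_buffer(downloader) -> io.BytesIO:
    """Wrap the content of a StorageStreamDownloader-like object in BytesIO.

    Only readall() is required, so any object exposing it will work.
    """
    return io.BytesIO(downloader.readall())


def read_excel_blob(conn_str: str, container: str, blob_name: str):
    """Download an .xlsx blob and parse it into a pandas DataFrame."""
    # Third-party imports are local so that to_excel_buffer() can be used
    # and tested without pandas or the Azure SDK installed.
    import pandas as pd
    from azure.storage.blob import ContainerClient

    container_client = ContainerClient.from_connection_string(
        conn_str=conn_str,
        container_name=container,
    )
    downloader = container_client.download_blob(blob_name)
    # read_excel() accepts a file-like object; an Excel engine such as
    # openpyxl must be installed for .xlsx files.
    return pd.read_excel(to_excel_buffer(downloader))
```

With the values from the question, this would be called as df = read_excel_blob(conn_str, "source", "prova.xlsx").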


Sunny Kumar · Answer · 2022-01-28 12:16 (edited by Syscall, 2022-01-28 12:38:00Z)

0

Change

df = pd.read_excel(downloaded_blob)

to

df = pd.read_excel(downloaded_blob.content_as_bytes())
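Neither answer covers the second half of the question, writing the result back as .csv. Below is a rough sketch of that step, assuming the destination is the "output" container named in the title and that it already exists. csv_blob_name and upload_dataframe_as_csv are hypothetical helper names; BlobClient.from_connection_string and upload_blob come from the azure-storage-blob v12 SDK.

```python
from pathlib import PurePosixPath


def csv_blob_name(xlsx_blob_name: str) -> str:
    """Derive the output blob name by swapping the extension for .csv."""
    return str(PurePosixPath(xlsx_blob_name).with_suffix(".csv"))


def upload_dataframe_as_csv(df, conn_str: str, container: str, blob_name: str) -> None:
    """Serialize a DataFrame to CSV in memory and upload it as a blob."""
    # Imported locally so csv_blob_name() works without the Azure SDK installed.
    from azure.storage.blob import BlobClient

    blob_client = BlobClient.from_connection_string(
        conn_str=conn_str,
        container_name=container,
        blob_name=blob_name,
    )
    # to_csv() with no path returns the CSV text; index=False drops the row index.
    csv_bytes = df.to_csv(index=False).encode("utf-8")
    # overwrite=True replaces an existing blob of the same name.
    blob_client.upload_blob(csv_bytes, overwrite=True)
```

For the file in the question, upload_dataframe_as_csv(df, conn_str, "output", csv_blob_name("prova.xlsx")) would write the blob prova.csv.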
