I'm trying to develop a Python script that reads an .xlsx file from a blob storage container called "source", converts it to .csv, and stores it in a new container. (I'm testing the script locally; if it works, I'll include it in an ADF pipeline.) So far I've managed to access the blob storage, but I'm having problems reading the file content.

from azure.storage.blob import BlobServiceClient, ContainerClient, BlobClient
import pandas as pd

conn_str = "DefaultEndpointsProtocol=https;AccountName=XXXXXX;AccountKey=XXXXXX;EndpointSuffix=core.windows.net"
container = "source"
blob_name = "prova.xlsx"

container_client = ContainerClient.from_connection_string(
    conn_str=conn_str, 
    container_name=container
    )
# Download blob as StorageStreamDownloader object (stored in memory)
downloaded_blob = container_client.download_blob(blob_name)

df = pd.read_excel(downloaded_blob)

print(df)

I get the following error:

ValueError: Invalid file path or buffer object type: <class 'azure.storage.blob._download.StorageStreamDownloader'>

I tried with a .csv file as input and wrote the parsing code as follows (with StringIO imported from io):

df = pd.read_csv(StringIO(downloaded_blob.content_as_text()))

and it works.

Any suggestions on how to modify the code so that the Excel file becomes readable?

  • Please try to use df = pd.read_excel(downloaded_blob.content_as_bytes()) Commented May 13, 2020 at 7:27
  • Hi @JimXu, I had just opened the ticket to write that I found the solution on the StorageStreamDownloader class page (learn.microsoft.com/en-us/python/api/azure-storage-blob/…) when I saw your answer. I can confirm that it works with .content_as_bytes(). Thanks anyway! Commented May 13, 2020 at 11:22

2 Answers

I'll summarize the solution below.

pd.read_excel() in pandas expects a file path, bytes, or a file-like object as input. When we use download_blob to download the Excel file from Azure Blob Storage, we get an azure.storage.blob.StorageStreamDownloader instead, which pandas does not recognize as any of these. So we need to call readall() or content_as_bytes() on it to get the content as bytes. For more details, please refer to the documentation for pd.read_excel() and for StorageStreamDownloader.
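
As a minimal sketch of the conversion described above: the helper names blob_to_buffer and read_excel_blob are hypothetical, and container_client is assumed to be the ContainerClient from the question. The bytes are wrapped in io.BytesIO because a file-like object is accepted by pd.read_excel() across pandas versions, while newer releases may warn when raw bytes are passed directly. Reading .xlsx files is also assumed to require the openpyxl package to be installed.

```python
import io


def blob_to_buffer(downloader):
    """Return the full blob content as a seekable in-memory buffer.

    `downloader` is expected to be the StorageStreamDownloader returned by
    download_blob(); only its readall() method is used here.
    """
    return io.BytesIO(downloader.readall())


def read_excel_blob(container_client, blob_name):
    """Download an .xlsx blob and parse it into a pandas DataFrame."""
    # Imported here so that blob_to_buffer() stays usable without pandas.
    import pandas as pd

    downloader = container_client.download_blob(blob_name)
    return pd.read_excel(blob_to_buffer(downloader))
```

With the variables from the question, this would be called as df = read_excel_blob(container_client, blob_name).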

Change

df = pd.read_excel(downloaded_blob)

to

df = pd.read_excel(downloaded_blob.content_as_bytes())
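
For the rest of the task in the question (writing the result as .csv to another container), one possible sketch follows. The function names, the destination container name passed by the caller, and the choice to overwrite an existing blob are all assumptions, not something stated in the thread. It uses readall(), which returns the same bytes as content_as_bytes().

```python
import io
from pathlib import PurePosixPath


def csv_blob_name(xlsx_blob_name):
    """Map a blob name such as 'folder/prova.xlsx' to 'folder/prova.csv'."""
    return str(PurePosixPath(xlsx_blob_name).with_suffix(".csv"))


def convert_xlsx_blob_to_csv(conn_str, source_container, dest_container, blob_name):
    """Read an .xlsx blob, convert it to CSV, and upload it to dest_container."""
    # Third-party imports are kept local so csv_blob_name() has no dependencies.
    import pandas as pd
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(conn_str)
    source = service.get_container_client(source_container)
    dest = service.get_container_client(dest_container)

    # Download the whole workbook into memory and parse the first sheet.
    data = source.download_blob(blob_name).readall()
    df = pd.read_excel(io.BytesIO(data))

    # Serialize without the DataFrame index and upload as UTF-8 bytes.
    csv_bytes = df.to_csv(index=False).encode("utf-8")
    target = csv_blob_name(blob_name)
    dest.upload_blob(name=target, data=csv_bytes, overwrite=True)  # assumption: overwriting is acceptable
    return target
```

Only the first worksheet is read here; a workbook with several sheets would need a sheet_name argument to pd.read_excel().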
