
I have two questions about reading and writing Python objects from/to Azure Blob Storage.

  1. Can someone tell me how to write a Python dataframe as a CSV file directly into Azure Blob Storage without storing it locally?

    I tried the functions create_blob_from_text and create_blob_from_stream, but neither of them works.

    Converting the dataframe to a string and using the create_blob_from_text function writes the file into the blob, but as plain text rather than as CSV:

    df_b = df.to_string()
    block_blob_service.create_blob_from_text('test', 'OutFilePy.csv', df_b)
    
  2. How to read a JSON file in Azure Blob Storage directly into Python?


5 Answers

  1. Can someone tell me how to write a Python dataframe as a CSV file directly into Azure Blob Storage without storing it locally?

You could use the pandas.DataFrame.to_csv method.

Sample code:

from azure.storage.blob import BlockBlobService
import pandas as pd

head = ["col1", "col2", "col3"]
rows = [[1, 2, 3], [4, 5, 6], [8, 7, 9]]
df = pd.DataFrame(rows, columns=head)
print(df)

# to_csv with no path argument returns the CSV content as a string
output = df.to_csv(index_label="idx", encoding="utf-8")
print(output)

accountName = "***"
accountKey = "***"
containerName = "test1"

blobService = BlockBlobService(account_name=accountName, account_key=accountKey)

blobService.create_blob_from_text(containerName, 'OutFilePy.csv', output)

Output result: (screenshot omitted)

  2. How to read a JSON file in Azure Blob Storage directly into Python?

Sample code:

from azure.storage.blob import BlockBlobService

accountName = "***"
accountKey = "***"
containerName = "test1"
blobName = "test3.json"

blobService = BlockBlobService(account_name=accountName, account_key=accountKey)

result = blobService.get_blob_to_text(containerName, blobName)

print(result.content)

Output result: (screenshot omitted)
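Since the blob in this example contains JSON, the downloaded text can be parsed with the standard json module. A minimal sketch; the literal string below is a stand-in for result.content:

```python
import json

# Stand-in for the text returned in result.content by get_blob_to_text
content = '{"name": "test", "value": 1}'

data = json.loads(content)  # parse the JSON text into a Python dict
print(data["name"])         # -> test
```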

Hope it helps you.


2 Comments

When I store the df.to_csv in a variable, it stores it in a local directory and the variable is of None type. Am I missing something?
If you would like to save the output to a subfolder then make this change: blobService.create_blob_from_text('test1', 'folder1/folder2/OutFilePy.csv', output)
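On the first comment: pandas writes to disk (and returns None) whenever to_csv is given a path or buffer; only to_csv() with no target returns the CSV content as a string. A quick local check:

```python
import io
import pandas as pd

df = pd.DataFrame({"col1": [1, 4], "col2": [2, 5]})

# No target: to_csv returns the CSV text; nothing is written to disk
csv_text = df.to_csv(index=False)

# With a target (path or buffer): to_csv writes there and returns None
buf = io.StringIO()
returned = df.to_csv(buf, index=False)

print(returned is None)            # True
print(csv_text == buf.getvalue())  # True
```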

The accepted answer did not work for me, as it depends on the azure-storage package (deprecated/legacy as of 2021). I changed it as follows:

import os

import dotenv
import pandas as pd
from azure.storage.blob import ContainerClient

dotenv.load_dotenv()
blob_block = ContainerClient.from_connection_string(
    conn_str=os.environ["CONNECTION_STRING"],
    container_name=os.environ["CONTAINER_NAME"]
)
output = df.to_csv(encoding='utf-8')  # df is your dataframe
blob_name = "OutFilePy.csv"           # name of the blob to create
blob_block.upload_blob(blob_name, output, overwrite=True, encoding='utf-8')



There was an update to BlobServiceClient: the create_blob_from_text method is no longer supported. You can now use get_blob_client to get or create the blob file; the blob need not exist:

from datetime import datetime

from azure.storage.blob import BlobServiceClient, ContentSettings

output = dataframe.to_csv(index_label="idx", encoding="utf-8")

blob_service = BlobServiceClient.from_connection_string(
    f"DefaultEndpointsProtocol=https;AccountName={ACCOUNT_NAME};"
    f"AccountKey={ACCOUNT_KEY};EndpointSuffix=core.windows.net"
)

current_time = datetime.now()
blob_client = blob_service.get_blob_client(
    container=DEST_CONTAINER,
    blob="kcScenarioTest/" + str(current_time.microsecond) + ".csv"
)

blob_client.upload_blob(
    output,
    overwrite=True,
    content_settings=ContentSettings(content_type="text/csv")
)



Here's an example of writing a pandas DataFrame to Azure Blob Storage without storing it locally. It doesn't require StringIO and uses ContainerClient instead of BlockBlobService.


import pandas as pd
from azure.storage.blob import ContainerClient

def write_csv(env, df_path, df):
    container_client = ContainerClient(
        env['container_url'],
        container_name=env['container_name'],
        credential=env['container_cred']
    )

    output = df.to_csv(index_label="idx", encoding="utf-8")
    blob_client = container_client.get_blob_client(df_path)
    blob_client.upload_blob(output, overwrite=True)

    return 'success'



So you need a BytesIO object to upload to the blob, using the upload_blob method from the azure.storage.blob module. You will also need to create a container_client from the same module:

from io import BytesIO

blob_report_name = 'OutFilePy.csv'
stream_file = BytesIO()
df_b.to_csv(stream_file)  # pandas >= 1.2 can write to binary file objects
file_to_blob = stream_file.getvalue()
blob_client = container_client.get_blob_client(blob_report_name)
blob_client.upload_blob(data=file_to_blob, overwrite=True)
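The BytesIO step can be checked locally before any Azure calls. This sketch uses a small stand-in for the answer's df_b and inspects the bytes that would be passed to upload_blob:

```python
import io
import pandas as pd

df_b = pd.DataFrame({"a": [1, 2]})  # stand-in for the answer's dataframe

stream_file = io.BytesIO()
df_b.to_csv(stream_file, index=False)  # pandas >= 1.2 accepts binary handles
file_to_blob = stream_file.getvalue()

print(file_to_blob)  # the CSV content as bytes, ready for upload_blob
```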
