21

Can someone tell me if it is possible to read a csv file directly from Azure blob storage as a stream and process it using Python? I know it can be done using C#.Net (shown below) but wanted to know the equivalent library in Python to do this.

CloudBlobClient client = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference("outfiles");
CloudBlob blob = container.GetBlobReference("Test.csv");
1
  • @Jay, do you have any input on this? Commented Feb 26, 2018 at 2:35

10 Answers 10

17

Yes, it is certainly possible. Check out the Azure Storage SDK for Python.

from azure.storage.blob import BlockBlobService

block_blob_service = BlockBlobService(account_name='myaccount', account_key='mykey')

block_blob_service.get_blob_to_path('mycontainer', 'myblockblob', 'out-sunset.png')

You can read the complete SDK documentation here: http://azure-storage.readthedocs.io.


8 Comments

Thanks, Gaurav. I checked the page but I can't see an equivalent of GetBlobReference for Python.
The Python SDK doesn't give you a reference to a BlockBlob the way the .NET SDK does. I have edited my code to show how you can download a blob to the local file system and added a link to the SDK documentation. HTH.
I know this functionality exists in the Python SDK, but I am looking for a function similar to the .NET one.
So if I understand correctly, you wish to create an instance of BlockBlob (like CloudBlockBlob) in Python. Correct? Would you mind explaining the reason behind it?
It's in alignment with some of our existing work. I need to read a file from the blob as a stream, do some processing and write it back to the blob. The whole Python app will run as a WebJob. I know I can download the file from the blob to the WebJob's drive (D:), but I wanted to know whether Python has something similar to the .NET functionality that doesn't require downloading the file to a drive.
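For the read, process, write-back flow described in the last comment, here is a minimal sketch that stays in memory. It assumes the newer (v12) SDK, where a BlobClient offers download_blob().readall() and upload_blob(..., overwrite=True). The helper name transform_blob and its transform argument are hypothetical, not part of the SDK.

```python
from typing import Any, Callable


def transform_blob(blob_client: Any, transform: Callable[[bytes], bytes]) -> int:
    """Read a blob into memory, apply `transform`, and write the result back.

    `blob_client` is assumed to behave like azure.storage.blob.BlobClient (v12):
    it needs download_blob().readall() and upload_blob(data, overwrite=True).
    Returns the number of bytes written. No local file is created.
    """
    original = blob_client.download_blob().readall()  # whole blob as bytes
    updated = transform(original)
    blob_client.upload_blob(updated, overwrite=True)  # replace the existing blob
    return len(updated)


# Hypothetical usage with a real client:
#   from azure.storage.blob import BlobClient
#   blob = BlobClient.from_connection_string(conn_str, "outfiles", "Test.csv")
#   transform_blob(blob, lambda raw: raw.upper())
```

This holds the whole blob in memory, so it is probably only suitable for files that comfortably fit in RAM.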
13

Here's a way to do it with the new version of the SDK (12.0.0):

from azure.storage.blob import BlobClient

blob = BlobClient(account_url="https://<account_name>.blob.core.windows.net",
                  container_name="<container_name>",
                  blob_name="<blob_name>",
                  credential="<account_key>")

with open("example.csv", "wb") as f:
    data = blob.download_blob()
    data.readinto(f)

See here for details.

5 Comments

Hi, this still downloads the file. Is it possible to get the contents of the blob without downloading the file?
When you do data = blob.download_blob(), the contents of the blob will be in data, you don't need to write to a file.
@SebastianDziadzio Is there a way to read this data into a Python data frame? I am somehow unable to get this working using BlockBlobService.
If you're downloading a CSV file, you should be able to convert the contents of data to a data frame with pd.read_csv(data).
data.readall() returns the contents of the blob as bytes (or as a string, if an encoding was passed to download_blob()).
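If the blob is too large to hold in memory at once, one option is to walk it chunk by chunk. The sketch below assumes the v12 downloader's chunks() iterator as the source of bytes. The helper names iter_text_lines and iter_csv_rows are hypothetical. It accepts any iterable of bytes, so nothing in it is Azure-specific.

```python
import codecs
import csv
from typing import Iterable, Iterator, List


def iter_text_lines(chunks: Iterable[bytes], encoding: str = "utf-8") -> Iterator[str]:
    """Yield decoded text lines from an iterable of byte chunks."""
    # The incremental decoder copes with multi-byte characters split across chunks.
    decoder = codecs.getincrementaldecoder(encoding)()
    pending = ""
    for chunk in chunks:
        pending += decoder.decode(chunk)
        lines = pending.split("\n")
        pending = lines.pop()  # the last piece may be an incomplete line
        for line in lines:
            yield line + "\n"
    pending += decoder.decode(b"", final=True)
    if pending:
        yield pending  # final line without a trailing newline


def iter_csv_rows(chunks: Iterable[bytes], encoding: str = "utf-8") -> Iterator[List[str]]:
    """Parse CSV rows lazily from byte chunks."""
    return csv.reader(iter_text_lines(chunks, encoding))


# Hypothetical usage with the v12 SDK:
#   for row in iter_csv_rows(blob.download_blob().chunks()):
#       process(row)
```

Only one chunk plus at most one partial line is buffered at a time, at the cost of not having the whole file available for random access.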
5

One can stream from blob with python like this:

from tempfile import NamedTemporaryFile
from azure.storage.blob.blockblobservice import BlockBlobService

entry_path = conf['entry_path']
container_name = conf['container_name']
blob_service = BlockBlobService(
            account_name=conf['account_name'],
            account_key=conf['account_key'])

def get_file(filename):
    local_file = NamedTemporaryFile()
    blob_service.get_blob_to_stream(container_name, filename, stream=local_file,
                                    max_connections=2)

    local_file.seek(0)
    return local_file

2 Comments

Thanks for this, very useful. Does the TemporaryFile need clean-up afterwards?
Happy to help :) According to the docs (docs.python.org/3/library/tempfile.html), the file is deleted as soon as it is closed, so there is no need to clean it up manually.
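To make the clean-up behaviour concrete, here is a small standard-library sketch, with no Azure calls involved. It writes some bytes to a NamedTemporaryFile inside a with block and reports whether the file existed while open. The function name roundtrip_through_tempfile is made up for illustration.

```python
import os
from tempfile import NamedTemporaryFile
from typing import Tuple


def roundtrip_through_tempfile(payload: bytes) -> Tuple[str, bool, bytes]:
    """Write `payload` to a temporary file, read it back, and let it be deleted.

    Returns the temporary path, whether it existed while open, and the bytes read.
    """
    with NamedTemporaryFile() as tmp:  # delete=True is the default
        tmp.write(payload)
        tmp.seek(0)  # rewind before reading, as in the answer above
        data = tmp.read()
        path = tmp.name
        existed_while_open = os.path.exists(path)
    # Leaving the with block closes the file, which also removes it from disk.
    return path, existed_while_open, data
```

Using the with block makes the deletion happen at a predictable point instead of whenever the file object is garbage collected.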
5

Provide your Azure storage account name and its secret key as the account key here:

from azure.storage.blob import BlockBlobService

block_blob_service = BlockBlobService(account_name='$$$$$$', account_key='$$$$$$')

This still downloads the blob and saves it in the current directory as 'output.jpg':

block_blob_service.get_blob_to_path('your-container-name', 'your-blob', 'output.jpg')

This reads the blob's contents into memory instead:

blob_item = block_blob_service.get_blob_to_bytes('your-container-name', 'blob-name')

blob_item.content


5

I recommend using smart_open.

import os

from azure.storage.blob import BlobServiceClient
from smart_open import open

connect_str = os.environ['AZURE_STORAGE_CONNECTION_STRING']
transport_params = {
    'client': BlobServiceClient.from_connection_string(connect_str),
}

# stream from Azure Blob Storage
with open('azure://my_container/my_file.txt', transport_params=transport_params) as fin:
    for line in fin:
        print(line)

# stream content *into* Azure Blob Storage (write mode):
with open('azure://my_container/my_file.txt', 'wb', transport_params=transport_params) as fout:
    fout.write(b'hello world')

2 Comments

Where do you put the connection string in this case?
@tammuz I just edited my answer to provide the connection string and link to an example in smart_open's documentation
2

Here is a simple way to read a CSV from a blob using Pandas:

import os

import pandas as pd
from azure.storage.blob import BlobServiceClient

service_client = BlobServiceClient.from_connection_string(os.environ['AZURE_STORAGE_CONNECTION_STRING'])
client = service_client.get_container_client("your_container")
bc = client.get_blob_client(blob="your_folder/yourfile.csv")
data = bc.download_blob()
# Write the blob to a local file, then load that file with pandas
with open("file.csv", "wb") as f:
    data.readinto(f)
df = pd.read_csv("file.csv")

1 Comment

How can I read all csv files in a folder and append them to my dataframe?
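One possible approach to the "all CSV files in a folder" question is to list the blobs under a name prefix and parse each one in memory. The sketch assumes a v12 ContainerClient, using list_blobs(name_starts_with=...) and download_blob(name).readall(). The helper name read_csv_folder is hypothetical, and it uses the standard csv module rather than pandas to stay dependency-free.

```python
import csv
import io
from typing import Any, Dict, List


def read_csv_folder(container_client: Any, prefix: str,
                    encoding: str = "utf-8") -> List[Dict[str, str]]:
    """Collect the rows of every .csv blob whose name starts with `prefix`.

    `container_client` is assumed to behave like azure.storage.blob.ContainerClient.
    Each row is a dict keyed by that file's header line.
    """
    rows: List[Dict[str, str]] = []
    for blob in container_client.list_blobs(name_starts_with=prefix):
        if not blob.name.lower().endswith(".csv"):
            continue  # skip non-CSV blobs under the same prefix
        raw = container_client.download_blob(blob.name).readall()
        rows.extend(csv.DictReader(io.StringIO(raw.decode(encoding))))
    return rows
```

If a data frame is needed, pd.DataFrame(rows) should work on the result, though all values will be strings because nothing here infers column types.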
2

Since I wasn't able to find what I needed on this thread, I wanted to follow up on @SebastianDziadzio's answer to retrieve the data without downloading it as a local file, which is what I was trying to find for myself.

Replace the with statement with the following:

from io import BytesIO
import pandas as pd

with BytesIO() as input_blob:
    blob_client_instance.download_blob().readinto(input_blob)
    input_blob.seek(0)
    df = pd.read_csv(input_blob, compression='infer', index_col=0)


1

To read from Azure Blob: I wanted to get data from Azure Blob Storage into an openpyxl xlsx workbook.

import os
from io import BytesIO

import openpyxl
from azure.storage.blob import BlobClient

conn_str = os.environ.get('BLOB_CONN_STR')
container_name = os.environ.get('CONTAINER_NAME')
blob = BlobClient.from_connection_string(conn_str, container_name=container_name,
                                         blob_name="YOUR BLOB PATH HERE FROM AZURE BLOB")
data = blob.download_blob()
workbook_obj = openpyxl.load_workbook(filename=BytesIO(data.readall()))

To write in Azure Blob

I struggled a lot with this and I don't want anyone else to go through the same. If you are using openpyxl and want to write directly from an Azure Function to Blob Storage, the following should get you there.

Thanks. HMU if you need any help.

from openpyxl.writer.excel import save_virtual_workbook

# wb is an existing openpyxl Workbook
blob = BlobClient.from_connection_string(conn_str=conn_str, container_name=container_name, blob_name=r'YOUR_PATH/test1.xlsx')
blob.upload_blob(save_virtual_workbook(wb))

1 Comment

What is wb supposed to be, and why don't you use a with open statement to read the Azure path?
0

I know this is an old post, but in case someone wants to do the same: I was able to access the storage with the code below.

Note: you need to set the AZURE_STORAGE_CONNECTION_STRING which can be obtained from Azure Portal -> Go to your storage -> Settings -> Access keys and then you will get the connection string there.

For Windows: setx AZURE_STORAGE_CONNECTION_STRING ""

For Linux: export AZURE_STORAGE_CONNECTION_STRING=""

For macOS: export AZURE_STORAGE_CONNECTION_STRING=""

import os
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__

connect_str = os.getenv('AZURE_STORAGE_CONNECTION_STRING')
print(connect_str)
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
container_client = blob_service_client.get_container_client("Your Container Name Here")
try:

    print("\nListing blobs...")

    # List the blobs in the container
    blob_list = container_client.list_blobs()
    for blob in blob_list:
        print("\t" + blob.name)

except Exception as ex:
    print('Exception:')
    print(ex)
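Since the code above depends on the environment variable being set, a small guard can make a missing value fail early with a clear message. This is only a sketch. The helper name get_connection_string and its environ parameter are hypothetical, and it returns the value rather than printing it, since a connection string contains the account key.

```python
import os
from typing import Mapping, Optional


def get_connection_string(var_name: str = "AZURE_STORAGE_CONNECTION_STRING",
                          environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the connection string from the environment or raise if it is missing.

    `environ` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if environ is None else environ
    value = env.get(var_name, "").strip()
    if not value:
        raise RuntimeError(
            f"{var_name} is not set; copy it from the storage account's Access keys page"
        )
    return value
```

The result can then be passed to BlobServiceClient.from_connection_string() in place of the os.getenv() call.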


0

Azure already has an API to read the blob into memory as a bytes object. Here bucket is the container name and key is the blob name.

    import csv
    import os
    from io import StringIO
    from azure.storage.blob import ContainerClient, StorageStreamDownloader

    container: ContainerClient = ContainerClient.from_connection_string(os.getenv("BLOB_CONNECTION_STRING"), bucket)
    stream: StorageStreamDownloader = container.download_blob(blob=key)
    bytes_content = stream.readall()
    string_content = bytes_content.decode()
    file = StringIO(string_content)
    csv_data = csv.reader(file, delimiter=",")
