
I know how to download the file from Cloud Storage within the Cloud Run instance, but I can't find the syntax for reading the file in Python. I want to convert the CSV file into a pandas DataFrame immediately, just by using pd.read_csv('testing.csv'). So my code looks like download_blob(bucket_name, source_blob_name, 'testing.csv'). Shouldn't I then be able to do pd.read_csv('testing.csv') within the Cloud Run instance? When I do it this way, I keep getting an internal server error when loading the page. It seems like a simple question, but I haven't been able to find an example of it anywhere. Everything just downloads the file; I never see it used afterwards.
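Roughly, the flow I'm attempting looks like this (a sketch; download_blob is the helper defined below):

import pandas as pd

# Download the object to a local file, then read it back as a DataFrame.
download_blob(bucket_name, source_blob_name, 'testing.csv')
df = pd.read_csv('testing.csv')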



from google.cloud import storage


def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"

    # The ID of your GCS object
    # source_blob_name = "storage-object-name"

    # The path to which the file should be downloaded
    # destination_file_name = "local/path/to/file"

    storage_client = storage.Client()

    bucket = storage_client.bucket(bucket_name)

    # Construct a client side representation of a blob.
    # Note `Bucket.blob` differs from `Bucket.get_blob` as it doesn't retrieve
    # any content from Google Cloud Storage. As we don't need additional data,
    # using `Bucket.blob` is preferred here.
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)

    print(
        "Downloaded storage object {} from bucket {} to local file {}.".format(
            source_blob_name, bucket_name, destination_file_name
        )
    )

2 Answers


Using a filename such as 'testing.csv' means the file is written to the current working directory. What is the current directory inside the container? Instead, specify an absolute path to a known directory location.

Download to the /tmp/ directory, e.g. '/tmp/testing.csv'. Note that using file system space consumes memory, because the Cloud Run file system is RAM-based. Make sure the Cloud Run instance has enough memory.
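For example, a minimal sketch combining this with the download_blob helper above (the bucket and object names are placeholders):

import pandas as pd

# Download to the writable in-memory /tmp filesystem,
# then read from the same absolute path.
download_blob(bucket_name, source_blob_name, '/tmp/testing.csv')
df = pd.read_csv('/tmp/testing.csv')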

Excerpt from the Cloud Run Container Runtime Contract:

The filesystem of your container is writable and is subject to the following behavior:

  • This is an in-memory filesystem, so writing to it uses the container instance's memory.
  • Data written to the filesystem does not persist when the container instance is stopped.

Reference: Filesystem access


download_as_bytes is the function you're looking for if you want to load the file directly into memory.

from io import BytesIO
import pandas as pd
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(source_blob_name)
# download_as_bytes returns bytes, so wrap it in BytesIO (not StringIO).
data = blob.download_as_bytes()
df = pd.read_csv(BytesIO(data))

https://googleapis.dev/python/storage/latest/blobs.html#google.cloud.storage.blob.Blob.download_as_bytes

Pandas also supports reading directly from Google Cloud Storage (this requires the gcsfs package to be installed). https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file.

So something like "gs://bucket/file"
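As a sketch, with a hypothetical bucket and object name (gcsfs must be installed):

import pandas as pd

# 'my-bucket' and 'testing.csv' are placeholders for your bucket and object.
df = pd.read_csv('gs://my-bucket/testing.csv')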

Comment

Wanna give you an upvote because I did end up using @John_Hanley's answer for another file, and in this case it answers my question head on. However, I also used pandas reading directly from GCS. Thanks for the help!
