
I have a Python generator that will yield a large and unknown amount of byte data. I'd like to stream the output to GCS, without buffering to a file on disk first.

While I'm sure this is possible (e.g., I could spawn `gsutil cp - <...>` as a subprocess and write my bytes to its stdin), I'm not sure what the recommended/supported way is, and the documentation only gives an example of uploading a local file.

How should I do this right?

    The magic is to convert your generator into a stream that yields each time a read is performed. The Python example in your reference link demonstrates how to read the stream. This article will help you create a stream backed by a generator: coderscat.com/python-generator-and-yield Commented Sep 6, 2022 at 0:18
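The comment's suggestion can be sketched with a small `io.RawIOBase` subclass that pulls from the generator only when a read is performed. The class and function names here are illustrative, not from any library:

```python
import io

class GeneratorStream(io.RawIOBase):
    """A read-only, file-like stream backed by a generator of bytes chunks."""

    def __init__(self, generator):
        self._generator = generator
        self._leftover = b""  # bytes pulled from the generator but not yet read

    def readable(self):
        return True

    def readinto(self, buffer):
        # Advance the generator only when the leftover buffer is exhausted.
        while not self._leftover:
            try:
                self._leftover = next(self._generator)
            except StopIteration:
                return 0  # EOF
        n = min(len(buffer), len(self._leftover))
        buffer[:n] = self._leftover[:n]
        self._leftover = self._leftover[n:]
        return n

def byte_chunks():
    yield b"hello "
    yield b"world"

# Wrap in BufferedReader so read() and friends behave like a normal file object.
stream = io.BufferedReader(GeneratorStream(byte_chunks()))
print(stream.read())  # b'hello world'
```

A file-like object built this way can then be handed to any API that consumes a readable stream, without materializing the generator's output in memory or on disk.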

1 Answer


The BlobWriter class makes this a bit easier:

from google.cloud import storage
from google.cloud.storage.fileio import BlobWriter

storage_client = storage.Client()
bucket = storage_client.bucket('my_bucket')
blob = bucket.blob('my_object')
writer = BlobWriter(blob)

for d in your_generator:
    writer.write(d)

writer.close()

1 Comment

So, is each call to `writer.write()` counted as a GCS write operation, or is the entire stream counted as a single write operation when it is flushed or closed? There could be millions of writes to the stream, which would exceed any quota and break the bank, literally.
