
I'm trying to export a pandas DataFrame to a CSV file in a bucket on Google Cloud Storage, but the following code isn't working for me:

my_df.to_csv(StringIO(file_io.FileIO('gs://mybucket/data/file.csv', mode='w+')))

How should this be rewritten? I'm getting the following error:

unbound method write() must be called with FileIO instance as first argument (got nothing instead)

Apologies if the answer is obvious, but I'm just starting to learn python.

3 Comments

  • Possible duplicate of Save pandas data frame as csv on to gcloud storage bucket. Commented Mar 26, 2019 at 21:27
  • Is your CSV small enough to hold in memory? If so, you can write a new object to GCS directly from a string in Python (a sketch follows these comments). If your data is too large, write it to a local file and then upload that file via the API. Don't confuse GCS with a file system. Commented Mar 26, 2019 at 22:32
  • I was specifically attempting to determine how to use StringIO and FileIO to export a file to a gcloud storage bucket, and none of the other solutions I perused offered one. I was able to use them to import a CSV on gcloud into a dataframe, so I assumed it wouldn't be too complicated to do the same in the other direction. I did manage to get GCS to work, so I'll post how I did so below for anyone else who might be wondering. Commented Mar 28, 2019 at 19:40
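To make the in-memory approach from the second comment concrete, here is a minimal sketch using the google-cloud-storage client; the bucket and object names are placeholders, not anything from the question:

from io import StringIO
import google.cloud.storage as gcs

# Serialize the dataframe into an in-memory buffer
buf = StringIO()
my_df.to_csv(buf, index=False)

# Upload the buffer's contents as a new object
# ('my-bucket' and 'data/file.csv' are placeholder names)
client = gcs.Client()
bucket = client.bucket('my-bucket')
bucket.blob('data/file.csv').upload_from_string(buf.getvalue(), content_type='text/csv')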

2 Answers


Importing a file from gcloud into a dataframe works when I code it like this:

from tensorflow.python.lib.io import file_io
from pandas.compat import StringIO  # on newer pandas, use: from io import StringIO
import pandas as pd

def read_data(gcs_path):
    # FileIO understands gs:// paths; read the whole object into memory,
    # then let pandas parse it from the in-memory string
    file_stream = file_io.FileIO(gcs_path, mode='r')
    data = pd.read_csv(StringIO(file_stream.read()), names=['various', 'column', 'names'])
    return data

my_df = read_data('gs://mybucket/data/file.csv')

But I haven't been able to reverse the process.
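In principle, the same file_io API should work in reverse if you serialize the dataframe to a string first; a sketch I haven't verified, assuming FileIO accepts write mode for gs:// paths:

from tensorflow.python.lib.io import file_io

def write_data(df, gcs_path):
    # Build the CSV as a string in memory, then write it through FileIO,
    # which understands gs:// paths (untested sketch)
    with file_io.FileIO(gcs_path, mode='w') as f:
        f.write(df.to_csv(index=False))

write_data(my_df, 'gs://mybucket/data/file.csv')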

The GCS client library has worked for me, however:

import google.cloud.storage as gcs

client = gcs.Client()
bucket = client.bucket('my-bucket')
# List the existing objects under the prefix (not needed for the upload itself)
blobs = list(bucket.list_blobs(prefix='data/'))

# Write the dataframe to a local temp file, then upload that file
local_tmp_path = 'tmp.csv'
my_df.to_csv(local_tmp_path)
target_blob = bucket.blob('data/file.csv')
# Open in binary mode; the client uploads raw bytes
target_blob.upload_from_file(open(local_tmp_path, 'rb'))
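As an aside, the client library also has upload_from_filename, which opens and closes the file itself, so the last line above could be replaced with:

target_blob.upload_from_filename(local_tmp_path)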

1 Comment

What are you using blobs for?

You can save your CSV file on your VM and then use gsutil to copy it to your bucket.

Python:

my_df.to_csv("data.csv")

Shell:

gsutil cp data.csv gs://my_bucket/
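If you want to keep everything in one script, the shell step can be run from Python; a sketch assuming gsutil is installed and authenticated on the VM:

import subprocess

# Copy the local CSV to the bucket; check=True raises if gsutil fails
subprocess.run(['gsutil', 'cp', 'data.csv', 'gs://my_bucket/'], check=True)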

