
I want to upload a pandas DataFrame to a server as a CSV file without saving it to disk. Is there a way to create a more or less "fake csv" file which pretends to be a real file?

Here is some example code:

First I get my data from a SQL query and store it as a DataFrame. In the upload_ga_data function I want something with this logic:

media = MediaFileUpload('df',
                        mimetype='application/octet-stream',
                        resumable=False)

Full example:

from __future__ import print_function
from apiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials
from googleapiclient.errors import HttpError
from apiclient.http import MediaFileUpload
import pymysql
import pandas as pd
con = x

ga_query = """
    SELECT XXXXX
    """

df = pd.read_sql_query(ga_query, con)

df.to_csv('ga_export.csv', sep=',', encoding='utf-8', index=False)

def upload_ga_data():
    try:
        media = MediaFileUpload('ga_export.csv',
                                mimetype='application/octet-stream',
                                resumable=False)
        daily_upload = service.management().uploads().uploadData(
                accountId=accountId,
                webPropertyId=webPropertyId,
                customDataSourceId=customDataSourceId,
                media_body=media).execute()
        print("Upload was successful")
    except TypeError as error:
        # Handle errors in constructing a query.
        print('There was an error in constructing your query: %s' % error)

3 Answers


The required behavior is possible using a stream:

to create a more or less "fake csv" file which pretends to be a real file

Python makes file objects (created with open) and in-memory streams (created with io.StringIO) behave similarly, so anywhere you can use a file object you can generally use a string stream instead.

The easiest way to create a text stream is with open(), optionally specifying an encoding:

f = open("myfile.txt", "r", encoding="utf-8")

In-memory text streams are also available as StringIO objects:

f = io.StringIO("some initial text data")

The text stream API is described in detail in the documentation of TextIOBase.
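As a quick illustration of that interchangeability (a minimal sketch of my own, not part of the original answer), any function written against a file-like object will accept either a real file or a StringIO:

import io

def count_rows(f):
    # f can be any readable text file-like object
    return sum(1 for _ in f)

# a real file on disk would work the same way:
# with open("myfile.txt", "r", encoding="utf-8") as f:
#     print(count_rows(f))

# an in-memory stream, no file on disk involved
f = io.StringIO("a;b\n1;2\n3;4\n")
print(count_rows(f))  # prints 3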

In pandas you can do it with any function that has a path_or_buf argument in its signature, such as to_csv:

DataFrame.to_csv(path_or_buf=None, sep=', ', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression=None, quoting=None, quotechar='"', line_terminator='\n', chunksize=None, tupleize_cols=None, date_format=None, doublequote=True, escapechar=None, decimal='.')

The following code exports a dummy DataFrame in CSV format into a string stream (an in-memory buffer, not a physical file):

import io
import pandas as pd

df = pd.DataFrame(list(range(10)))

stream = io.StringIO()
df.to_csv(stream, sep=";")

When you want to get access to the stream content, just issue:

>>> stream.getvalue()
';0\n0;0\n1;1\n2;2\n3;3\n4;4\n5;5\n6;6\n7;7\n8;8\n9;9\n'

It returns the content without ever needing a real file on disk.
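One practical caveat worth adding here (my note, not part of the original answer): after to_csv has written to the stream, the stream position sits at the end. getvalue() is unaffected by the position, but if you plan to hand the stream object itself to something that reads from it, rewind it first:

stream.seek(0)        # move back to the start of the in-memory buffer
print(stream.read())  # now reads the same content that getvalue() returned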


5 Comments

I tried to upload with http.MediaIoBaseUpload(stream.getvalue(), mimetype=mimetype, resumable=True) but it throws a FileNotFoundError: [Errno 2] No such file or directory instead. Am I missing something, or does the stream not really work for the upload?
@AlphaCR just pass the stream itself. Do not pass the returned value, which is interpreted as a file path.
Passing just the stream only results in TypeError: expected str, bytes or os.PathLike object, not _io.StringIO. Perhaps it's due to something else?
Well, without a minimal reproducible example it is hard to tell. But your error says that this object does not accept a StringIO and expects bytes. Maybe you will have success with an io.BytesIO object.
@AlphaCR If you or anyone else coming here in the future gets that error message, check whether you are accidentally calling MediaFileUpload instead of MediaIoBaseUpload.

Though the other answer is an excellent start, some readers may be confused about how to complete the OP's whole task. Here is a way to go from writing a DataFrame to a stream, to preparing that stream for upload with the Google apiclient.http module. A key difference from the OP's attempt is that I pass the stream itself to MediaIoBaseUpload instead of using MediaFileUpload. The data is assumed to be UTF-8 like the OP's file. This first version runs fine for me until the media is actually uploaded, at which point I get the error:

self._fp.write(s.encode('ascii', 'surrogateescape'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 2313: ordinal not in range(128)

import io
import pandas as pd
from googleapiclient.errors import HttpError

from apiclient.http import MediaIoBaseUpload  # Changed this from MediaFileUpload

df = pd.DataFrame(list(range(10)))

stream = io.StringIO()
# writing df to the stream instead of a file:
df.to_csv(stream, sep=',', encoding='utf-8', index=False)
try:
    media = MediaIoBaseUpload(stream,
                              mimetype='application/octet-stream',
                              resumable=False)

    #### Your upload logic here using media just created ####

except HttpError as error:

    #### Handle your errors in uploading here ####

Because my data contains a Unicode character, I developed the following alternative, which accomplishes the same thing but can handle Unicode characters.

import io
import pandas as pd
from googleapiclient.errors import HttpError

from apiclient.http import MediaIoBaseUpload  # Changed this from MediaFileUpload

df = pd.DataFrame(list(range(10)))

records = df.to_csv(line_terminator='\r\n', index=False).encode('utf-8')
buffer = io.BytesIO(records)  # binary in-memory buffer holding the UTF-8 encoded CSV

try:
    media = MediaIoBaseUpload(buffer,
                              mimetype='application/octet-stream',
                              resumable=False)

    #### Your upload logic here using media just created ####

except HttpError as error:

    #### Handle your errors in uploading here ####
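For readers who want to see all the pieces joined up with the OP's actual upload call, here is a minimal end-to-end sketch (my addition, not part of the original answers). It assumes that service, accountId, webPropertyId and customDataSourceId are already set up exactly as in the question:

import io
import pandas as pd
from googleapiclient.http import MediaIoBaseUpload
from googleapiclient.errors import HttpError

# df would normally come from the OP's SQL query, e.g.
# df = pd.read_sql_query(ga_query, con)
df = pd.DataFrame(list(range(10)))

# serialize the DataFrame to CSV entirely in memory
records = df.to_csv(index=False).encode('utf-8')
buffer = io.BytesIO(records)

media = MediaIoBaseUpload(buffer,
                          mimetype='application/octet-stream',
                          resumable=False)

try:
    # service, accountId, webPropertyId and customDataSourceId are the
    # objects/identifiers the OP already has for the GA Management API
    daily_upload = service.management().uploads().uploadData(
        accountId=accountId,
        webPropertyId=webPropertyId,
        customDataSourceId=customDataSourceId,
        media_body=media).execute()
    print("Upload was successful")
except HttpError as error:
    print('There was an error during the upload: %s' % error)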

Comments


I used:

from googleapiclient.http import MediaIoBaseUpload

versus @Katherine's:

from apiclient.http import MediaIoBaseUpload

But other than that, @Katherine's alternative solution worked perfectly for me as I was developing a solution to write a DataFrame to a CSV file in Google Drive from a Google Cloud Function.

1 Comment

This does not provide an answer to the question. Once you have sufficient reputation you will be able to comment on any post; instead, provide answers that don't require clarification from the asker. - From Review
