Getting Google Spreadsheet CSV into A Pandas Dataframe

Question

I uploaded a file to Google Spreadsheets (to make a publicly accessible example IPython Notebook, with data). In its native form, the file could be read into a Pandas DataFrame. Now I use the following code to read the spreadsheet. It works fine, but the data just comes in as a string, and I'm not having any luck getting it back into a DataFrame (you can get the data):

import requests
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.content

The data ends up looking like: (1st row headers)

',City,region,Res_Comm,mkt_type,Quradate,National_exp,Alabama_exp,Sales_exp,Inventory_exp,Price_exp,Credit_exp\n0,Dothan,South_Central-Montgomery-Auburn-Wiregrass-Dothan,Residential,Rural,1/15/2010,2,2,3,2,3,3\n10,Foley,South_Mobile-Baldwin,Residential,Suburban_Urban,1/15/2010,4,4,4,4,4,3\n12,Birmingham,North_Central-Birmingham-Tuscaloosa-Anniston,Commercial,Suburban_Urban,1/15/2010,2,2,3,2,2,3\n

The native pandas code that brings in the disk resident file looks like:

df = pd.io.parsers.read_csv('/home/tom/Dropbox/Projects/annonallanswerswithmaster1012013.csv',index_col=0,parse_dates=['Quradate'])

A "clean" solution would be helpful to many, as an easy way to share datasets for Pandas use! I tried a bunch of alternatives with no success, and I'm pretty sure I'm missing something obvious again.

Update: the new Google Sheets has a different URL pattern. Use it in place of the URL in the example above and/or in the answer below and you should be fine. Here is an example:

https://docs.google.com/spreadsheets/d/177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/export?format=csv&id

See the solution below from @Max Ghenis, which just uses pd.read_csv, with no need for StringIO or requests.

The URL ends with /edit?ts=5c0e311e#gid=0 and the sharing link ends with /edit?usp=sharing; neither contains csv, and both give a 404 when requested by the pandas code.
– Mugen, Commented Dec 11, 2018 at 6:57

getup8 · Accepted Answer · 2016-09-07 06:41:39Z

90

Seems to work for me without the StringIO:

import pandas as pd

test = pd.read_csv('https://docs.google.com/spreadsheets/d/' +
                   '0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc' +
                   '/export?gid=0&format=csv',
                   # Set first column as rownames in data frame
                   index_col=0,
                   # Parse column values to datetime
                   parse_dates=['Quradate']
                  )
test.head(5)  # Same result as @TomAugspurger

BTW, including ?gid= in the URL lets you import a specific sheet; you can find the gid in that sheet's URL.
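
As a small sketch of the gid point, the export URL can be assembled from the spreadsheet key and the gid of the sheet you want. The helper name export_csv_url is an assumption made for illustration (it is not part of pandas or any Google library), and the URL pattern is simply the one used in the code above.

```python
from urllib.parse import urlencode


def export_csv_url(key, gid=0):
    """Build the CSV export URL for one sheet (tab) of a public spreadsheet.

    Hypothetical helper: it only reproduces the URL pattern shown above.
    """
    query = urlencode({"gid": gid, "format": "csv"})
    return f"https://docs.google.com/spreadsheets/d/{key}/export?{query}"


url = export_csv_url("0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc", gid=0)
print(url)
# The resulting string can be passed to pd.read_csv(url, ...) as above.
```

Reading several sheets would then be a matter of calling the helper once per gid.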

edited Sep 7, 2016 at 6:41

getup8


answered Feb 6, 2016 at 20:23

Max Ghenis



3 Comments

getup8 Over a year ago

Maybe just add comments as to what index_col and parse_dates do? Also, maybe this is obvious, but I think this only works if the Spreadsheet is public; I believe if it's not, you'll have to use the API.

Dylan Hogg Over a year ago

Great solution. Works when a sheet is shared as "Anyone on the Internet with this link can view". Note that index_col and parse_dates arguments are optional.

Marco Cerliani Over a year ago

it only works when the SPREADSHEET IS PUBLIC

zabop · Accepted Answer · 2021-01-04 15:35:21Z

65

You can use read_csv() on a file-like object, here a BytesIO wrapping the downloaded bytes:

from io import BytesIO

import requests
import pandas as pd

r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.content
    
In [10]: df = pd.read_csv(BytesIO(data), index_col=0,parse_dates=['Quradate'])

In [11]: df.head()
Out[11]: 
          City                                            region     Res_Comm  \
0       Dothan  South_Central-Montgomery-Auburn-Wiregrass-Dothan  Residential   
10       Foley                              South_Mobile-Baldwin  Residential   
12  Birmingham      North_Central-Birmingham-Tuscaloosa-Anniston   Commercial   
38       Brent      North_Central-Birmingham-Tuscaloosa-Anniston  Residential   
44      Athens                 North_Huntsville-Decatur-Florence  Residential   

          mkt_type            Quradate  National_exp  Alabama_exp  Sales_exp  \
0            Rural 2010-01-15 00:00:00             2            2          3   
10  Suburban_Urban 2010-01-15 00:00:00             4            4          4   
12  Suburban_Urban 2010-01-15 00:00:00             2            2          3   
38           Rural 2010-01-15 00:00:00             3            3          3   
44  Suburban_Urban 2010-01-15 00:00:00             4            5          4   

    Inventory_exp  Price_exp  Credit_exp  
0               2          3           3  
10              4          4           3  
12              2          2           3  
38              3          3           2  
44              4          4           4
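
A quick way to check the parsing step without touching the network is to feed read_csv the first rows of the CSV text shown in the question. This is only a minimal sketch: the bytes literal stands in for r.content, and the names df_bytes and df_text are chosen here for illustration.

```python
from io import BytesIO, StringIO

import pandas as pd

# First rows of the CSV shown in the question, as bytes (like r.content).
data = (
    b",City,region,Res_Comm,mkt_type,Quradate,National_exp,Alabama_exp,"
    b"Sales_exp,Inventory_exp,Price_exp,Credit_exp\n"
    b"0,Dothan,South_Central-Montgomery-Auburn-Wiregrass-Dothan,Residential,"
    b"Rural,1/15/2010,2,2,3,2,3,3\n"
    b"10,Foley,South_Mobile-Baldwin,Residential,Suburban_Urban,1/15/2010,"
    b"4,4,4,4,4,3\n"
)

# Raw bytes go through BytesIO ...
df_bytes = pd.read_csv(BytesIO(data), index_col=0, parse_dates=["Quradate"])

# ... while decoded text (like r.text) goes through StringIO.
df_text = pd.read_csv(
    StringIO(data.decode("utf-8")), index_col=0, parse_dates=["Quradate"]
)

print(df_bytes.equals(df_text))
print(df_bytes[["City", "Quradate"]])
```

Either wrapper gives the same DataFrame; the choice depends only on whether you hold bytes or a decoded string.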

edited Jan 4, 2021 at 15:35

zabop


answered Oct 26, 2013 at 21:02

TomAugspurger


7 Comments

moldovean Over a year ago

I was looking for weeks how to import a spreadsheet into pandas. never heard of requests or StringIO libraries. Thank you!!

dartdog Over a year ago

Note the new URL format at the bottom of the original question above; it is needed for the new Google Sheets version.

ezcodr Over a year ago

To clarify "got moved around in python3 if you're using that": from io import StringIO

nealmcb Over a year ago

Thanks! But I had to use this form of google url for csv output: stackoverflow.com/a/23702001/507544

Max Ghenis Over a year ago

How can one specify the sheet (i.e. #gid=x in URL)? Adding it to the URL itself after key= didn't work.


Ken Arnold · Accepted Answer · 2018-05-16 17:48:50Z

25

Open the specific sheet you want in your browser. Make sure it's at least viewable by anyone with the link. Copy and paste the URL. You'll get something like https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER.

sheet_url = 'https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER'

First we turn that into a CSV export URL, like https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/export?format=csv&gid=NUMBER:

csv_export_url = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')

Then we pass it to pd.read_csv, which can take a URL.

df = pd.read_csv(csv_export_url)

This will break if Google changes its API (it seems undocumented), and may give unhelpful errors if a network failure occurs.
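
The plain string replace above only matches URLs that end in /edit#gid=NUMBER. As a hedged sketch, a regular expression can also cope with variants such as /edit?usp=sharing or /edit?ts=...#gid=0. The function name to_csv_export_url is hypothetical, it assumes the /spreadsheets/d/<key>/ layout shown above, and it is not meant for "published to the web" links of the form /spreadsheets/d/e/....

```python
import re


def to_csv_export_url(sheet_url):
    """Convert a Google Sheets browser URL into a CSV export URL.

    Hypothetical helper; falls back to gid=0 when the URL carries no gid.
    """
    key_match = re.search(r"/spreadsheets/d/([a-zA-Z0-9_-]+)", sheet_url)
    if key_match is None:
        raise ValueError(f"Not a recognised Google Sheets URL: {sheet_url}")
    gid_match = re.search(r"[#&?]gid=(\d+)", sheet_url)
    gid = gid_match.group(1) if gid_match else "0"
    return (
        f"https://docs.google.com/spreadsheets/d/{key_match.group(1)}"
        f"/export?format=csv&gid={gid}"
    )


for example in (
    "https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=123",
    "https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit?usp=sharing",
):
    print(to_csv_export_url(example))
```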

answered May 16, 2018 at 17:48

Ken Arnold


3 Comments

diegodsp Over a year ago

This code returns an HTML page for downloading the CSV, not the CSV file from the Google Sheet.

rsc05 Over a year ago

I am getting ParserError: Error tokenizing data. C error: Expected 1 fields in line 6, saw 2

Raisin Over a year ago

Did you make sure access is set to "anyone with the link"?
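
The symptoms in these comments (an HTML page coming back, or a tokenizing error) are what you would typically see when the response body is not CSV at all. One possible way to fail with a clearer message is to download the bytes yourself and inspect them before parsing. The helper read_csv_payload below is hypothetical and the HTML check is a rough heuristic, not an official API.

```python
from io import BytesIO

import pandas as pd


def read_csv_payload(payload, **kwargs):
    """Parse downloaded bytes as CSV, failing early if they look like HTML.

    Hypothetical helper: `payload` is the raw response body (for example
    the r.content of a requests response). Extra keyword arguments are
    forwarded to pd.read_csv.
    """
    head = payload.lstrip()[:100].lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        raise ValueError(
            "Received an HTML page instead of CSV; check that the sheet is "
            "shared as 'anyone with the link' and that the URL is an export URL."
        )
    return pd.read_csv(BytesIO(payload), **kwargs)


df = read_csv_payload(b"a,b\n1,2\n3,4\n")
print(df.shape)
```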

Abhery Guha · Accepted Answer · 2018-01-02 14:37:53Z

12

My approach is a bit different. I just used pandas.DataFrame(), but obviously needed to install and import gspread. It worked fine!

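# gs is an authorized gspread client, e.g. gs = gspread.authorize(credentials)
gsheet = gs.open("Name")
sheet_name = "today"
wsheet = gsheet.worksheet(sheet_name)
# get_all_records() returns one dict per row, keyed by the header row
dataframe = pd.DataFrame(wsheet.get_all_records())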
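
To see why the last line works without needing credentials, here is a minimal offline sketch. The records list is a stand-in for what get_all_records() returns (a list of dicts, one per row); its values are copied from the sample data in the question.

```python
import pandas as pd

# Stand-in for wsheet.get_all_records(): one dict per row, keyed by the
# header row. Values are taken from the question's sample data.
records = [
    {"City": "Dothan", "Res_Comm": "Residential", "mkt_type": "Rural"},
    {"City": "Foley", "Res_Comm": "Residential", "mkt_type": "Suburban_Urban"},
]

# pandas turns the dict keys into columns and each dict into a row.
dataframe = pd.DataFrame(records)
print(dataframe.columns.tolist())
```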

answered Jan 2, 2018 at 14:37

Abhery Guha


2 Comments

dartdog Over a year ago

Nice..The interface keeps getting cleaner!

RAbraham Over a year ago

just to clarify, gs would be gs = gspread.authorize(credentials)

Gianmario Spacagna · Accepted Answer · 2018-02-26 10:46:32Z

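I have been using the following utility, and it has worked so far:

import logging
import pandas as pd

log = logging.getLogger(__name__)

def load_from_gspreadsheet(sheet_name, key):
    # Request the named sheet as CSV; spaces in the name are percent-encoded.
    url = 'https://docs.google.com/spreadsheets/d/{key}/gviz/tq?tqx=out:csv&sheet={sheet_name}&headers=1'.format(
        key=key, sheet_name=sheet_name.replace(' ', '%20'))

    log.info('Loading google spreadsheet from {}'.format(url))

    df = pd.read_csv(url)
    # Drop the "Unnamed" columns that pandas creates for blank headers.
    return df.drop([col for col in df.columns if col.startswith('Unnamed')], axis=1)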

You must specify the sheet_name and the key. The key is the string you get from the url in the following path: https://docs.google.com/spreadsheets/d/{key}/edit/.

You can change the value of headers if the column names span more than one row, but I am not sure whether it still works with multi-level headers.

It may break if Google changes its APIs.

Also, please bear in mind that your spreadsheet must be public: everyone with the link must be able to read it.
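
Replacing only spaces leaves other special characters in a sheet name (for example &) unescaped. As a sketch under that assumption, the standard library's urllib.parse.quote can percent-encode the whole name. The function name gviz_csv_url and the placeholder key "KEY" are made up for this example; the URL pattern is the one from the function above.

```python
from urllib.parse import quote


def gviz_csv_url(key, sheet_name, headers=1):
    """Build the gviz CSV URL used above, percent-encoding the sheet name.

    Hypothetical helper; "key" is the spreadsheet key from the sheet's URL.
    """
    return (
        f"https://docs.google.com/spreadsheets/d/{key}/gviz/tq"
        f"?tqx=out:csv&sheet={quote(sheet_name)}&headers={headers}"
    )


print(gviz_csv_url("KEY", "Sheet 1"))
print(gviz_csv_url("KEY", "Q1 & Q2"))
```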

Parth chokhra · Accepted Answer · 2021-12-02 03:08:27Z

6

First:

1. Click on File.
2. Select the Publish to the web tab.
3. Select which sheet you want as CSV (in case of multiple sheets) and change the format from Web page to Comma-separated values.
4. Click on Publish.
5. Copy the link, e.g. https://docs.google.com/spreadsheets/d/e/{}/pub?gid=0&single=true&output=csv

import pandas as pd
pd.read_csv("https://docs.google.com/spreadsheets/d/e/{}/pub?gid=0&single=true&output=csv")

answered Dec 2, 2021 at 3:08

Parth chokhra


2 Comments

Shaida Muhammad Over a year ago

underrated but simple answer.

Sergey Belousov Over a year ago

This worked for me, thanks! But this reads only the first sheet. How could I read all the sheets?

JQTs · Accepted Answer · 2022-12-23 17:30:39Z

5

Straight to the point:

Get your google URL

https://docs.google.com/spreadsheets/d/<your sheet ID>/edit?gid=<your tab ID>

The gid identifies the tab. It is a number, and each tab has its own.

I like to make a function (not shown here), so I separate my variables:

sheet_id = "Place your sheet ID here"
sheet_name = "Place your sheet # here"

the next URL is the tricky part:

url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?gid={sheet_name}&format=csv"

Then just read it in

df = pd.read_csv(url)

That's it. If you need to select a different row as the header, you can do this:

df = pd.read_csv(url, header=1)
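
To illustrate what the header argument changes, here is a small offline sketch. The inline CSV text is a made-up stand-in for a sheet whose first row is a title and whose second row holds the real column names; the data rows reuse values from the question.

```python
from io import StringIO

import pandas as pd

# Made-up stand-in for a sheet with a title row above the column names.
csv_text = (
    "Survey export,,\n"
    "City,mkt_type,Sales_exp\n"
    "Dothan,Rural,3\n"
    "Foley,Suburban_Urban,4\n"
)

# Default: the first line (index 0) is used as the header.
df_default = pd.read_csv(StringIO(csv_text))

# header=1: the second line is used as the header, the title row is skipped.
df_header1 = pd.read_csv(StringIO(csv_text), header=1)

print(df_default.columns.tolist())
print(df_header1.columns.tolist())
```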

edited Dec 23, 2022 at 17:30

answered Apr 13, 2022 at 0:19

JQTs


2 Comments

Robb Dunlap Over a year ago

use "df = pd.read_csv(url)" instead of "df = pd.csv(url)".

JQTs Over a year ago

Great point @RobbDunlap I can not remember why I used pd.csv and not put "read" in there.

ivansaul · Accepted Answer · 2021-02-11 08:43:17Z

4

This works for me.

import pandas as pd

#Create a public URL
#https://docs.google.com/spreadsheets/d/0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc/edit?usp=sharing

#get spreadsheets key from url
gsheetkey = "0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc"

#sheet name
sheet_name = 'Sheet 1'

url=f'https://docs.google.com/spreadsheet/ccc?key={gsheetkey}&output=xlsx'
df = pd.read_excel(url,sheet_name=sheet_name)
print(df)

answered Feb 11, 2021 at 8:43

ivansaul



kaza · Accepted Answer · 2018-05-08 01:34:00Z

If the CSV file was shared via Google Drive rather than via Google Sheets, then the following change to the URL would work:

#Derive the id from the google drive shareable link.
#For the file at hand the link is as below
#<https://drive.google.com/open?id=1-tjNjMP6w0RUV4GhJWw08ql3wYwsNU69>
file_id='1-tjNjMP6w0RUV4GhJWw08ql3wYwsNU69'
link='https://drive.google.com/uc?export=download&id={FILE_ID}'
csv_url=link.format(FILE_ID=file_id)
#The final url would be as below:-
#csv_url='https://drive.google.com/uc?export=download&id=1-tjNjMP6w0RUV4GhJWw08ql3wYwsNU69'
df = pd.read_csv(csv_url)

And the dataframe would be (if you just ran the above code)

    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9   10  11
3   12  13  14  15
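
If you would rather not copy the ID out of the link by hand, it can be extracted with the standard library. This is a sketch only: drive_download_url is a hypothetical helper, and it handles just the open?id=... style of share link shown above.

```python
from urllib.parse import parse_qs, urlparse


def drive_download_url(share_link):
    """Turn a Drive 'open?id=...' share link into a direct-download URL.

    Hypothetical helper; only handles links that carry the file ID in an
    `id` query parameter, like the one shown above.
    """
    ids = parse_qs(urlparse(share_link).query).get("id")
    if not ids:
        raise ValueError(f"No id parameter found in: {share_link}")
    return f"https://drive.google.com/uc?export=download&id={ids[0]}"


csv_url = drive_download_url(
    "https://drive.google.com/open?id=1-tjNjMP6w0RUV4GhJWw08ql3wYwsNU69"
)
print(csv_url)
```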

See working code here.

Oleg · Accepted Answer · 2021-08-13 09:28:59Z

3

In the Google Sheets file, go to File > Publish to the web > select .csv > Copy link.

Code

import pandas as pd

path = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vSvmELTzIjfSmX8GuV3HE2qomN3uRyvPX8RDzpw77JH33DUbj1bjech7H6NYPArvpZFux0DdJ5L5TKy/pub?output=csv'
data = pd.read_csv(path)
print(data)

Code in Google Colab

answered Aug 13, 2021 at 9:28

Oleg

