generate multiple pandas data frames

Question

I am retrieving multiple data frames in CSV format from a website. I save the data frames in an initially empty list and then read them one by one. I cannot append them into a single data frame since they have different column names and column orders. So I have the following questions:

Can I create a data frame with a different name inside the loop I use to read the files, so that instead of saving them to a list I create a new data frame for every file retrieved? If this is not possible or not recommended, is there a way to iterate over my list to extract the data frames? Currently I read one data frame at a time, but I would love to automate this code to create something like data_1, data_2, etc. Right now my code is not terribly time consuming since I only have 4 data frames, but this can become burdensome with more data. Here is my code:

import pandas as pd
import urllib2
import csv

#we write the names of the files in a list so we can iterate to download the files
periods=['2012-1st-quarter','2012-2nd-quarter', '2012-3rd-quarter', '2012-4th-quarter']
general=[]
#we generate a loop to read the files from the capital bikeshare website
for i in periods:
    url = 'https://www.capitalbikeshare.com/assets/files/trip-history-data/'+i+'.csv'
    response = urllib2.urlopen(url)
    x=pd.read_csv(response)
    general.append(x)
q1=pd.DataFrame(general[0])

Thanks!

There's nothing technically wrong with your code, although you may benefit from creating a function that accepts an argument, such as a period's index or name, and returns the DataFrame only when it is called.
– Anzel, Commented Jan 26, 2015 at 1:48

elyase · Accepted Answer · 2015-01-26 02:04:17Z

3

It would be better to use a dict. You can also pass a URL directly to pandas.read_csv, so the simplified code would look like this:

import pandas as pd

periods = ['2012-1st-quarter','2012-2nd-quarter', '2012-3rd-quarter', '2012-4th-quarter']
url = 'https://www.capitalbikeshare.com/assets/files/trip-history-data/{}.csv'
d = {period: pd.read_csv(url.format(period)) for period in periods}

Then you can access a specific DataFrame like this:

 d['2012-4th-quarter']

To iterate through all DataFrames:

for period, df in d.items():
    # print() with a single argument works in both Python 2 and Python 3
    print(period)
    print(df)
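
The dict approach can also be tried without network access. The sketch below uses made-up CSV text (the names `raw` and `frames` and the column values are illustrative assumptions, not the real bikeshare data) to show that each entry keeps its own columns, so the files do not need matching column names or column order.

```python
import io
import pandas as pd

# Made-up CSV text standing in for two downloaded files with different columns.
raw = {
    'q1': 'Duration,Start station\n10,A\n20,B\n',
    'q2': 'Start station,Bike,Duration\nC,W001,30\n',
}

# Same pattern as the answer: one dict entry per file, keyed by its name.
frames = {name: pd.read_csv(io.StringIO(text)) for name, text in raw.items()}

# Each DataFrame keeps its own columns; nothing forces them to match.
for name, df in frames.items():
    print(name, list(df.columns), len(df))
```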

edited Jan 26, 2015 at 2:04

answered Jan 26, 2015 at 1:57


7 Comments

asado23 Over a year ago

This is pretty elegant and works perfectly. Thanks. Just one more question, now that I have the dataframes in a dict, is there any way to extract them and to rename them all at once producing something like df_1, df_2, df_3, etc?

elyase Over a year ago

Do you want to rename the keys of the dictionary? What do you mean by renaming a DataFrame?

asado23 Over a year ago

No, I need to manipulate the data frames, but to do that I extract them from the dictionaries, so I was wondering if there is a way to extract all at once.

elyase Over a year ago

Sorry, I still don't get it. What do you mean by "extract all at once"?

asado23 Over a year ago

For example, to get a DataFrame I do the following: df_1=pd.DataFrame(d['2012-4th-quarter']), which gives me the desired DataFrame. I was wondering if there is a way to extract the 4 DataFrames at once (and to create df_2, df_3, etc.), so I don't have to repeat the process as many times as there are elements in the dict.
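
One possible reading of this last request, sketched under assumptions: the values stored in the dict are already DataFrames, so wrapping them in pd.DataFrame(...) is not needed, and the same manipulation can be applied to every entry in one pass. The stand-in data and the `add_minutes` transformation below are hypothetical and only illustrate the pattern.

```python
import pandas as pd

# Stand-in data; in the thread, `d` comes from the accepted answer's dict comprehension.
d = {
    '2012-1st-quarter': pd.DataFrame({'Duration': [10, 20]}),
    '2012-2nd-quarter': pd.DataFrame({'Duration': [30]}),
}


def add_minutes(df):
    """Hypothetical example transformation: add a derived column."""
    # assign() returns a new DataFrame and leaves the original untouched.
    return df.assign(Minutes=df['Duration'] / 60)


# Apply one manipulation to every DataFrame without naming each one.
processed = {period: add_minutes(df) for period, df in d.items()}

# If short names are really wanted, unpack the values in a known order.
df_1, df_2 = (processed[p] for p in sorted(processed))
```

Keeping the DataFrames in the dict usually scales better than numbered variables, since adding a fifth file requires no new names; the unpacking line only works when the number of names on the left matches the number of entries.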
