I am retrieving multiple data frames in CSV format from a website. I save the data frames in an empty list and then read them one by one. I cannot append them into a single data frame since they have different column names and column orders. So I have the following questions:

Can I create a data frame with a different name inside the loop I use to read the files, so that instead of saving them to a list I create a new dataframe for every file retrieved? If this is not possible or not recommended, is there a way to iterate over my list to extract the data frames? Currently I read one dataframe at a time, but I would love to automate this code to create something like data_1, data_2, etc. Right now my code is not terribly time-consuming since I only have 4 data frames, but this can become burdensome with more data. Here is my code:

import pandas as pd
import urllib2
import csv

#we write the names of the files in a list so we can iterate to download the files
periods=['2012-1st-quarter','2012-2nd-quarter', '2012-3rd-quarter', '2012-4th-quarter']
general=[]
#we generate a loop to read the files from the capital bikeshare website
for i in periods:
    url = 'https://www.capitalbikeshare.com/assets/files/trip-history-data/'+i+'.csv'
    response = urllib2.urlopen(url)
    x=pd.read_csv(response)
    general.append(x)
q1=pd.DataFrame(general[0])

Thanks!

  • There's nothing technically wrong with your code, although you may benefit from creating a function that accepts an argument such as a period index or name and only returns the DataFrame when it's called. Commented Jan 26, 2015 at 1:48
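
A minimal sketch of the kind of function this comment describes, under stated assumptions: the helper name `load_period` and its `_cache` dict are hypothetical, and the `reader` parameter is added here only so the function can be tried without a download. For real use you would pass `pd.read_csv` as the reader.

```python
URL = 'https://www.capitalbikeshare.com/assets/files/trip-history-data/{}.csv'
PERIODS = ['2012-1st-quarter', '2012-2nd-quarter',
           '2012-3rd-quarter', '2012-4th-quarter']

_cache = {}  # period name -> data that has already been loaded


def load_period(period, reader):
    """Return the data for one period, loading it only on first request.

    `period` may be a name from PERIODS or an integer index into it.
    `reader` is any callable that takes a URL and returns the data
    (hypothetical parameter; pass pd.read_csv for real use).
    """
    if isinstance(period, int):
        period = PERIODS[period]
    if period not in _cache:
        _cache[period] = reader(URL.format(period))
    return _cache[period]
```

With pandas imported as `pd`, `q1 = load_period(0, pd.read_csv)` and `q1 = load_period('2012-1st-quarter', pd.read_csv)` would return the same object, and the file would be fetched only once.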

1 Answer

It would be better to use a dict. You can also pass a URL directly to pandas.read_csv, so the simplified code would look like this:

import pandas as pd

periods = ['2012-1st-quarter','2012-2nd-quarter', '2012-3rd-quarter', '2012-4th-quarter']
url = 'https://www.capitalbikeshare.com/assets/files/trip-history-data/{}.csv'
d = {period: pd.read_csv(url.format(period)) for period in periods}

Then you can access a specific DataFrame like this:

 d['2012-4th-quarter']

To iterate through all DataFrames:

# print(...) with a single argument works on both Python 2 and Python 3
for period, df in d.items():
    print(period)
    print(df)

7 Comments

This is pretty elegant and works perfectly. Thanks. Just one more question, now that I have the dataframes in a dict, is there any way to extract them and to rename them all at once producing something like df_1, df_2, df_3, etc?
Do you want to rename the keys of the dictionary? What do you mean by renaming a DataFrame?
No, I need to manipulate the data frames, but to do that I extract them from the dictionaries, so I was wondering if there is a way to extract all at once.
Sorry, I still don't get it. What do you mean by extracting them all at once?
For example, to get a dataframe I do the following: df_1=pd.DataFrame(d['2012-4th-quarter']), which gives me the desired dataframe. I was wondering if there is a way to extract the 4 dataframes at once (and to create df_2, df_3, etc.), so I don't have to repeat the process as many times as there are elements in the dict.
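
One possible reading of this request, sketched under assumptions rather than taken from the thread: the values stored in the dict are already DataFrames, so the `pd.DataFrame(...)` wrapper is not needed, and they can be pulled out together with tuple unpacking or re-keyed as `df_1`, `df_2`, and so on. The helper name `number_frames` is hypothetical, and short strings stand in for the DataFrames so the snippet runs without downloading anything.

```python
def number_frames(frames, order, prefix='df_'):
    """Re-key `frames` as df_1, df_2, ... following the names in `order`."""
    return {'{}{}'.format(prefix, i): frames[name]
            for i, name in enumerate(order, start=1)}


periods = ['2012-1st-quarter', '2012-2nd-quarter',
           '2012-3rd-quarter', '2012-4th-quarter']

# Stand-in values; in the answer above these would be DataFrames.
d = {period: 'frame for ' + period for period in periods}

# Option 1: unpack into separate names in one statement.
# This needs exactly as many names on the left as there are periods.
df_1, df_2, df_3, df_4 = (d[period] for period in periods)

# Option 2: keep a dict, but with numbered keys.
numbered = number_frames(d, periods)
```

Option 1 only works while the number of files is fixed and small. Option 2 keeps working as the list of periods grows, which is the situation the question is worried about.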