
I want to write code that creates multiple sets of DataFrames named in the format <name>_mmyy, where mm is the month and yy is the year.

Here is an example:

df_0115, df_0215, df_0315, ... , df_1215
stat_0115, stat_0215, stat_0315, ... , stat_1215
  • Better to use a dictionary: df['0115'], df['0215'], stat['0115'], stat['0215'], etc. (commented Nov 25, 2015 at 3:14)
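Following that comment's dictionary suggestion, here is a minimal sketch that builds both collections keyed by mmyy strings for every month of 2015. The random data, and the use of describe() to fill the stat entries, are assumptions for illustration only. The question does not say what the stat frames should contain.

```python
import datetime as dt

import numpy as np
import pandas as pd

# One "mmyy" key per month of 2015: "0115", "0215", ..., "1215"
keys = [dt.date(2015, month, 1).strftime("%m%y") for month in range(1, 13)]

rng = np.random.default_rng(0)

# df["0115"] plays the role of a variable named df_0115, and so on.
df = {key: pd.DataFrame(rng.random((3, 3)), columns=["one", "two", "three"]) for key in keys}

# Hypothetical "stat" frames: summary statistics of each monthly DataFrame.
stat = {key: frame.describe() for key, frame in df.items()}

print(keys[0], keys[-1])  # 0115 1215
print(stat["0315"].loc["mean"])
```

Compared with creating twelve separately named variables, the dictionary keeps every month reachable from one name, so looping over all months is simply `for key, frame in df.items()`.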

2 Answers


I suggest that you create a dictionary to hold the DataFrames. That way you will be able to index them with a month-day key:

import datetime as dt 
import numpy as np
import pandas as pd

dates_list = [dt.datetime(2015,11,i+1) for i in range(3)]
month_day_list = [d.strftime("%m%d") for d in dates_list]

dataframe_collection = {} 

for month_day in month_day_list:
    new_data = np.random.rand(3,3)
    dataframe_collection[month_day] = pd.DataFrame(new_data, columns=["one", "two", "three"])

for key in dataframe_collection.keys():
    print("\n" +"="*40)
    print(key)
    print("-"*40)
    print(dataframe_collection[key])

The code above printed the following result (the values are random, so yours will differ):

========================================
1102
----------------------------------------
        one       two     three
0  0.896120  0.742575  0.394026
1  0.414110  0.511570  0.268268
2  0.132031  0.142552  0.074510

========================================
1103
----------------------------------------
        one       two     three
0  0.558303  0.259172  0.373240
1  0.726139  0.283530  0.378284
2  0.776430  0.243089  0.283144

========================================
1101
----------------------------------------
        one       two     three
0  0.849145  0.198028  0.067342
1  0.620820  0.115759  0.809420
2  0.997878  0.884883  0.104158
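Because the collection is an ordinary dict, a single month-day DataFrame can be retrieved by its key. pandas.concat can also stack the whole collection into one DataFrame whose outer index level is the key. The following is a minimal sketch that rebuilds a collection like the one above. The fixed seed and the level names "month_day" and "row" are assumptions, added only to make the example repeatable and readable.

```python
import datetime as dt

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
dates_list = [dt.datetime(2015, 11, i + 1) for i in range(3)]

dataframe_collection = {
    d.strftime("%m%d"): pd.DataFrame(rng.random((3, 3)), columns=["one", "two", "three"])
    for d in dates_list
}

# Look up a single DataFrame by its month-day key.
first_day = dataframe_collection["1101"]

# Stack all DataFrames; the dict keys become the outer level of a MultiIndex.
combined = pd.concat(dataframe_collection, names=["month_day", "row"])

# Rows for one key can be selected back out of the combined frame.
print(combined.loc["1102"])
```

Keeping the dict is convenient when each DataFrame is processed separately. The concatenated form is handier when you want groupby-style operations across all days at once.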

4 Comments

Thank you, Pedro! Is it necessary to assign new_dataframe = A and then dataframe_collection[month_day] = new_dataframe like this? I just did dataframe_collection[month_day] = A.
I am also curious why the print loop outputs the DataFrames in a seemingly random order. In my case it does not matter; it's just a general question.
Hi Ana, what you did is correct. There is no need for the intermediate new_dataframe variable, and I updated the answer to reflect that. As for the order in which the results are printed, it comes from Python's implementation of the dictionary. Its key-value pairs are stored in a data structure called a hash table, which is designed for very fast lookups. In Python versions before 3.7, the iteration order that results from this is arbitrary. Since Python 3.7, a plain dict is guaranteed to iterate in insertion order.
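If your application requires you to iterate over the dictionary keys in a predictable order, I recommend that you import the collections module and use an OrderedDict rather than a plain dict to collect your DataFrames: dataframe_collection = collections.OrderedDict(). Note that an OrderedDict keeps insertion order rather than sorting its keys. Here the keys are inserted in date order, so that is the order you get back. For a truly sorted traversal regardless of insertion order, iterate over sorted(dataframe_collection).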
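Here is a small sketch of the ordering options discussed in these comments. It assumes Python 3.7 or later, where a plain dict is guaranteed to keep insertion order. The keys are hypothetical and deliberately inserted out of date order, to show that neither dict nor OrderedDict sorts them.

```python
import collections

import pandas as pd

keys_in_insert_order = ["1102", "1103", "1101"]

plain = {}
ordered = collections.OrderedDict()
for key in keys_in_insert_order:
    frame = pd.DataFrame({"one": [0.0]})
    plain[key] = frame
    ordered[key] = frame

# Both a plain dict (Python 3.7+) and an OrderedDict iterate in insertion order.
print(list(plain))    # ['1102', '1103', '1101']
print(list(ordered))  # ['1102', '1103', '1101']

# sorted() yields the keys in sorted order no matter how they were inserted.
for key in sorted(plain):
    print(key, plain[key].shape)
```

Because "mmdd" and "mmyy" strings are zero-padded, sorting them as strings gives chronological order within a single year.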

The list df will hold one DataFrame for each CSV file in the current directory; use df[0] to access the first one.

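import glob
import pandas as pd

df = []
files = glob.glob("*.csv")  # every CSV file in the current directory (order is arbitrary)
for a in files:
    df.append(pd.read_csv(a))  # one DataFrame per file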
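This answer can be combined with the dictionary approach from the first answer. The following minimal sketch keys each DataFrame by its file name without the extension, so a file named, say, df_0115.csv would be available as frames["df_0115"]. The helper name load_csvs and the two sample files written to a temporary directory are hypothetical, added only so the example runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csvs(directory):
    """Read every *.csv file in `directory` into a dict keyed by file stem."""
    return {path.stem: pd.read_csv(path) for path in sorted(Path(directory).glob("*.csv"))}


# Demo: write two small CSV files to a temporary directory, then load them.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"one": [1, 2], "two": [3, 4]}).to_csv(Path(tmp) / "df_0115.csv", index=False)
    pd.DataFrame({"one": [5], "two": [6]}).to_csv(Path(tmp) / "df_0215.csv", index=False)
    frames = load_csvs(tmp)

print(sorted(frames))           # ['df_0115', 'df_0215']
print(frames["df_0115"].shape)  # (2, 2)
```

Looking a file up by name is more robust than relying on positions like df[0], since glob does not guarantee any particular file order.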

