Create an array of DataFrames in Python

Question

I want to write a piece of code that creates multiple arrays of DataFrames whose names follow the format <name>_mmyy, where mm is the month and yy is the year.

Here is an example:

df_0115, df_0215, df_0315, ... , df_1215
stat_0115, stat_0215, stat_0315, ... , stat_1215

Better to use a dictionary: df['0115'], df['0215'], stat['0115'], stat['0215'], etc.
– furas, Commented Nov 25, 2015 at 3:14

Pedro M Duarte · Accepted Answer · 2015-11-25 15:56:02Z

35

I suggest that you create a dictionary to hold the DataFrames. That way you will be able to index them with a month-day key:

import datetime as dt
import numpy as np
import pandas as pd

# Build the keys: "1101", "1102", "1103" (month and day of each date).
dates_list = [dt.datetime(2015, 11, i + 1) for i in range(3)]
month_day_list = [d.strftime("%m%d") for d in dates_list]

# Map each month-day key to its own DataFrame.
dataframe_collection = {}

for month_day in month_day_list:
    new_data = np.random.rand(3, 3)
    dataframe_collection[month_day] = pd.DataFrame(new_data, columns=["one", "two", "three"])

# Print every DataFrame under a header that shows its key.
for key in dataframe_collection.keys():
    print("\n" + "=" * 40)
    print(key)
    print("-" * 40)
    print(dataframe_collection[key])

The code above prints output like the following (the values are random, and on Python versions before 3.7 the order of the keys can vary):

========================================
1102
----------------------------------------
        one       two     three
0  0.896120  0.742575  0.394026
1  0.414110  0.511570  0.268268
2  0.132031  0.142552  0.074510

========================================
1103
----------------------------------------
        one       two     three
0  0.558303  0.259172  0.373240
1  0.726139  0.283530  0.378284
2  0.776430  0.243089  0.283144

========================================
1101
----------------------------------------
        one       two     three
0  0.849145  0.198028  0.067342
1  0.620820  0.115759  0.809420
2  0.997878  0.884883  0.104158
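
The question asks for month-year keys (mmyy) and for two families of frames, df and stat. The same dictionary approach can be adapted to that naming. What follows is only a sketch under assumptions: the year 2015, random placeholder data, the helper name make_keys, and the choice of describe() to fill stat are all illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd


def make_keys(year, months=range(1, 13)):
    """Return keys such as '0115' (two-digit month, then two-digit year)."""
    return [f"{month:02d}{year % 100:02d}" for month in months]


keys = make_keys(2015)

# One dictionary per family of frames, instead of variables like df_0115.
df = {}
stat = {}
for key in keys:
    # Placeholder data; real code would load or compute each month's frame.
    df[key] = pd.DataFrame(np.random.rand(3, 3), columns=["one", "two", "three"])
    # Summary statistics stand in for whatever "stat" should hold.
    stat[key] = df[key].describe()

print(keys[0], keys[-1])
print(stat["0115"].loc["mean"])
```

With this layout, df["0315"] and stat["0315"] take the place of df_0315 and stat_0315, and looping over keys processes every month.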

edited Nov 25, 2015 at 15:56

answered Nov 25, 2015 at 3:18


4 Comments

Ana Over a year ago

Thank you Pedro! Is it necessary to do new_dataframe = A and then dataframe_collection[month_day] = new_dataframe like this? I just did dataframe_collection[month_day] = A.

Ana Over a year ago

I am also curious why the print procedure prints the dataframes in a random order! In my case it does not matter, it's just a general question.

Pedro M Duarte Over a year ago

Hi Ana, what you did is correct. There is no need for the new_dataframe intermediate variable. I updated the answer to reflect that. As for the random order in which the result is printed, it comes from Python's implementation of the dictionary. The dictionary key-value pairs are stored in a data structure called a hash table. This data structure is designed for very fast lookups and, as part of the algorithm that achieves this, the order in which the keys are stored can appear random.

Pedro M Duarte Over a year ago

If your application requires you to iterate over the dictionary keys in a predictable order (the order in which they were inserted), I recommend that you import the collections module and use an OrderedDict rather than a plain dict to collect your dataframes: dataframe_collection = collections.OrderedDict()
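
To make the ordering discussion concrete, here is a small sketch. It relies on two facts: collections.OrderedDict remembers insertion order, and since Python 3.7 a plain dict does too. Neither one sorts its keys, so sorted() is still needed for sorted iteration. The keys and the single-column frames below are made up for illustration.

```python
import collections

import pandas as pd

# Keys inserted deliberately out of order.
dataframe_collection = collections.OrderedDict()
for month_day in ["1102", "1103", "1101"]:
    dataframe_collection[month_day] = pd.DataFrame({"one": [1, 2, 3]})

# Iteration follows insertion order, not sorted order.
insertion_order = list(dataframe_collection)

# Sort the keys explicitly when sorted output is needed.
sorted_order = sorted(dataframe_collection)

print(insertion_order)  # ['1102', '1103', '1101']
print(sorted_order)     # ['1101', '1102', '1103']
```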

ChrisMM · Accepted Answer · 2019-12-15 19:22:05Z

8

df will be a list holding one DataFrame per CSV file in the current directory. Use df[0] to access the first one.

import glob
import pandas as pd

df = []
files = glob.glob("*.csv")
for a in files:
    df.append(pd.read_csv(a))
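
A list does not record which file each DataFrame came from. One possible variation, in the spirit of the dictionary suggestion elsewhere on this page, keys each frame by its file name without the extension. This is a sketch only: the helper name load_csv_frames and its directory argument are illustrative assumptions, not part of the original answer.

```python
from pathlib import Path

import pandas as pd


def load_csv_frames(directory="."):
    """Read every *.csv in `directory` into a dict keyed by file stem."""
    frames = {}
    # sorted() makes the load order deterministic across platforms.
    for path in sorted(Path(directory).glob("*.csv")):
        frames[path.stem] = pd.read_csv(path)
    return frames
```

For a hypothetical file named sales_0115.csv in a folder called data, load_csv_frames("data")["sales_0115"] would return its DataFrame.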

edited Dec 15, 2019 at 19:22

ChrisMM


answered Dec 15, 2019 at 18:17

Malik Mussabeheen Noor

