Create multiple dataframes in loop

Question

I have a list, with each entry being a company name

companies = ['AA', 'AAPL', 'BA', ....., 'YHOO']

I want to create a new dataframe for each entry in the list.

Something like

(pseudocode)

for c in companies:
     c = pd.DataFrame()

I have searched for a way to do this but can't find it. Any ideas?

Do you want each company in its own column, or all companies in one column? – Scott, Jun 4, 2015 at 4:58

If you want a DataFrame for each company, what data will each one contain? – Alexander, Jun 4, 2015 at 4:59

holdenweb · Answer · 2023-05-19 10:34:47Z

160

Just to underline my comment to @maxymoo's answer, it's almost invariably a bad idea ("code smell") to add names dynamically to a Python namespace. There are a number of reasons, the most salient being:

Created names might easily conflict with variables already used by your logic.
Since the names are dynamically created, you typically also end up using dynamic techniques to retrieve the data.
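Here is a minimal sketch of the first problem. It reuses the exec loop from maxymoo's answer with a hypothetical company list in which one entry happens to be spelled like the pandas alias, so the loop silently replaces the module with an empty DataFrame.

```python
import pandas as pd

# Hypothetical company list; the second entry collides with the pandas alias.
companies = ['AA', 'pd']

for c in companies:
    # Runs "AA = pd.DataFrame()" and then "pd = pd.DataFrame()"
    exec('{} = pd.DataFrame()'.format(c))

# 'pd' is no longer the pandas module, so a later pd.DataFrame() call would fail.
print(type(pd).__name__)
```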

This is why dicts were included in the language. The correct way to proceed is:

d = {}
for name in companies:
    d[name] = pd.DataFrame()

Nowadays you can write a single dict comprehension expression to do the same thing, but some people find it less readable:

d = {name: pd.DataFrame() for name in companies}

Once d is created the DataFrame for company x can be retrieved as d[x], so you can look up a specific company quite easily. To operate on all companies you would typically use a loop like:

for name, df in d.items():
    # operate on DataFrame 'df' for company 'name'
    pass
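For a self-contained version, here is a minimal sketch that assumes a shortened, hypothetical company list and made-up price data. It builds the dict with the comprehension, retrieves one company by key, and then modifies every DataFrame in a loop.

```python
import pandas as pd

# Shortened, hypothetical version of the question's list.
companies = ['AA', 'AAPL', 'BA', 'YHOO']

# One separate DataFrame per company; the price data is made up.
d = {name: pd.DataFrame({'price': [1.0, 2.0]}) for name in companies}

# Look up a single company by name...
aapl = d['AAPL']

# ...or operate on all of them; each df is the object stored in the dict,
# so adding a column here changes the dict's DataFrames in place.
for name, df in d.items():
    df['company'] = name

print(aapl)
```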

In Python 2 you were better off writing

for name, df in d.iteritems():

because this avoids instantiating the list of (name, df) tuples that .items() creates in the older version. That's now largely of historical interest, though there will of course be Python 2 applications still extant and requiring (hopefully occasional) maintenance.

edited May 19, 2023 at 10:34

answered Jun 4, 2015 at 8:39

holdenweb


4 Comments

maxymoo Over a year ago

Good point, I hadn't thought of that, but you're absolutely right.

Moondra Over a year ago

This answer taught me a lot.

Bowen Liu Over a year ago

I don't understand why the other answer was accepted while this one is clearly better.

holdenweb Over a year ago

The original questioner has a reputation score of 67, so probably has the answer they wanted (perhaps it went into production somewhere!) and doesn't use Stackoverflow any more. It's possibly unfortunate that the accepted answer uses exec, but in the larger scheme of things it's a small concern - though thanks for saying this one is better. Stackoverflow is not a competition for me, but rather a way of providing information for which there is a visible need.

maxymoo · Accepted Answer · 2015-06-04 05:00:37Z

23

You can do this (although obviously use exec with extreme caution if this is going to be public-facing code):

for c in companies:
    exec('{} = pd.DataFrame()'.format(c))

answered Jun 4, 2015 at 5:00

maxymoo


6 Comments

Luis Ibáñez Herrera Over a year ago

In the ipython notebook I get File "<string>", line 1 S.1 = pd.DataFrame() ^ SyntaxError: invalid syntax

Luis Ibáñez Herrera Over a year ago

It does work if I don't use a loop and just execute the exec statement with an arbitrary c value, like format('test').

maxymoo Over a year ago

The error message is saying that "S.1" is not a valid variable name, since a variable name can't contain a '.'. You could try to fix this by changing the code to format(c.replace('.', '')).

Luis Ibáñez Herrera Over a year ago

Yes, I have some company names with '.' in them. Now it works!, thanks :)
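Stripping dots fixes this particular ticker, but other names can still fail, for example ones that start with a digit or that are Python keywords. This small check uses the standard str.isidentifier() and keyword.iskeyword() to show which cleaned names are safe to use with exec. The tickers and the helper name usable_as_name are hypothetical.

```python
import keyword

def usable_as_name(name):
    # A valid identifier that is not a reserved keyword can be assigned to.
    return name.isidentifier() and not keyword.iskeyword(name)

# Hypothetical tickers: one with a dot, one with a leading digit, one keyword.
for raw in ['S.1', '3M', 'class']:
    cleaned = raw.replace('.', '')  # the dot-stripping fix from the comment
    print(raw, '->', cleaned, usable_as_name(cleaned))
```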

holdenweb Over a year ago

Dynamically creating names in a Python namespace is almost invariably a bad idea. It would be much more sensible to use a dict d and write d[c] = pd.DataFrame(). Read this answer, for example, to start to understand why it's a bad idea.
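Here is a minimal sketch of the dict version with hypothetical tickers that would break the exec loop. A dict key can be any string, so no renaming or validity check is needed.

```python
import pandas as pd

# Hypothetical tickers: a dot, a leading digit, and a Python keyword.
companies = ['S.1', '3M', 'class']

d = {}
for c in companies:
    d[c] = pd.DataFrame()  # any string works as a key, unchanged

print(list(d))
```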


Chandan · Answer · 2021-01-18 12:02:24Z

8

Below is the code for dynamically creating DataFrames in a loop:

companies = ['AA', 'AAPL', 'BA', ....., 'YHOO']

for eachCompany in companies:
    # Dynamically create DataFrames (works at module level, where vars() is globals())
    vars()[eachCompany] = pd.DataFrame()

For the difference between vars(), locals() and globals(), see:

What's the difference between globals(), locals(), and vars()?
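Here is a minimal sketch of why this depends on where the loop runs, assuming it is executed as a script. At module level vars() returns the module namespace (the same dict as globals()), so the names are created. Inside a function it returns a dict describing the local variables, and in CPython writing to that dict does not create real variables. The tickers and the function name make_frames_in_function are hypothetical.

```python
import pandas as pd

companies = ['AA', 'AAPL']  # hypothetical subset of the question's list

# Module level: vars() is the module namespace, so AA and AAPL now exist.
for eachCompany in companies:
    vars()[eachCompany] = pd.DataFrame()

def make_frames_in_function(names):
    # Inside a function, vars() describes the locals; writing to it
    # does not create a variable that later code can use.
    for name in names:
        vars()[name] = pd.DataFrame()
    try:
        BA  # looked up as a global, and no global BA exists
    except NameError:
        return False
    return True

print(AA.empty, make_frames_in_function(['BA']))
```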

answered Jan 18, 2021 at 12:02

Chandan


ak3191 · Answer · 2018-10-04 20:59:05Z

6

Adding to the great answers above: they work flawlessly if you need to create empty DataFrames, but you may instead need to create multiple DataFrames based on some filtering.

Suppose the list you have is a column of some DataFrame, and you want to make a separate DataFrame for each unique company from the bigger DataFrame.

First, take the unique names of the companies:
```
compuniquenames = df.company.unique()
```

Create a dictionary to store your DataFrames:

companydict = {elem: pd.DataFrame() for elem in compuniquenames}

Those two steps were already covered in the answers above. Now fill each entry with the matching rows:

for key in companydict.keys():
    companydict[key] = df[df.company == key]

This gives you one DataFrame per unique company, containing that company's matching records.
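As an alternative to filtering the whole frame once per company, pandas' DataFrame.groupby yields one (key, sub-DataFrame) pair per unique value, so the same dictionary can be built in a single pass. This is a swapped-in technique rather than the answer's own code, and the df below is hypothetical data with a company column.

```python
import pandas as pd

# Hypothetical data: one row per trade, with a 'company' column.
df = pd.DataFrame({
    'company': ['AA', 'BA', 'AA', 'YHOO'],
    'price': [10.0, 20.0, 11.0, 30.0],
})

# Iterating over a groupby gives (company name, rows for that company),
# so a dict comprehension maps each company to its own DataFrame.
companydict = {name: group for name, group in df.groupby('company')}

print(companydict['AA'])
```

Each group keeps the original row index, so the result for a company matches what the boolean filter df[df.company == key] would return.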

edited Oct 4, 2018 at 20:59

holdenweb


answered Jul 25, 2018 at 19:50

ak3191


4 Comments

ak3191 Over a year ago

Thanks for editing, @zx485. Can you help me with one question: how can I split the dictionary back into multiple DataFrames based on all the unique company names?

zx485 Over a year ago

I'm sorry, but I'm no Python guy.

pink.slash Over a year ago

I think something is wrong in your code. The last part should be: for key in companydict.keys(): companydict[key] = df[:][df.company == key]. But in any case, I do not see exactly what the output of this is.

ak3191 Over a year ago

@pink.slash For me the exact code worked, but if there is another use case I would be happy to have a look.

Joao Nogaroli · Answer · 2021-12-15 00:31:08Z

3

You can do it this way:

for xxx in yyy:
    # creates a module-level variable named dataframe_<xxx>
    globals()[f'dataframe_{xxx}'] = pd.DataFrame()
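Because globals() always returns the module's namespace, this approach also works when the loop sits inside a function, unlike writing to vars() or locals() there. Here is a minimal sketch with a hypothetical create_frames helper and ticker list. The names still appear out of nowhere as far as a reader or a linter is concerned.

```python
import pandas as pd

def create_frames(names):
    # globals() is the module namespace even inside a function,
    # so each assignment creates a module-level variable.
    for name in names:
        globals()[f'dataframe_{name}'] = pd.DataFrame()

create_frames(['AA', 'BA'])  # hypothetical tickers

print(sorted(n for n in globals() if n.startswith('dataframe_')))
```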

edited Dec 15, 2021 at 0:31

Dave2e


answered Dec 14, 2021 at 18:29

Joao Nogaroli


1 Comment

Regi Mathew Over a year ago

This is an important addition to the solutions discussed above.

Aku · Answer · 2021-10-22 09:13:55Z

The following is reproducible. Let's say you have a list with the df/company names:

companies = ['AA', 'AAPL', 'BA', 'YHOO']

You probably also have data, presumably also a list (or rather a list of lists), like:

content_of_lists = [
    [['a', '1'], ['b', '2']],
    [['c', '3'], ['d', '4']],
    [['e', '5'], ['f', '6']],
    [['g', '7'], ['h', '8']]
]

In this particular example the DataFrames should look very much alike, so this does not need to be very complicated:

dic = {}
for n, m in zip(companies, range(len(content_of_lists))):
    dic["df_{}".format(n)] = pd.DataFrame(content_of_lists[m]).rename(columns={0: "col_1", 1: "col_2"})
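Here is a self-contained sketch of the same idea using the data from this answer. Zipping the names with the lists directly removes the need for an index counter, and passing columns= to the DataFrame constructor names the columns up front instead of calling rename() afterwards.

```python
import pandas as pd

companies = ['AA', 'AAPL', 'BA', 'YHOO']
content_of_lists = [
    [['a', '1'], ['b', '2']],
    [['c', '3'], ['d', '4']],
    [['e', '5'], ['f', '6']],
    [['g', '7'], ['h', '8']],
]

# zip pairs each company name with its rows; columns= names the columns directly.
dic = {
    f"df_{name}": pd.DataFrame(rows, columns=["col_1", "col_2"])
    for name, rows in zip(companies, content_of_lists)
}

print(dic["df_AA"])
```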

Here you would use dic["df_AA"] to get to the DataFrame inside the dictionary. But should you require more "distinct" naming of the DataFrames, I think you would have to use if-conditions, for example:

for n, m in zip(companies, range(len(content_of_lists))):
    if n == 'AA':
        special_naming_1 = pd.DataFrame(content_of_lists[m]).rename(columns={0: "col_1", 1: "col_2"})
    elif n == 'AAPL':
        special_naming_2 = ...  # and so on for the other companies

It is a little more effort, but it lets you grab the DataFrame object in a more conventional way, by writing special_naming_1 instead of dic['df_AA'], and gives you more control over the DataFrame names and column names if that's important.
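If you only need conventional names for a few of the DataFrames, a lighter sketch is to build the dict in a loop and bind just those names afterwards, which avoids one if/elif branch per company. The frame contents are hypothetical; special_naming_1 and special_naming_2 are the names used in this answer.

```python
import pandas as pd

companies = ['AA', 'AAPL', 'BA', 'YHOO']

# Build every DataFrame once, keyed by company (hypothetical contents).
frames = {name: pd.DataFrame({'col_1': [name]}) for name in companies}

# Bind conventional variable names only for the frames you actually need.
special_naming_1 = frames['AA']
special_naming_2 = frames['AAPL']

print(special_naming_1)
```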

marcin2x4 · Answer · 2024-04-02 10:29:11Z

0

Old topic, but I thought I'd share a view on this.

I understand the goal of creating DataFrames on the fly, but wouldn't it be better to simply assign each DataFrame to a variable whose name matches the desired table?

Yes, it takes a few extra lines of code, but it is probably easier to debug.

import pandas as pd

companies = ['AA', 'AAPL', 'BA', 'YHOO']
dummy_data = {'col1': [1, 2], 'col2': [3, 4]}

# One explicitly named variable per table (companies[0] is 'AA', companies[2] is 'BA')
df_aa = pd.DataFrame(data=dummy_data)
df_ba = pd.DataFrame(data=dummy_data)

answered Apr 2, 2024 at 10:29

marcin2x4

