Loop for column names in python

Question

I would like to write mean values from one dataframe (df1) to another (dfmaster). I can do it manually, but I would like to automate the process so that all column names are read from df1 (as variables) and then used in the code below to calculate the mean of every column of df1.

The column means should be calculated from this dataframe:

import pandas as pd 
data = [[6.2, 10, 8], [6.4, 15, 13], [6.6, 14, 6]] 
df1 = pd.DataFrame(data, columns = ['Prozess233', 'Prozess234', 'Prozess235']) 
df1

This is the master dataframe where the means should be stored:

data = [['Prozess233','NaN', 'NaN','NaN'], ['Prozess234','NaN', 'NaN', 'NaN'], ['Prozess235','NaN', 'NaN', 'NaN']] 
dfmaster = pd.DataFrame(data, columns = ['Process', 'Mean', 'St.Dev', 'Max']) 
dfmaster

Here is the code to calculate the mean from one column of df1 and store the value in the master dataframe:

index = dfmaster.loc[dfmaster['Process'] == 'Prozess233'].index[0]
keep_col = ['Prozess233']
df1 = df1[keep_col]
df1 = df1[df1['Prozess233'].notna()]
meanPR = df1["Prozess233"].mean()
dfmaster.at[index, 'Mean'] = meanPR

This is the result:

I would like a loop that runs the code above (storing the mean in the master dataframe) for all columns of df1, i.e. also for "Prozess234" and "Prozess235".

I couldn't figure out how to store the column names of df1 in order to use them in a loop. I am not sure whether this is the right approach.

df1.columns.to_list() will give you the column names to iterate over or store. Check df1.mean() as well.
– abhiieor, Commented Oct 2, 2020 at 20:08
Thank you for the reply. I have managed that as well, but I am not sure what to do next. How can the list be used as variables in a loop?
– Essegn, Commented Oct 2, 2020 at 20:16
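
For reference, the loop the question asks about could look roughly like the sketch below. It assumes that dfmaster holds real float NaN placeholders rather than the string 'NaN', and it reads each column from df1 without reassigning df1. The snippet in the question overwrites df1 with a single column, which would make a second loop iteration fail.

```python
import pandas as pd

data = [[6.2, 10, 8], [6.4, 15, 13], [6.6, 14, 6]]
df1 = pd.DataFrame(data, columns=["Prozess233", "Prozess234", "Prozess235"])

# Assumption: the master table uses float NaN as placeholder, not the string 'NaN'.
dfmaster = pd.DataFrame({
    "Process": ["Prozess233", "Prozess234", "Prozess235"],
    "Mean": float("nan"),
    "St.Dev": float("nan"),
    "Max": float("nan"),
})

# df1.columns already behaves like a list of names, so it can be iterated directly.
for col in df1.columns:
    values = df1[col].dropna()          # ignore missing values, as in the question
    row = dfmaster["Process"] == col    # boolean mask selecting the matching row
    dfmaster.loc[row, "Mean"] = values.mean()
    dfmaster.loc[row, "St.Dev"] = values.std()
    dfmaster.loc[row, "Max"] = values.max()

print(dfmaster)
```

This works, but as the answers point out, pandas can compute the same statistics for all columns at once without an explicit loop.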

Cameron Riddell · Accepted Answer · 2020-10-02 20:17:20Z

3

You can use agg to get specific aggregations for each column:

df1_summary = (df1.agg(["mean", "std", "max"])
               .rename(index={"mean": "Mean", "std": "St.Dev", "max": "Max"}))

print(df1_summary)
        Prozess233  Prozess234  Prozess235
Mean           6.4   13.000000    9.000000
St.Dev         0.2    2.645751    3.605551
Max            6.6   15.000000   13.000000

Then, if you want to fill this into your dfmaster:

dfmaster = dfmaster.set_index("Process") 
dfmaster.update(df1_summary.T)

print(dfmaster)
            Mean    St.Dev   Max
Process                         
Prozess233   6.4  0.200000   6.6
Prozess234  13.0  2.645751  15.0
Prozess235   9.0  3.605551  13.0
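
One property of update worth illustrating is that it matches rows by index label, not by position. The sketch below uses a hypothetical master table (not from the question) with a different row order and an extra process, "Prozess999", that has no column in df1. It also assumes float NaN placeholders instead of 'NaN' strings.

```python
import pandas as pd

data = [[6.2, 10, 8], [6.4, 15, 13], [6.6, 14, 6]]
df1 = pd.DataFrame(data, columns=["Prozess233", "Prozess234", "Prozess235"])

# Hypothetical master table: shuffled rows, one process without data in df1,
# and float NaN placeholders (an assumption, the question uses 'NaN' strings).
dfmaster = pd.DataFrame({
    "Process": ["Prozess235", "Prozess999", "Prozess233", "Prozess234"],
    "Mean": float("nan"),
    "St.Dev": float("nan"),
    "Max": float("nan"),
}).set_index("Process")

df1_summary = (df1.agg(["mean", "std", "max"])
               .rename(index={"mean": "Mean", "std": "St.Dev", "max": "Max"}))

# update() modifies dfmaster in place and aligns on the "Process" index labels.
dfmaster.update(df1_summary.T)

# Turn "Process" back into an ordinary column.
dfmaster = dfmaster.reset_index()
print(dfmaster)
```

The row for "Prozess999" should keep its NaN values, since there is nothing in the summary to fill it with.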

noah · Accepted Answer · 2020-10-02 20:22:26Z

2

Instead of adding to the dfmaster one by one just create it using vectorized methods:

import pandas as pd 
data = [[6.2, 10, 8], [6.4, 15, 13], [6.6, 14, 6]] 
df1 = pd.DataFrame(data, columns = ['Prozess233', 'Prozess234', 'Prozess235']) 
dfmaster=pd.concat([df1.mean(), df1.std(), df1.max()], axis=1).reset_index()
dfmaster.columns = ['Process','Mean', 'St.Dev', 'Max']

#dfmaster
      Process  Mean    St.Dev   Max
0  Prozess233   6.4  0.200000   6.6
1  Prozess234  13.0  2.645751  15.0
2  Prozess235   9.0  3.605551  13.0

Also, depending on your needs consider checking out df1.describe()'s output:

       Prozess233  Prozess234  Prozess235
count         3.0    3.000000    3.000000
mean          6.4   13.000000    9.000000
std           0.2    2.645751    3.605551
min           6.2   10.000000    6.000000
25%           6.3   12.000000    7.000000
50%           6.4   14.000000    8.000000
75%           6.5   14.500000   10.500000
max           6.6   15.000000   13.000000

2 Comments

G. Anderson Over a year ago

+1 from me for recommending .describe(). It might also be worth mentioning that the output can be saved as a dataframe and manipulated into the OP's expected result directly.

noah Over a year ago

Good point. I'll add later when I've got some time if I remember. Cameron's answer sort of shows the process already of using update.
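
A minimal sketch of what the comments above describe, assuming the target layout is the question's dfmaster (columns Process, Mean, St.Dev, Max): select the needed rows from the describe() output, transpose, and rename.

```python
import pandas as pd

data = [[6.2, 10, 8], [6.4, 15, 13], [6.6, 14, 6]]
df1 = pd.DataFrame(data, columns=["Prozess233", "Prozess234", "Prozess235"])

dfmaster = (
    df1.describe()
       .loc[["mean", "std", "max"]]     # keep only the statistics of interest
       .T                               # one row per process
       .rename(columns={"mean": "Mean", "std": "St.Dev", "max": "Max"})
       .rename_axis("Process")          # name the index so reset_index labels it
       .reset_index()
)

print(dfmaster)
```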

G. Anderson · Accepted Answer · 2020-10-02 20:15:24Z

While you can get the columns for a dataframe with df.columns, there's almost never a good reason to iterate over a pandas dataframe for simple mathematical calculations.

What you're after can be done with

df1.T.stack().groupby(level=0).agg(["mean", "std", "max"])

            mean    std         max
Prozess233  6.4     0.200000    6.6
Prozess234  13.0    2.645751    15.0
Prozess235  9.0     3.605551    13.0

To break it down further:

transpose the dataframe

dft=df1.T
dft

            0       1       2
Prozess233  6.2     6.4     6.6
Prozess234  10.0    15.0    14.0
Prozess235  8.0     13.0    6.0

stack the dataframe

dfs=dft.stack()
dfs

Prozess233  0     6.2
            1     6.4
            2     6.6
Prozess234  0    10.0
            1    15.0
            2    14.0
Prozess235  0     8.0
            1    13.0
            2     6.0
dtype: float64

group and aggregate

dfmaster=dfs.groupby(level=0).agg(["mean", "std", "max"])
dfmaster

            mean    std         max
Prozess233  6.4     0.200000    6.6
Prozess234  13.0    2.645751    15.0
Prozess235  9.0     3.605551    13.0
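
As a cross-check, the sketch below runs the transpose/stack/groupby steps end to end and compares the result with calling agg on df1 directly and transposing. The variable names stacked and direct are only illustrative.

```python
import pandas as pd

data = [[6.2, 10, 8], [6.4, 15, 13], [6.6, 14, 6]]
df1 = pd.DataFrame(data, columns=["Prozess233", "Prozess234", "Prozess235"])

# Route 1: transpose, stack into a long Series, then group by the process name.
stacked = df1.T.stack().groupby(level=0).agg(["mean", "std", "max"])

# Route 2: aggregate each column directly, then transpose.
direct = df1.agg(["mean", "std", "max"]).T

# Largest absolute difference between the two results (expected to be ~0).
max_diff = (stacked - direct).abs().to_numpy().max()
print(stacked)
print("max difference:", max_diff)
```

With no missing values in df1, both routes should give the same table. The second one skips the reshaping.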
