I would like to write mean values from one dataframe (df1) to another (dfmaster). I can do it manually, but I would like to automate the process so that all column names are read from df1 (as variables) and then used in the code below to calculate the mean of every column of df1.

The column means should be calculated from this dataframe:

import pandas as pd 
data = [[6.2, 10, 8], [6.4, 15, 13], [6.6, 14, 6]] 
df1 = pd.DataFrame(data, columns = ['Prozess233', 'Prozess234', 'Prozess235']) 
df1 

This is the master dataframe where the means should be stored:

data = [['Prozess233','NaN', 'NaN','NaN'], ['Prozess234','NaN', 'NaN', 'NaN'], ['Prozess235','NaN', 'NaN', 'NaN']] 
dfmaster = pd.DataFrame(data, columns = ['Process', 'Mean', 'St.Dev', 'Max']) 
dfmaster 

Here is the code to calculate the mean from one column of df1 and store the value in the master dataframe:

index = dfmaster.loc[dfmaster['Process'] == 'Prozess233'].index[0]
keep_col = ['Prozess233']
df1 = df1[keep_col]
df1 = df1[df1['Prozess233'].notna()]
meanPR = df1["Prozess233"].mean()
dfmaster.at[index, 'Mean'] = meanPR

As a result, the mean of Prozess233 is stored in the Mean column of dfmaster, while the other rows stay unchanged.

I would like a loop that runs the code above (storing the mean in the master dataframe) for every column of df1, that is, also for "Prozess234" and "Prozess235".

I couldn't figure out how to store the column names of df1 so that I can use them in a loop. I am not sure whether this is the right approach.

2
  • df1.columns.to_list() will help you get column names to iterate/store. Check df1.mean() as well. Commented Oct 2, 2020 at 20:08
  • Thank you for the reply. I have managed that as well, but I am not sure what to do next. How can the list be turned into variables that I can use in a loop? Commented Oct 2, 2020 at 20:16
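
For reference, the loop asked about in the comments could look roughly like the sketch below. It iterates over df1.columns.to_list() and reuses the lookup from the question. One assumption that is not in the original post: dfmaster is built with real float NaN placeholders instead of the string 'NaN', so the columns stay numeric. Unlike the question's snippet, df1 is not overwritten inside the loop, so every column stays available.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[6.2, 10, 8], [6.4, 15, 13], [6.6, 14, 6]],
    columns=["Prozess233", "Prozess234", "Prozess235"],
)

# Assumption: float NaN placeholders rather than the string 'NaN'.
dfmaster = pd.DataFrame({
    "Process": ["Prozess233", "Prozess234", "Prozess235"],
    "Mean": float("nan"),
    "St.Dev": float("nan"),
    "Max": float("nan"),
})

for col in df1.columns.to_list():
    # Row of dfmaster whose 'Process' entry matches the current column name.
    row = dfmaster.index[dfmaster["Process"] == col][0]
    # Drop missing values, then store the mean in the matching row.
    dfmaster.at[row, "Mean"] = df1[col].dropna().mean()

print(dfmaster)
```

This only shows that the loop is possible; the answers below reach the same result without iterating.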

3 Answers

3

You can use agg to get specific aggregations for each column:

df1_summary = (df1.agg(["mean", "std", "max"])
               .rename(index={"mean": "Mean", "std": "St.Dev", "max": "Max"}))

print(df1_summary)
        Prozess233  Prozess234  Prozess235
Mean           6.4   13.000000    9.000000
St.Dev         0.2    2.645751    3.605551
Max            6.6   15.000000   13.000000

Then, if you want to fill this into your dfmaster:

dfmaster = dfmaster.set_index("Process") 
dfmaster.update(df1_summary.T)

print(dfmaster)
            Mean    St.Dev   Max
Process                         
Prozess233   6.4  0.200000   6.6
Prozess234  13.0  2.645751  15.0
Prozess235   9.0  3.605551  13.0
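
A small sketch of why this works: update aligns on index and column labels, so only rows and columns present in both frames are overwritten. The example below makes two assumptions that are not in the original post: the master frame holds real NaN values, and it contains a hypothetical extra row, Prozess236, with no matching column in df1.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [[6.2, 10, 8], [6.4, 15, 13], [6.6, 14, 6]],
    columns=["Prozess233", "Prozess234", "Prozess235"],
)

# Assumptions: real NaN placeholders and a hypothetical 'Prozess236' row.
dfmaster = pd.DataFrame({
    "Process": ["Prozess233", "Prozess234", "Prozess235", "Prozess236"],
    "Mean": np.nan,
    "St.Dev": np.nan,
    "Max": np.nan,
}).set_index("Process")

# One row per process, with column names matching dfmaster.
summary = (df1.agg(["mean", "std", "max"])
           .rename(index={"mean": "Mean", "std": "St.Dev", "max": "Max"})
           .T)

dfmaster.update(summary)           # in place; aligned on index and columns
dfmaster = dfmaster.reset_index()  # turn 'Process' back into a regular column

print(dfmaster)
```

The Prozess236 row keeps its NaN values because summary has no entry for it.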

2

Instead of adding to dfmaster one value at a time, just create it using vectorized methods:

import pandas as pd 
data = [[6.2, 10, 8], [6.4, 15, 13], [6.6, 14, 6]] 
df1 = pd.DataFrame(data, columns = ['Prozess233', 'Prozess234', 'Prozess235']) 
dfmaster=pd.concat([df1.mean(), df1.std(), df1.max()], axis=1).reset_index()
dfmaster.columns = ['Process','Mean', 'St.Dev', 'Max']

#dfmaster
      Process  Mean    St.Dev   Max
0  Prozess233   6.4  0.200000   6.6
1  Prozess234  13.0  2.645751  15.0
2  Prozess235   9.0  3.605551  13.0

Also, depending on your needs consider checking out df1.describe()'s output:

       Prozess233  Prozess234  Prozess235
count         3.0    3.000000    3.000000
mean          6.4   13.000000    9.000000
std           0.2    2.645751    3.605551
min           6.2   10.000000    6.000000
25%           6.3   12.000000    7.000000
50%           6.4   14.000000    8.000000
75%           6.5   14.500000   10.500000
max           6.6   15.000000   13.000000
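
As a rough sketch, the describe() output could also be reshaped into the master layout directly. The selection of the mean, std and max rows and the renaming to the question's column names are my own choices for illustration:

```python
import pandas as pd

df1 = pd.DataFrame(
    [[6.2, 10, 8], [6.4, 15, 13], [6.6, 14, 6]],
    columns=["Prozess233", "Prozess234", "Prozess235"],
)

# Transpose so each process is a row, then keep only the needed statistics.
stats = df1.describe().T[["mean", "std", "max"]]

dfmaster = (stats.rename(columns={"mean": "Mean", "std": "St.Dev", "max": "Max"})
            .rename_axis("Process")   # name the index so reset_index labels it
            .reset_index())

print(dfmaster)
```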

2 Comments

+1 from me for recommending .describe(). It might also be worth mentioning that the output can be saved as a dataframe and manipulated into the OP's expected result directly.
Good point. I'll add later when I've got some time if I remember. Cameron's answer sort of shows the process already of using update.
1

While you can get the columns for a dataframe with df.columns, there's almost never a good reason to iterate over a pandas dataframe for simple mathematical calculations.

What you're after can be done with

df1.T.stack().groupby(level=0).agg(["mean", "std", "max"])

            mean    std         max
Prozess233  6.4     0.200000    6.6
Prozess234  13.0    2.645751    15.0
Prozess235  9.0     3.605551    13.0

To break it down further:

transpose the dataframe

dft=df1.T
dft

            0       1       2
Prozess233  6.2     6.4     6.6
Prozess234  10.0    15.0    14.0
Prozess235  8.0     13.0    6.0

stack the dataframe

dfs=dft.stack()
dfs

Prozess233  0     6.2
            1     6.4
            2     6.6
Prozess234  0    10.0
            1    15.0
            2    14.0
Prozess235  0     8.0
            1    13.0
            2     6.0
dtype: float64

group and aggregate

dfmaster = dfs.groupby(level=0).agg(["mean", "std", "max"])
dfmaster

            mean    std         max
Prozess233  6.4     0.200000    6.6
Prozess234  13.0    2.645751    15.0
Prozess235  9.0     3.605551    13.0
