3
df1:
            A    B
0  2002-01-13  3.9
1  2002-01-13  1.9
2  2002-01-14  8.0
3  2002-01-14  9.0

I want to create a new column df1["C"] with means of B values, per each A group.

Output should be:

            A    B     C
0  2002-01-13  3.9   2.9
1  2002-01-13  1.9   2.9
2  2002-01-14  8.0   8.5
3  2002-01-14  9.0   8.5

And now I want to assign C values to each A group, to another df2.

df2:
            A        D
0  2002-01-13   Joseph
1  2002-01-13     Emma
2  2002-01-13  Michael
3  2002-01-14     Anna
4  2002-01-14   Yvonne
5  2002-01-14  Anthony

Output should be:

            A        D     E
0  2002-01-13   Joseph   2.9
1  2002-01-13     Emma   2.9
2  2002-01-13  Michael   2.9
3  2002-01-14     Anna   8.5
4  2002-01-14   Yvonne   8.5
5  2002-01-14  Anthony   8.5

I´ve tried:

df1["C"] = df1.groupby("A")["B"].mean()

5 Answers 5

3

You don't have to add a column to df1, you can directly map the values from the groupby df1 to df2.

df2['E'] = df2['A'].map(df1.groupby('A').B.mean())


    A           D       E
0   2002-01-13  Joseph  2.9
1   2002-01-13  Emma    2.9
2   2002-01-13  Michael 2.9
3   2002-01-14  Anna    8.5
4   2002-01-14  Yvonne  8.5
5   2002-01-14  Anthony 8.5
Sign up to request clarification or add additional context in comments.

2 Comments

I'd use assign but this is my answer.
@piRSquared, need to start using assign :)
2

First question transform

df1['C'] = df1.groupby('A').B.transform('mean')

Second using map (Notice I am using the df1 directly cause I adding drop_duplicates )

df2['E']=df2.A.map(df1.drop_duplicates('A').set_index('A').C)

2 Comments

I think df2.A.map(df1.set_index('A')['C'].to_dict()) would be more clean, You don't need drop_duplicates as dictionary will take care of it. +1
@Vaishali that is true:-)
0

You can use

df['C'] = df['A'].replace(df.groupby('A')['B'].mean().to_dict())

2 Comments

This is essentially the same as @Vaishali's answer but uses replace instead. I'd argue that you'd want to use map instead.
Three other answers are posted while I was drafting my answer :). I agree that either transform or map is better than mine.
0

Part 1

df['C'] = df.groupby('A')['B'].transform('mean')

The reason your code does not work is your groupby object returns a series indexed by A.

Parts 1 & 2

You could perform your transformations by mapping A to a precalculated groupby object in each dataframe.

s = df1.groupby('A')['B'].mean()

df['C'] = df['A'].map(s)
df2['E'] = df2['A'].map(s)

Comments

0

Posting since others did not mention using pd.merge or DataFrame.join.

If only the final output is required:

pd.merge(df2, df1.groupby('A', as_index=False).B.agg('mean').rename(columns={'B':'E'}), on='A')
#outputs:
            A        D     E
0  2002-01-13   Joseph   2.9
1  2002-01-13     Emma   2.9
2  2002-01-13  Michael   2.9
3  2002-01-14     Anna   8.5
4  2002-01-14   Yvonne   8.5
5  2002-01-14  Anthony   8.5

I have a hunch that the join based solution will be faster than the map based solutions given large data frames.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.