I have a csv file with different groups identified by an ID, something like:

ID,X
aaa,3
aaa,5
aaa,4
bbb,50
bbb,54
bbb,52

I need to:

  • calculate the mean of x in each group;
  • divide each value of x by the mean of x for that specific group.

So, in my example above, the mean in the 'aaa' group is 4, while in 'bbb' it's 52. I need to obtain a new dataframe with a third column, where in each row I have the original value of x divided by the group average:

ID,X,x/group_mean
aaa,3,3/4
aaa,5,5/4
aaa,4,4/4
bbb,50,50/52
bbb,54,54/52
bbb,52,52/52

I can group the dataframe and calculate each group's mean with:

    import pandas as pd

    df_data = pd.read_csv('test.csv', index_col=0)
    df_grouped = df_data.groupby('ID')
    for group_name, group_content in df_grouped:
        # the CSV header is 'X' (uppercase), so 'x' would raise a KeyError
        mean_x_group = group_content['X'].mean()
        print(f'mean = {mean_x_group}')

but how do I add the third column?

3 Answers

Use GroupBy.transform:

In [1874]: df['mean']  = df.groupby('ID').transform('mean')

In [1879]: df['newcol'] = df.X.div(df['mean'])

In [1880]: df
Out[1880]: 
    ID   X  mean    newcol
0  aaa   3     4  0.750000
1  aaa   5     4  1.250000
2  aaa   4     4  1.000000
3  bbb  50    52  0.961538
4  bbb  54    52  1.038462
5  bbb  52    52  1.000000
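
For anyone who wants to run this end to end, here is a minimal sketch using the sample data from the question. Reading from an in-memory string via `io.StringIO` instead of `test.csv` is my assumption, as is selecting `X` explicitly before calling `transform`. Newer pandas versions may display the mean as a float rather than an integer.

```python
import io

import pandas as pd

# Sample data from the question, read from a string instead of test.csv (assumption).
csv_text = """ID,X
aaa,3
aaa,5
aaa,4
bbb,50
bbb,54
bbb,52
"""
df = pd.read_csv(io.StringIO(csv_text))

# transform('mean') returns one value per original row (the mean of that row's
# group), so the result lines up with df and can be stored as a column.
df['mean'] = df.groupby('ID')['X'].transform('mean')

# Divide each X by the mean of its own group.
df['newcol'] = df['X'].div(df['mean'])

print(df)
```

Unlike `.agg`, which returns one row per group, `transform` keeps the original index, which is why no merge is needed afterwards.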

1 Comment

It is good to be reminded of .transform in groupby. I have often just used .agg to find a way to broadcast the mean to all rows. +Up for that.

The same idea as a neat one-liner:

df['new_column'] = df.apply(lambda row: row.X/df.loc[df.ID==row.ID, 'X'].mean(), axis=1)
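
As a rough check, the sketch below compares this row-wise version with a grouped `transform` on the question's sample data. The variable names `by_row` and `by_group` are mine, for illustration only. Note that the lambda filters the whole frame once per row, so it does more work than computing each group's mean once, which may matter on large files.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'ID': ['aaa', 'aaa', 'aaa', 'bbb', 'bbb', 'bbb'],
    'X': [3, 5, 4, 50, 54, 52],
})

# Row-wise: for every row, select that row's group and take its mean.
by_row = df.apply(lambda row: row.X / df.loc[df.ID == row.ID, 'X'].mean(), axis=1)

# Grouped: compute each group's mean once and broadcast it back to the rows.
by_group = df['X'] / df.groupby('ID')['X'].transform('mean')

# Both approaches should agree up to floating-point rounding.
print((by_row - by_group).abs().max())
```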

3 Comments

This is not very neat though!!
It is neat in the sense that it is closer to the spoken language, so to say: "For each row take X and divide it by its mean". Neat or not, I think it's still good to offer more approaches to the problem, to inspire ideas for other problems as well.
Sure I agree with that.

A one-liner that does this:

# divide X by the mean of X within each ID group

df['group_mean'] = df.X / df.groupby('ID').transform('mean').X
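
If the real file has more columns than `ID` and `X` (an assumption, since the question only shows two), it is probably safer to select `X` before calling `transform`. Depending on the pandas version, a frame-wide `transform('mean')` can fail or warn when it meets non-numeric columns. A minimal sketch, where the `label` column and the name `x_over_group_mean` are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['aaa', 'aaa', 'aaa', 'bbb', 'bbb', 'bbb'],
    'label': ['a', 'b', 'c', 'd', 'e', 'f'],  # hypothetical extra text column
    'X': [3, 5, 4, 50, 54, 52],
})

# Selecting 'X' first restricts the mean to that column,
# so the text column is never averaged.
df['x_over_group_mean'] = df['X'] / df.groupby('ID')['X'].transform('mean')

print(df)
```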
