I want to calculate the per-group standard deviation of some DataFrame columns and merge the result back into the original DataFrame, something like this:

std = all_data.groupby(['Id'])[features].agg('std')
all_data = pd.merge(all_data, std.reset_index(), suffixes=["", "_std"], how='left', on=['Id'])

but there seems to be no such thing as .agg('std').

1 Answer

Your solution works fine for me.

I think you need transform to avoid the merge, because it returns new Series with the same size as the original DataFrame:

import pandas as pd

all_data = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'D': [1, 3, 5, 7, 1, 0],
    'E': [5, 3, 6, 9, 2, 4],
    'Id': list('aaabbb')
})

#print (all_data)

features = ['B','C','D']
#new column names
cols = ['{}_std'.format(x) for x in features]
#python 3.6+ solution with f-strings
#cols = [f'{x}_std' for x in features]

all_data[cols] = all_data.groupby(['Id'])[features].transform('std')
print (all_data)
   A  B  C  D  E Id    B_std  C_std     D_std
0  a  4  7  1  5  a  0.57735      1  2.000000
1  b  5  8  3  3  a  0.57735      1  2.000000
2  c  4  9  5  6  a  0.57735      1  2.000000
3  d  5  4  7  9  b  0.57735      1  3.785939
4  e  5  2  1  2  b  0.57735      1  3.785939
5  f  4  3  0  4  b  0.57735      1  3.785939
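
To check that the agg + merge approach from the question and the transform approach give the same result, here is a minimal sketch. It reuses a trimmed copy of the sample data above; the names `merged` and `transformed` are only illustrative, and the comparison assumes a reasonably recent pandas version.

```python
import pandas as pd

# Trimmed copy of the sample data (only the columns needed here).
all_data = pd.DataFrame({
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'D': [1, 3, 5, 7, 1, 0],
    'Id': list('aaabbb'),
})
features = ['B', 'C', 'D']

# Approach from the question: aggregate per group, then merge back on 'Id'.
std = all_data.groupby('Id')[features].agg('std')
merged = pd.merge(all_data, std.reset_index(),
                  suffixes=["", "_std"], how='left', on='Id')

# Approach from the answer: transform keeps the original row count and index,
# so the result can be assigned directly to new columns.
cols = [f'{x}_std' for x in features]
transformed = all_data.copy()
transformed[cols] = all_data.groupby('Id')[features].transform('std')

# Raises AssertionError if the two frames differ.
pd.testing.assert_frame_equal(merged, transformed)
```

Both routes use the sample standard deviation (ddof=1), which is the pandas default for std.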