13

I have a pandas DataFrame like this:

>>> df = pd.DataFrame({'MONTREGL':[10,10,2222,35,200,56,5555],'SINID':['aaa','aaa','aaa','bbb','bbb','ccc','ccc'],'EXTRA':[400,400,400,500,500,333,333]})
>>> df
   MONTREGL SINID EXTRA
0        10   aaa   400
1        10   aaa   400
2      2222   aaa   400
3        35   bbb   500
4       200   bbb   500
5        56   ccc   333
6      5555   ccc   333

I want to sum the column MONTREGL for each groupby SINID...

So I get 2242 for aaa and so on... ALSO I want to keep the value of column EXTRA.

This is the expected result:

   MONTREGL SINID EXTRA
0      2242   aaa   400
1       235   bbb   500
2      5611   ccc   333

Thanks for your help in advance!

5
  • 2
    Have you tried using groupby? Commented May 29, 2019 at 12:26
  • 1
    print(df.groupby(by=["SINID"])["MONTREGL"].sum()) ? Commented May 29, 2019 at 12:27
  • pandas.pydata.org/pandas-docs/stable/reference/api/… Commented May 29, 2019 at 12:27
  • can you do a print(df.info()) and share the resulting print Commented May 29, 2019 at 13:11
  • The solution proposed before by @yatu worked but not in production dfd= df.groupby('SINID').MONTREGL.sum().reset_index().assign(EXTRA=df.EXTRA.unique()) I get the error Length of values does not match length of index Commented May 29, 2019 at 13:19

4 Answers 4

17

I ended up using this script:

dff = df.groupby(["SINID","EXTRA"]).MONTREGL.sum().reset_index()

And it works in this test and production.

Sign up to request clarification or add additional context in comments.

Comments

6

I know this post is old, but this might be helpful for others:

Using loc: df.loc[df['SINID'] == aaa].MONTREGL.sum()

Using groupby: df.groupby('SINID')['MONTREGL'].sum()

A similar question is answered in the following link (check Alex Riley's response):

How do I sum values in a column that match a given condition using pandas?

Good luck,

Comments

4

The code below works for your example:

df1 = df.groupby(["SINID"]).sum()
df1['EXTRA'] = df.groupby(["SINID"]).mean()['EXTRA']

Result :

       MONTREGL  EXTRA
SINID                 
aaa        2242  400.0
bbb         235  500.0
ccc        5611  333.0

Comments

-1

my suggestion would be to filter you dataframe with conditions related to other columns then apply sum function,

it goes something like this.

import pandas as pd

df=pd.Dataframe({a:[1,2,3],b:[2001,2015,2019],c:[1,0,1]})

aux=df[df.c>0]

sa=aux.a.sum()

sb=aux.b.sum()

My syntax may not be correct ( i didnt run the code ) but it will probably work and lead you to your answer

Good luck.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.