I have a DataFrame df that I need to group by multiple columns, based on a condition.

df

user_id       area_id         group_id key year value     new
10835          48299            1      5   2011   0        ?
10835           48299           1      2   2010   0
10835           48299           2     102  2013   13100
10835           48299           2      5   2016   0
10836           48299           1      78  2017   67100
10836           48299           1      1   2012   54000
10836           48299           1      12  2018   0
10836           48752           1      7   2014   0
10836           48752           2     103  2015   5000
10837           48752           2     102  2016   5000
10837           48752           1      3   2017   0
10837           48752           1     103  2017   0
10837           49226           1      2   2011   4000
10837           49226           1     83   2011   4000
10838           49226           2     16   2011   0
10838           49226           1     75   2012   0
10838           49226           1      2   2012   4000
10838           49226           1      12  2013   1000
10839           49226           1      3   2015   6500
10839           49226           1     102  2016   7900
10839           49226           1     16   2017   0
10839           49226           2     6    2017   5500
22489           49226           2     89   2017   5000
22489           49226           1     102  2017   5000

My goal is to create a new column, df['new']. Current solution:

df['new'] =df['user_id'].map(df[df['key'].eq(102)].groupby(['user_id', 'area_id', 'group_id', 'year'])['value'].sum())

I get NaN for every value in df['new']. I'm guessing it is not possible to use map with a Series grouped by multiple columns this way. Is there a proper way to accomplish this? Thanks in advance for any tip in the right direction.

  • What should the value of the new column be? Commented Mar 19, 2019 at 11:58

1 Answer

Pass as_index=False to groupby so that the grouping keys come back as regular columns of a new DataFrame:

df1 = (df[df['key'].eq(102)]
             .groupby(['user_id', 'area_id', 'group_id', 'year'], as_index=False)['value']
             .sum())
print (df1)
   user_id  area_id  group_id  year  value
0    10835    48299         2  2013  13100
1    10837    48752         2  2016   5000
2    10839    49226         1  2016   7900
3    22489    49226         1  2017   5000
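
As for why the original attempt returned only NaN: without as_index=False, the grouped result is a Series whose index is a four-level MultiIndex, so the scalar user_id values passed to map never match an index entry. Below is a minimal sketch of that behaviour, assuming a four-row subset of the question's data (the variable names grouped, attempt, n_levels and all_missing are only for illustration):

```python
import pandas as pd

# Four rows taken from the question's df, enough to reproduce the problem.
df = pd.DataFrame({
    "user_id": [10835, 10835, 10837, 10837],
    "area_id": [48299, 48299, 48752, 48752],
    "group_id": [1, 2, 2, 1],
    "key": [5, 102, 102, 3],
    "year": [2011, 2013, 2016, 2017],
    "value": [0, 13100, 5000, 0],
})

keys = ["user_id", "area_id", "group_id", "year"]
grouped = df[df["key"].eq(102)].groupby(keys)["value"].sum()

# The result is indexed by 4-level tuples, not by plain user_id values.
n_levels = grouped.index.nlevels

# Scalar user_id values are looked up in that MultiIndex and find no match.
attempt = df["user_id"].map(grouped)
all_missing = attempt.isna().all()

print(n_levels, all_missing)
```

This is why the Series passed to map needs user_id alone as its index.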

Then, in case user_id is duplicated in df1, first keep one row per user_id with DataFrame.drop_duplicates, build a Series indexed by user_id with DataFrame.set_index, and pass it to map:

df['new'] = df['user_id'].map(df1.drop_duplicates('user_id').set_index('user_id')['value'])
# if user_id is never duplicated in df1, drop_duplicates can be skipped:
#df['new'] = df['user_id'].map(df1.set_index('user_id')['value'])
print (df)
    user_id  area_id  group_id  key  year  value      new
0     10835    48299         1    5  2011      0  13100.0
1     10835    48299         1    2  2010      0  13100.0
2     10835    48299         2  102  2013  13100  13100.0
3     10835    48299         2    5  2016      0  13100.0
4     10836    48299         1   78  2017  67100      NaN
5     10836    48299         1    1  2012  54000      NaN
6     10836    48299         1   12  2018      0      NaN
7     10836    48752         1    7  2014      0      NaN
8     10836    48752         2  103  2015   5000      NaN
9     10837    48752         2  102  2016   5000   5000.0
10    10837    48752         1    3  2017      0   5000.0
11    10837    48752         1  103  2017      0   5000.0
12    10837    49226         1    2  2011   4000   5000.0
13    10837    49226         1   83  2011   4000   5000.0
14    10838    49226         2   16  2011      0      NaN
15    10838    49226         1   75  2012      0      NaN
16    10838    49226         1    2  2012   4000      NaN
17    10838    49226         1   12  2013   1000      NaN
18    10839    49226         1    3  2015   6500   7900.0
19    10839    49226         1  102  2016   7900   7900.0
20    10839    49226         1   16  2017      0   7900.0
21    10839    49226         2    6  2017   5500   7900.0
22    22489    49226         2   89  2017   5000   5000.0
23    22489    49226         1  102  2017   5000   5000.0
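
A possible variation, in case what you want is one total per user_id over all of that user's key == 102 rows (this is an assumption about the goal, and it differs from the code above, which keeps only the first group per user): group by user_id alone, so the result already has a single-level index that map can use. A minimal sketch on a five-row subset of the data (the name per_user is only for illustration):

```python
import pandas as pd

# Subset of the question's data; user 10836 has no key == 102 row.
df = pd.DataFrame({
    "user_id": [10835, 10835, 10836, 10837, 10837],
    "key": [5, 102, 78, 102, 3],
    "value": [0, 13100, 67100, 5000, 0],
})

# Sum value over the key == 102 rows, one total per user_id.
per_user = df[df["key"].eq(102)].groupby("user_id")["value"].sum()

# per_user is indexed by user_id only, so map can align on it directly.
# Users without any key == 102 row are absent from per_user and get NaN.
df["new"] = df["user_id"].map(per_user)

print(df)
```

With this version no drop_duplicates step is needed, because grouping by user_id alone cannot produce duplicated user_id values.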

1 Comment

Thank you very much, it works! As always, I learn a lot from your solutions.