1

I'm learning to use python and pandas and I'm wondering how can I combine something from different rows based on the cell values in two columns. In this case, i want to combine scores from same department and within same id.

This is the simplified version of the DF I have:

 id  department  name  scoreA
abc       sales  eric       2
abc       sales  jack       3
abc   marketing sofia       6
abc   marketing  anna       7
zzz       sales  jack       1
zzz       sales  eric       8
zzz   marketing sofia      11
zzz   marketing  anna       1

And this is the DF I want:

 id  department totalScoreA
abc       sales           5
abc   marketing          13
zzz       sales           9
zzz   marketing          12


I have also a follow up question. What if there's two columns for counts and I want an average of those counts, but before averaging those values I want to multiply the scoreB by 2. Like this:

 id  department  name  scoreA  scoreB
abc       sales  eric       2      10
abc       sales  jack       3       6
abc   marketing sofia       6       8
abc   marketing  anna       7      10
zzz       sales  eric       8      10
zzz       sales  jack       2      10
zzz   marketing sofia      11       4
zzz   marketing  anna       1      10

And this is the DF I want:

 id  department totalScoreA  AverageScore((A+B*2)/2)
abc       sales           5                     18.5
abc   marketing          13                     24.5
zzz       sales          10                       25
zzz   marketing          12                       20


Update:

Hey thank you so much for the answers @jezrael! The first one worked as it should!

However I might've been little bit too vague for defining the second question. What I wanted is to get an "combined" mean of all the scoreB*2's ScoreA's for every group of every department. I give an example with values to clarify this:

From this:

 id  department   name  scoreA  scoreB
zzz   marketing  sofia       5       4
zzz   marketing   anna       2

To this:

The meanAB is (5+2+4*2)/3 (the number 3 comes from the count of the values). So how would i calculate that one because i wasn't able to do it, even with the help of your previous solution :/

 id  department  meanA  meanB  meanAB
zzz   marketing    3.5      4       5

1 Answer 1

6

First is aggregation sum with DataFrame.groupby:

df1 = df.groupby(['id', 'department'], as_index=False, sort=False)['scoreA'].sum()
print (df1)
    id department  scoreA
0  abc      sales       5
1  abc  marketing      13
2  zzz      sales      10
3  zzz  marketing      12

And second is first multiple column scoreB, added scoreA and used GroupBy.agg with dictionary of aggregate functions, here sum and mean:

df2 = (df.assign(scoreB = df['scoreB'] * 2 + df['scoreA'])
         .groupby(['id', 'department'], as_index=False, sort=False)
         .agg({'scoreA':'sum', 'scoreB':'mean'}))
print (df2)

    id department  scoreA  scoreB
0  abc      sales       5    18.5
1  abc  marketing      13    24.5
2  zzz      sales      10    25.0
3  zzz  marketing      12    20.0

EDIT:

print (df)
    id department   name  scoreA  scoreB
0  abc      sales   eric       2    10.0
1  abc      sales   jack       3     6.0
2  abc  marketing  sofia       6     8.0
3  abc  marketing   anna       7    10.0
4  abc  marketing   eric       8    10.0 <-changed data
5  zzz      sales   jack       2    10.0
6  zzz  marketing  sofia       5     4.0 <-changed data
7  zzz  marketing   anna       2     NaN <-changed data

Create new column Count by DataFrame.count function for get number of values with explude NaNs, then aggregate sum and divide for mean:

df2 = (df.assign(scoreB = df['scoreB'].mul(2).add(df['scoreA'], fill_value=0), 
                 Count = df[['scoreA','scoreB']].count(1))
         .groupby(['id', 'department'], as_index=False, sort=False)
         .sum())
print (df2)
    id department  scoreA  scoreB  Count
0  abc      sales       5    37.0      4
1  abc  marketing      21    77.0      6
2  zzz      sales       2    22.0      2
3  zzz  marketing       7    15.0      3

df2['scoreB'] /= df2.pop('Count')
print (df2)
    id department  scoreA     scoreB
0  abc      sales       5   9.250000
1  abc  marketing      21  12.833333
2  zzz      sales       2  11.000000
3  zzz  marketing       7   5.000000

Detail:

print (df.assign(scoreB = df['scoreB'].mul(2).add(df['scoreA'], fill_value=0), 
                 Count = df[['scoreA','scoreB']].count(1)))
    id department   name  scoreA  scoreB  Count
0  abc      sales   eric       2    22.0      2
1  abc      sales   jack       3    15.0      2
2  abc  marketing  sofia       6    22.0      2
3  abc  marketing   anna       7    27.0      2
4  abc  marketing   eric       8    28.0      2
5  zzz      sales   jack       2    22.0      2
6  zzz  marketing  sofia       5    13.0      2
7  zzz  marketing   anna       2     2.0      1
Sign up to request clarification or add additional context in comments.

1 Comment

I replied to your comment by adding more content to the original issue (i wanted to add more code). Don't know how i can tag you in there :/

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.