Pandas- Find average based on multiple column value

Question

Goal is to get average(integer) of marks column based on name value. If id and name column appears with exact same value more than once, then the marks with corresponding name will be considered once. For e.g. average of x = (33+14+3)/3 = 16

Sample dataframe:

   id name  marks
0   1   x   33
1   1   x   33
2   2   y   9
3   3   x   14
4   4   y   55
5   4   y   55
6   5   x   3
7   6   z   31

Expected output:

   id name marks avg
0   1   x   33  16
1   1   x   33  16
2   2   y   9   32
3   3   x   14  16
4   4   y   55  32
5   4   y   55  32
6   5   x   3   16
7   6   z   31  31

I tried:

df["avg"] = df.groupby("name")["marks"].mean()

score 4 · Accepted Answer · 2021-12-08 19:47:29Z

4

Try this:

df = df.set_index('name').assign(avg=df.drop_duplicates(['name', 'marks']).groupby('name')['marks'].mean()).reset_index()

Output:

>>> df
  name  id  marks        avg
0    x   1     33  16.666667
1    x   1     33  16.666667
2    y   2      9  32.000000
3    x   3     14  16.666667
4    y   4     55  32.000000
5    y   4     55  32.000000
6    x   5      3  16.666667
7    z   6     31  31.000000

If you need it rounded, chain .astype(int) to .mean():

df = df.set_index('name').assign(avg=df.drop_duplicates(['name', 'marks']).groupby('name')['marks'].mean().astype(int)).reset_index()

Output:

>>> df
  name  id  marks  avg
0    x   1     33   16
1    x   1     33   16
2    y   2      9   32
3    x   3     14   16
4    y   4     55   32
5    y   4     55   32
6    x   5      3   16
7    z   6     31   31

edited Dec 8, 2021 at 19:47

answered Dec 8, 2021 at 19:32

user17242583

Sign up to request clarification or add additional context in comments.

4 Comments

Nick ODell Over a year ago

Nice answer. Also, it's possible to simplify df[~df.set_index(['name', 'marks']).index.duplicated()] into df.drop_duplicates(['name', 'marks']), and it still gets the same result.

user17242583 Over a year ago

Thank you @Nick. That simplification is neat - implemented! :)

Corralien Over a year ago

For the next time, if you have to use set_index then reset_index, don't forget map. Check my answer. +1

user17242583 Over a year ago

Okay @Corralien, that's great advice. I'll remember! :)

Corralien · Accepted Answer · 2021-12-08 19:48:03Z

2

Compute mean for each name after drop duplicates (id, name) and map result value on name column:

df['avg'] = df['name'].map(df.drop_duplicates(['id', 'name']).groupby('name')['marks'].mean())
print(df)

# Output:
   id name  marks        avg
0   1    x     33  16.666667
1   1    x     33  16.666667
2   2    y      9  32.000000
3   3    x     14  16.666667
4   4    y     55  32.000000
5   4    y     55  32.000000
6   5    x      3  16.666667
7   6    z     31  31.000000

answered Dec 8, 2021 at 19:48

Corralien

121k8 gold badges44 silver badges69 bronze badges

Comments

sammywemmy · Accepted Answer · 2021-12-08 20:43:14Z

1

One option, which uses the same drop_duplicates idea, without using a groupby, is to pivot the deduplicated data:

df.assign(avg = df.name.map(df.drop_duplicates().pivot('name', 'id', 'marks').mean(1)))
 
   id name  marks        avg
0   1    x     33  16.666667
1   1    x     33  16.666667
2   2    y      9  32.000000
3   3    x     14  16.666667
4   4    y     55  32.000000
5   4    y     55  32.000000
6   5    x      3  16.666667
7   6    z     31  31.000000

answered Dec 8, 2021 at 20:43

sammywemmy

28.9k4 gold badges21 silver badges35 bronze badges

Collectives™ on Stack Overflow

Pandas- Find average based on multiple column value

3 Answers 3

4 Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

4 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related