1

Goal is to get average(integer) of marks column based on name value. If id and name column appears with exact same value more than once, then the marks with corresponding name will be considered once. For e.g. average of x = (33+14+3)/3 = 16

Sample dataframe:

   id name  marks
0   1   x   33
1   1   x   33
2   2   y   9
3   3   x   14
4   4   y   55
5   4   y   55
6   5   x   3
7   6   z   31

Expected output:

   id name marks avg
0   1   x   33  16
1   1   x   33  16
2   2   y   9   32
3   3   x   14  16
4   4   y   55  32
5   4   y   55  32
6   5   x   3   16
7   6   z   31  31

I tried:

df["avg"] = df.groupby("name")["marks"].mean()
0

3 Answers 3

4

Try this:

df = df.set_index('name').assign(avg=df.drop_duplicates(['name', 'marks']).groupby('name')['marks'].mean()).reset_index()

Output:

>>> df
  name  id  marks        avg
0    x   1     33  16.666667
1    x   1     33  16.666667
2    y   2      9  32.000000
3    x   3     14  16.666667
4    y   4     55  32.000000
5    y   4     55  32.000000
6    x   5      3  16.666667
7    z   6     31  31.000000

If you need it rounded, chain .astype(int) to .mean():

df = df.set_index('name').assign(avg=df.drop_duplicates(['name', 'marks']).groupby('name')['marks'].mean().astype(int)).reset_index()

Output:

>>> df
  name  id  marks  avg
0    x   1     33   16
1    x   1     33   16
2    y   2      9   32
3    x   3     14   16
4    y   4     55   32
5    y   4     55   32
6    x   5      3   16
7    z   6     31   31
Sign up to request clarification or add additional context in comments.

4 Comments

Nice answer. Also, it's possible to simplify df[~df.set_index(['name', 'marks']).index.duplicated()] into df.drop_duplicates(['name', 'marks']), and it still gets the same result.
Thank you @Nick. That simplification is neat - implemented! :)
For the next time, if you have to use set_index then reset_index, don't forget map. Check my answer. +1
Okay @Corralien, that's great advice. I'll remember! :)
2

Compute mean for each name after drop duplicates (id, name) and map result value on name column:

df['avg'] = df['name'].map(df.drop_duplicates(['id', 'name']).groupby('name')['marks'].mean())
print(df)

# Output:
   id name  marks        avg
0   1    x     33  16.666667
1   1    x     33  16.666667
2   2    y      9  32.000000
3   3    x     14  16.666667
4   4    y     55  32.000000
5   4    y     55  32.000000
6   5    x      3  16.666667
7   6    z     31  31.000000

Comments

1

One option, which uses the same drop_duplicates idea, without using a groupby, is to pivot the deduplicated data:

df.assign(avg = df.name.map(df.drop_duplicates().pivot('name', 'id', 'marks').mean(1)))
 
   id name  marks        avg
0   1    x     33  16.666667
1   1    x     33  16.666667
2   2    y      9  32.000000
3   3    x     14  16.666667
4   4    y     55  32.000000
5   4    y     55  32.000000
6   5    x      3  16.666667
7   6    z     31  31.000000

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.