Pandas groupBy multiple columns and aggregation

Question

My dataframe has 4 columns: col_A, col_B, col_C, col_D. I need to group by col_A, col_B and col_C and compute the mean of col_D. The snippet below is what I tried, and it worked:

df.groupby(['col_A','col_B','col_C']).agg({'col_D':'mean'}).reset_index()

In addition to that result, I also need the number of rows in each (col_A, col_B, col_C) group alongside the mean. Any help would be appreciated.

df.groupby(['col_A','col_B','col_C'])['col_D'].agg(['mean', 'count']).reset_index()?
– Henry Ecker ♦, Commented Sep 3, 2021 at 0:12

Code Different · Accepted Answer · 2021-09-03 02:20:31Z

Using Named Aggregation:

result = (
    df.groupby(['col_A', 'col_B', 'col_C'], as_index=False)
      .agg(mean=('col_D', 'mean'), count=('col_D', 'count'))
)

For the count column, you have two choices of aggregate function:

count=('col_D', 'count') ignores NaN values in col_D
count=('col_D', 'size') includes NaN values in col_D
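
To make the difference between the two concrete, here is a minimal sketch. The data is made up for illustration (it is not from the question), and the column names n_non_null and n_rows are hypothetical labels chosen for this example. It uses reset_index() rather than as_index=False, which should give the same layout.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the (A, X, 1) group has one missing col_D value
df = pd.DataFrame({
    'col_A': ['A', 'A', 'B'],
    'col_B': ['X', 'X', 'Y'],
    'col_C': [1, 1, 2],
    'col_D': [10.0, np.nan, 30.0],
})

result = (
    df.groupby(['col_A', 'col_B', 'col_C'])
      .agg(
          mean=('col_D', 'mean'),
          n_non_null=('col_D', 'count'),  # skips NaN in col_D
          n_rows=('col_D', 'size'),       # counts every row in the group
      )
      .reset_index()
)
print(result)
```

For the (A, X, 1) group, n_non_null is 1 while n_rows is 2, so 'size' is probably the one you want if "group by count" means the number of rows per group regardless of missing values.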



Soudipta Dutta · Answer · 2025-04-18 05:35:17Z

A complete example may be easier for beginners to follow.

import pandas as pd

data = {'col_A': ['A', 'A', 'B', 'B', 'A', 'B'],
        'col_B': ['X', 'Y', 'X', 'Y', 'X', 'X'],
        'col_C': [1, 2, 1, 2, 1, 1],
        'col_D': [10, 20, 30, 40, 15, 25]}
df = pd.DataFrame(data)
'''
  col_A col_B  col_C  col_D
0     A     X      1     10
1     A     Y      2     20
2     B     X      1     30
3     B     Y      2     40
4     A     X      1     15
5     B     X      1     25
'''

result = df.groupby(['col_A', 'col_B', 'col_C']).agg(
    mean_col_D=('col_D', 'mean'),
    count=('col_D', 'count'),
).reset_index()

'''
  col_A col_B  col_C  mean_col_D  count
0     A     X      1        12.5      2
1     A     Y      2        20.0      1
2     B     X      1        27.5      2
3     B     Y      2        40.0      1
'''

Soudipta Dutta · Answer · 2025-04-26 05:43:55Z

The same approach works in Polars:

import polars as pl


df = pl.DataFrame({
    'col_A': ['A', 'A', 'B', 'B', 'A', 'B'],
    'col_B': ['X', 'Y', 'X', 'Y', 'X', 'X'],
    'col_C': [1, 2, 1, 2, 1, 1],
    'col_D': [10, 20, 30, 40, 15, 25]
})

# Group by the three key columns and aggregate col_D
res = df.group_by(['col_A', 'col_B', 'col_C']).agg([
    pl.col('col_D').mean().alias('mean_col_D'),
    pl.col('col_D').count().alias('count_col_D'),
])
print(res)
'''
shape: (4, 5)
┌───────┬───────┬───────┬────────────┬─────────────┐
│ col_A ┆ col_B ┆ col_C ┆ mean_col_D ┆ count_col_D │
│ ---   ┆ ---   ┆ ---   ┆ ---        ┆ ---         │
│ str   ┆ str   ┆ i64   ┆ f64        ┆ u32         │
╞═══════╪═══════╪═══════╪════════════╪═════════════╡
│ A     ┆ X     ┆ 1     ┆ 12.5       ┆ 2           │
│ B     ┆ Y     ┆ 2     ┆ 40.0       ┆ 1           │
│ A     ┆ Y     ┆ 2     ┆ 20.0       ┆ 1           │
│ B     ┆ X     ┆ 1     ┆ 27.5       ┆ 2           │
└───────┴───────┴───────┴────────────┴─────────────┘
'''
