Creating a new variable by aggregation in Python

Question

I'm pretty new to Python and pandas and know only the basics. I'm currently doing some research and would appreciate your help.

Let’s say I have data on births, containing 2 variables: Date and Country.

Date    Country
1.1.20  USA
1.1.20  USA
1.1.20  Italy
1.1.20  England
2.1.20  Italy
2.1.20  Italy
3.1.20  USA
3.1.20  USA

Now I want to create a third variable, let’s call it ‘Births’, which contains the number of births in each country on each date. In other words, I want just one row for each date+country combination, obtained by counting the rows for that combination, so I end up with something like this:

Date    Country Births
1.1.20  USA     2
1.1.20  Italy   1
1.1.20  England 1
2.1.20  Italy   2
3.1.20  USA     2

I’ve tried many things, but nothing seemed to work. Any help will be much appreciated.

Thanks, Eran

mgc · Accepted Answer · 2020-08-18 19:39:24Z

You can use the groupby method of your DataFrame and then the size method to count the rows in each group:

df.groupby(by=['Date', 'Country']).size().reset_index(name='Births')

Output:

     Date  Country  Births
0  1.1.20  England       1
1  1.1.20    Italy       1
2  1.1.20      USA       2
3  2.1.20    Italy       2
4  3.1.20      USA       2
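Note that groupby sorts the group keys by default, which is why England comes first in the output above. As a minimal, self-contained sketch (the DataFrame is rebuilt here from the sample in the question, and the variable name `births` is only illustrative), passing sort=False keeps the groups in order of first appearance, which matches the layout asked for:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Date": ["1.1.20", "1.1.20", "1.1.20", "1.1.20",
             "2.1.20", "2.1.20", "3.1.20", "3.1.20"],
    "Country": ["USA", "USA", "Italy", "England",
                "Italy", "Italy", "USA", "USA"],
})

# One row per (Date, Country) pair; size() counts the rows in each group.
# sort=False keeps groups in the order they first appear in df.
births = (
    df.groupby(["Date", "Country"], sort=False)
      .size()
      .reset_index(name="Births")
)

print(births)
```

The counts are the same as with the default sort; only the row order differs.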

Also, the pandas documentation has several examples of group-by operations: https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html
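One possible caveat: if Date is stored as text, the grouped result is ordered alphabetically, so "10.1.20" would come before "2.1.20". The sketch below assumes the dates are in day.month.year order (the question does not say, so adjust the format string if that is wrong) and uses a small made-up sample to show the difference after converting the column with pd.to_datetime:

```python
import pandas as pd

# Made-up sample (not from the question) chosen to expose the ordering issue
df = pd.DataFrame({
    "Date": ["10.1.20", "2.1.20", "2.1.20"],
    "Country": ["USA", "Italy", "Italy"],
})

# Grouping on text dates: keys are sorted as strings
by_text = df.groupby(["Date", "Country"]).size().reset_index(name="Births")

# Assumption: day.month.year. Parse to real dates, then group again.
df["Date"] = pd.to_datetime(df["Date"], format="%d.%m.%y")
by_date = df.groupby(["Date", "Country"]).size().reset_index(name="Births")

print(by_text)
print(by_date)
```

With real datetime values the groups come out in chronological order, which is usually what you want before plotting or resampling.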
