Grouping data in Python DataFrame

Question

I have a dataframe as below:

        |  Year | Cause of Death  |Gender| Total Case  |
        | 2016  |    Killed       |   M  |      3      |
        | 2016  |    Suicide      |   M  |      5      |
        | 2016  |    Killed       |   F  |      7      |
        | 2017  |    Killed       |   F  |      12     |
        | 2017  |    Killed       |   M  |      2      |
        | 2017  |    Suicide      |   F  |      5      |
        | 2017  |    Suicide      |   M  |      6      |
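For reproducibility, the table above can be built as a DataFrame like this. This is a sketch that assumes the column names match the header row exactly, including the spaces:

```python
import pandas as pd

# Rebuild the question's table; column names are assumed to match the header row
df = pd.DataFrame({
    'Year': [2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'Cause of Death': ['Killed', 'Suicide', 'Killed', 'Killed',
                       'Killed', 'Suicide', 'Suicide'],
    'Gender': ['M', 'M', 'F', 'F', 'M', 'F', 'M'],
    'Total Case': [3, 5, 7, 12, 2, 5, 6],
})
print(df)
```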

From this dataframe, I want to create a new dataframe like this:

    |Year|Cause of Death|Total Case|
    |2016|   Killed     |    10    | 
    |    |   Suicide    |  5       |
    |2017|   Killed     |  14      |
    |    |   Suicide    |  11      |

Is there a simple way to do this?

Thanks

piRSquared · Accepted Answer · 2017-09-26 05:31:04Z


df.groupby(['Year', 'Cause of Death'])['Total Case'].sum()

Year  Cause of Death
2016  Killed            10
      Suicide            5
2017  Killed            14
      Suicide           11
Name: Total Case, dtype: int64

From here, it's a matter of formatting:

df.groupby(['Year', 'Cause of Death'])[['Total Case']].sum()

                     Total Case
Year Cause of Death            
2016 Killed                  10
     Suicide                  5
2017 Killed                  14
     Suicide                 11

Or

df.groupby(['Year', 'Cause of Death'])[['Total Case']].sum().reset_index()

   Year Cause of Death  Total Case
0  2016         Killed          10
1  2016        Suicide           5
2  2017         Killed          14
3  2017        Suicide          11
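If you want the flat table directly, groupby also accepts as_index=False, which keeps the grouping keys as ordinary columns. A minimal sketch, rebuilding the question's data inline:

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'Cause of Death': ['Killed', 'Suicide', 'Killed', 'Killed',
                       'Killed', 'Suicide', 'Suicide'],
    'Gender': ['M', 'M', 'F', 'F', 'M', 'F', 'M'],
    'Total Case': [3, 5, 7, 12, 2, 5, 6],
})

# as_index=False leaves Year and Cause of Death as regular columns,
# so no separate reset_index() call is needed
result = df.groupby(['Year', 'Cause of Death'], as_index=False)['Total Case'].sum()
print(result)
```

This produces the same four rows as the reset_index() version; which one to use is mostly a matter of taste.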

edited Sep 26, 2017 at 5:31

answered Sep 26, 2017 at 5:21


TheF1rstPancake · Accepted Answer · 2017-09-26 05:22:44Z


Pandas DataFrames come with a groupby method that achieves this. It looks like you don't care about the Gender column and just want to group by Year and Cause of Death.

g = df[['Year', 'Cause of Death', 'Total Case']].groupby(['Year', 'Cause of Death'])
g.sum()

#                      Total Case
# Year Cause of Death            
# 2016 Killed                  10
#      Suicide                  5
# 2017 Killed                  14
#      Suicide                 11

The first line selects only the columns you are interested in, then calls groupby on the columns you want to group by. This returns a GroupBy object whose sum method adds up the values within each group.
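To see what sum does per group, you can iterate the GroupBy object: each iteration yields a (key, sub-DataFrame) pair, and with two grouping columns the key is a (Year, Cause of Death) tuple. A minimal sketch, rebuilding the question's data inline:

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'Cause of Death': ['Killed', 'Suicide', 'Killed', 'Killed',
                       'Killed', 'Suicide', 'Suicide'],
    'Gender': ['M', 'M', 'F', 'F', 'M', 'F', 'M'],
    'Total Case': [3, 5, 7, 12, 2, 5, 6],
})
g = df[['Year', 'Cause of Death', 'Total Case']].groupby(['Year', 'Cause of Death'])

# Sum each group by hand; g.sum() does the same thing in one call
manual = {}
for (year, cause), group in g:
    manual[(year, cause)] = group['Total Case'].sum()
    print(year, cause, manual[(year, cause)])
```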

answered Sep 26, 2017 at 5:22


Tiny.D · Accepted Answer · 2017-09-26 05:24:19Z

You can use groupby followed by reset_index:

import pandas as pd
df = pd.read_csv('test_1.csv')
df

df is:

    Year    Cause of Death  Gender  Total Case
0   2016    Killed            M      3
1   2016    Suicide           M      5
2   2016    Killed            F      7
3   2017    Killed            F      12
4   2017    Killed            M      2
5   2017    Suicide           F      5
6   2017    Suicide           M      6

Then apply this:

new_df = df['Total Case'].groupby([df['Year'], df['Cause of Death']]).sum()
new_df = new_df.reset_index()
new_df

new_df will be:

    Year    Cause of Death  Total Case
0   2016    Killed          10
1   2016    Suicide         5
2   2017    Killed          14
3   2017    Suicide         11
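Since test_1.csv isn't included here, the same steps can be run against an inline CSV. This sketch assumes the file has the columns shown above and reads it through io.StringIO instead of from disk:

```python
import io

import pandas as pd

# Assumed CSV contents matching the table above (test_1.csv itself is not shown)
csv_text = """Year,Cause of Death,Gender,Total Case
2016,Killed,M,3
2016,Suicide,M,5
2016,Killed,F,7
2017,Killed,F,12
2017,Killed,M,2
2017,Suicide,F,5
2017,Suicide,M,6
"""
df = pd.read_csv(io.StringIO(csv_text))

# Group the 'Total Case' Series by two other columns of the same DataFrame
new_df = df['Total Case'].groupby([df['Year'], df['Cause of Death']]).sum()
new_df = new_df.reset_index()
print(new_df)
```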

Ankush Bhatia · Accepted Answer · 2017-09-26 05:25:49Z


Use pandas' groupby method.

grouped = df.groupby(['Year', 'Cause of Death'])

Then, to sum the Total Case column within each group, use:

grouped[['Total Case']].sum()

This gives your desired output:

|Year|Cause of Death|Total Case|
|2016|   Killed     |    10    | 
|    |   Suicide    |  5       |
|2017|   Killed     |  14      |
|    |   Suicide    |  11      |
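In recent pandas versions (2.x), sum() on a GroupBy no longer drops non-numeric columns by default, so calling sum() on the whole grouped frame would also aggregate the Gender strings. Assuming such a version, passing numeric_only=True is one way to keep only the numeric column. A minimal sketch, rebuilding the question's data inline:

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2016, 2016, 2016, 2017, 2017, 2017, 2017],
    'Cause of Death': ['Killed', 'Suicide', 'Killed', 'Killed',
                       'Killed', 'Suicide', 'Suicide'],
    'Gender': ['M', 'M', 'F', 'F', 'M', 'F', 'M'],
    'Total Case': [3, 5, 7, 12, 2, 5, 6],
})
grouped = df.groupby(['Year', 'Cause of Death'])

# numeric_only=True restricts the sum to numeric columns,
# so the string 'Gender' column is left out of the result
totals = grouped.sum(numeric_only=True)
print(totals)
```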

answered Sep 26, 2017 at 5:25

