1

I have a dataframe as below:

        |  Year | Cause of Death  |Gender| Total Case  |
        | 2016  |    Killed       |   M  |      3      |
        | 2016  |    Suicide      |   M  |      5      |
        | 2016  |    Killed       |   F  |      7      |
        | 2017  |    Killed       |   F  |      12     |
        | 2017  |    Killed       |   M  |      2      |
        | 2017  |    Suicide      |   F  |      5      |
        | 2017  |    Suicide      |   M  |      6      |

From this dataframe, I want to create a new datafarame as below :

    |Year|Cause of Death|Total Case|
    |2016|   Killed     |    10    | 
    |    |   Suicide    |  5       |
    |2017|   Killed     |  14      |
    |    |   Suicide    |  11      |

Any simple way to do this?

Thanks

4 Answers 4

1
df.groupby(['Year', 'Cause of Death'])['Total Case'].sum()

Year  Cause of Death
2016  Killed            10
      Suicide            5
2017  Killed            14
      Suicide           11
Name: Total Case, dtype: int64

From here, it's a matter of formatting:

df.groupby(['Year', 'Cause of Death']).sum()

                     Total Case
Year Cause of Death            
2016 Killed                  10
     Suicide                  5
2017 Killed                  14
     Suicide                 11

Or

df.groupby(['Year', 'Cause of Death']).sum().reset_index()

   Year Cause of Death  Total Case
0  2016         Killed          10
1  2016        Suicide           5
2  2017         Killed          14
3  2017        Suicide          11
Sign up to request clarification or add additional context in comments.

Comments

0

Pandas DataFrame's come with a groupby function that acheieves this. It looks like you don't care about the Gender column and instead just want to group by Year and Cause of Death.

g = df[['Year', 'Cause of Death', 'Total Cases']].groupby(['Year', 'Cause of Death'])
g.sum()

#                      Total Cases
# Year Cause of Death             
# 2016 Killed                   10
#      Suicide                   5
# 2017 Killed                   14
#      Suicide                  11 

First line selects only the columns you are interested in, then calls groupby on the columns you want to group. This returns a new object that has a function called sum that will sum the values in each group.

Comments

0

You can try with groupby and reset_index:

import pandas as pd
df = pd.read_csv('test_1.csv')
df

df is :

    Year    Cause of Death  Gender  Total Case
0   2016    Killed            M      3
1   2016    Suicide           M      5
2   2016    Killed            F      7
3   2017    Killed            F      12
4   2017    Killed            M      2
5   2017    Suicide           F      5
6   2017    Suicide           M      6

Then apply this:

new_df = df['Total Case'].groupby([df['Year'], df['Cause of Death']]).sum()
new_df = new_df.reset_index()
new_df 

new_df will be:

    Year    Cause of Death  Total Case
0   2016    Killed          10
1   2016    Suicide         5
2   2017    Killed          14
3   2017    Suicide         11

Comments

0

Use the method "groupby" from Pandas.

grouped = df.groupby(['Year', 'Cause of Death'])

Then to get the sum in total cases use this :

grouped.sum()

This will give your desired output

|Year|Cause of Death|Total Case|
|2016|   Killed     |    10    | 
|    |   Suicide    |  5       |
|2017|   Killed     |  14      |
|    |   Suicide    |  11      |

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.