How to sum up a dataframe column based on another column value

Question

I have salary data with the following columns:

Index(['Id', 'EmployeeName', 'JobTitle', 'BasePay', 'OvertimePay', 'OtherPay',
   'Benefits', 'TotalPay', 'TotalPayBenefits', 'Year', 'Notes', 'Agency',
   'Status'],
  dtype='object')

I created two extra columns that flag whether each JobTitle relates to the police department or the fire department:

def find_police(x):
    return "POLICE" in x

def find_fire(x):
    return "FIRE" in x


# use apply to search for it in JobTitle
sf_sal["isPolice"] = sf_sal["JobTitle"].apply(find_police)
sf_sal["isFire"] = sf_sal["JobTitle"].apply(find_fire)

sf_sal[["JobTitle", "isPolice", "isFire"]]

What I would like to do is compare the mean salaries of the police department and the fire department. First I count the rows for each (the variables are named ratio_*, but they hold counts):

ratio_of_police = sf_sal["isPolice"].sum()
ratio_of_fire = sf_sal["isFire"].sum()

Using the BasePay column, I want to sum all the rows that have True in the isPolice column, and do the same for firefighters.

One way I attempted this was:

# Police: filter with a boolean mask, sum BasePay, divide by the police count
sf_mask = sf_sal['isPolice'] == True
all_police = sf_sal[sf_mask]
sum_of_base_salary = all_police['BasePay'].sum()
print(sum_of_base_salary / ratio_of_police)

# Fire: same steps, dividing by the fire count (not the police count)
sf_mask = sf_sal['isFire'] == True
all_fire = sf_sal[sf_mask]
sum_of_base_salary = all_fire['BasePay'].sum()
print(sum_of_base_salary / ratio_of_fire)

Judging by the comments, another way would have been to use groupby.
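For comparison, the mask approach can be shortened by selecting BasePay through the boolean column and calling mean() directly. This is only a minimal sketch: the small sf_sal dataframe below is made-up sample data, not the real salary file, and str.contains is used here in place of the apply-based helper functions.

```python
import pandas as pd

# Hypothetical sample data standing in for the real sf_sal dataframe
sf_sal = pd.DataFrame({
    "JobTitle": ["POLICE OFFICER", "FIREFIGHTER", "POLICE SERGEANT",
                 "CLERK", "FIRE CAPTAIN"],
    "BasePay": [100.0, 80.0, 120.0, 50.0, 100.0],
})

# Vectorized substring test instead of apply with a Python function
sf_sal["isPolice"] = sf_sal["JobTitle"].str.contains("POLICE")
sf_sal["isFire"] = sf_sal["JobTitle"].str.contains("FIRE")

# Select BasePay for the flagged rows and average it in one step
police_mean = sf_sal.loc[sf_sal["isPolice"], "BasePay"].mean()
fire_mean = sf_sal.loc[sf_sal["isFire"], "BasePay"].mean()

print(police_mean, fire_mean)
```

One difference to keep in mind: mean() skips missing BasePay values by default, whereas dividing a sum by the number of flagged rows counts those rows in the denominator.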

Corralien · Accepted Answer · 2023-01-11 12:14:58Z

1

You can try:

is_police_or_fire = df['JobTitle'].str.extract(r'(FIRE|POLICE)', expand=False)
out = df.groupby(is_police_or_fire)['BasePay'].mean()
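To see what these two lines produce, here is a small self-contained sketch. The dataframe is made-up sample data (an assumption, not the real salary file); the two statements themselves are the ones from the answer.

```python
import pandas as pd

# Hypothetical sample data; the real df has many more rows and columns
df = pd.DataFrame({
    "JobTitle": ["POLICE OFFICER", "FIREFIGHTER", "POLICE SERGEANT",
                 "CLERK", "FIRE CAPTAIN"],
    "BasePay": [100.0, 80.0, 120.0, 50.0, 100.0],
})

# One label per row: "FIRE", "POLICE", or NaN when neither word appears
is_police_or_fire = df["JobTitle"].str.extract(r"(FIRE|POLICE)", expand=False)

# Group rows by that label and average BasePay within each group;
# rows whose label is NaN are left out of the result by default
out = df.groupby(is_police_or_fire)["BasePay"].mean()

print(out)
```

With this sample the result has two entries, FIRE with a mean of 90.0 and POLICE with a mean of 110.0; the CLERK row does not appear. A title containing both words would be labelled by whichever word matches first in the string.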

4 Comments

Darman Over a year ago

Quick question following your suggestion: if I followed df.groupby(is_police_or_fire) with the drop command, would it drop those rows?

Corralien Over a year ago

What do you mean? Do you want to drop all rows in is_police_or_fire?

Darman Over a year ago

Yes. Say I wanted to do that instead: can I follow that groupby statement with a drop statement?

Corralien Over a year ago

No, you can't. IIUC, you should do something like df.drop(df[is_police_or_fire.notna()].index) or, better, df[is_police_or_fire.isna()].
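The two expressions in the comment above can be compared on a small example. This is a sketch on made-up sample data (an assumption); it only shows that both forms leave the rows that are neither police nor fire.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "JobTitle": ["POLICE OFFICER", "FIREFIGHTER", "POLICE SERGEANT",
                 "CLERK", "FIRE CAPTAIN"],
    "BasePay": [100.0, 80.0, 120.0, 50.0, 100.0],
})

is_police_or_fire = df["JobTitle"].str.extract(r"(FIRE|POLICE)", expand=False)

# Option 1: look up the index labels of the matching rows and drop them
dropped = df.drop(df[is_police_or_fire.notna()].index)

# Option 2: keep only the rows where no label was extracted
kept = df[is_police_or_fire.isna()]

print(kept)
```

Both options return a new dataframe and leave df unchanged. The second avoids the intermediate index lookup, which is presumably why it is suggested as the better form.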

gman · Accepted Answer · 2023-01-11 11:56:56Z

0

You are looking for a groupby operation.

1 Comment

Darman Over a year ago

I've had a look at groupby and it confused me a lot. Can you give a basic example? What I tried instead was creating a mask and making a new dataframe from it; since that new dataframe only contains the True rows, I can then sum the BasePay column.
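A basic groupby example along the lines asked for here might look like the following. It is a sketch on made-up sample data (an assumption), grouping directly on the boolean isPolice column so that it mirrors the mask approach: groupby splits the rows by the value in that column, and mean() is then computed once per group.

```python
import pandas as pd

# Hypothetical sample data standing in for sf_sal
sf_sal = pd.DataFrame({
    "JobTitle": ["POLICE OFFICER", "FIREFIGHTER", "POLICE SERGEANT",
                 "CLERK", "FIRE CAPTAIN"],
    "BasePay": [100.0, 80.0, 120.0, 50.0, 100.0],
})
sf_sal["isPolice"] = sf_sal["JobTitle"].str.contains("POLICE")

# One group for isPolice == False, one for isPolice == True;
# the result is a Series indexed by those two values
mean_by_flag = sf_sal.groupby("isPolice")["BasePay"].mean()

# The mask approach gives the same number for the True group
mask_mean = sf_sal[sf_sal["isPolice"]]["BasePay"].mean()

print(mean_by_flag)
print(mask_mean)
```

In other words, the mask computes one group at a time, while groupby computes every group in a single call.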
