
I have salary data with the following columns:

Index(['Id', 'EmployeeName', 'JobTitle', 'BasePay', 'OvertimePay', 'OtherPay',
   'Benefits', 'TotalPay', 'TotalPayBenefits', 'Year', 'Notes', 'Agency',
   'Status'],
  dtype='object')

I created two extra columns to flag whether each JobTitle relates to the police or fire department:

def find_police(x):
    return "POLICE" in x

def find_fire(x):
    return "FIRE" in x


# use apply to search for it in JobTitle
sf_sal["isPolice"] = sf_sal["JobTitle"].apply(find_police)
sf_sal["isFire"] = sf_sal["JobTitle"].apply(find_fire)

sf_sal[["JobTitle", "isPolice", "isFire"]]

What I would like to do is compare the mean salaries of the police department and the fire department. First I count how many police and firefighters there are (the variables are named "ratio", but they hold plain counts):

ratio_of_police = sf_sal["isPolice"].sum()
ratio_of_fire = sf_sal["isFire"].sum()

Using the BasePay column, I want to sum all the rows where isPolice is True, and do the same for firefighters.

One way I attempted this was:

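# Mean base pay for police: total BasePay divided by the number of police rows
sf_mask = sf_sal['isPolice'] == True
all_police = sf_sal[sf_mask]
sum_of_base_salary = all_police['BasePay'].sum()
print(sum_of_base_salary / ratio_of_police)

# Same for fire, dividing by the fire count rather than the police count
sf_mask = sf_sal['isFire'] == True
all_fire = sf_sal[sf_mask]
sum_of_base_salary = all_fire['BasePay'].sum()
print(sum_of_base_salary / ratio_of_fire)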

Judging by the comments, another way would have been to use groupby.

2 Answers


You can try:

is_police_or_fire = df['JobTitle'].str.extract(r'(FIRE|POLICE)', expand=False)
out = df.groupby(is_police_or_fire)['BasePay'].mean()
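
To see what those two lines produce, here is a minimal self-contained sketch. The sample rows and pay figures are made up for illustration and are not from the real dataset; only the column names JobTitle and BasePay come from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the real salary DataFrame
df = pd.DataFrame({
    "JobTitle": ["POLICE OFFICER", "FIREFIGHTER", "POLICE SERGEANT",
                 "NURSE", "FIRE CAPTAIN"],
    "BasePay": [100.0, 80.0, 120.0, 90.0, 110.0],
})

# Series holding "POLICE", "FIRE", or NaN (no match) for each row
is_police_or_fire = df["JobTitle"].str.extract(r"(FIRE|POLICE)", expand=False)

# groupby drops NaN keys by default, so only the two matched groups remain
out = df.groupby(is_police_or_fire)["BasePay"].mean()
print(out)
```

Rows whose title matches neither word get NaN as their key and are left out of the result, so the output has one mean for FIRE and one for POLICE.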

4 Comments

Quick question following your suggestion: if I followed df.groupby(is_police_or_fire) with a drop command, would it drop those rows?
What do you mean? Do you want to drop all rows in is_police_or_fire?
Yes. Say I wanted to do that instead: can I follow the groupby statement with a drop statement?
No, you can't. IIUC, you should do something like df.drop(df[is_police_or_fire.notna()].index) or, better, df[is_police_or_fire.isna()].
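
A small sketch of the two options mentioned in the last comment, again on made-up sample rows (the data is an assumption; the expressions are the ones from the comment):

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "JobTitle": ["POLICE OFFICER", "FIREFIGHTER", "NURSE"],
    "BasePay": [100.0, 80.0, 90.0],
})
is_police_or_fire = df["JobTitle"].str.extract(r"(FIRE|POLICE)", expand=False)

# Option 1: drop the matched rows by their index labels
others_via_drop = df.drop(df[is_police_or_fire.notna()].index)

# Option 2 (shorter): keep only the rows where nothing matched
others = df[is_police_or_fire.isna()]

print(others)
```

Both leave only the rows that are neither police nor fire. The boolean-mask version is usually easier to read because it does not go through the index.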
0

You are looking for the groupby operation.

1 Comment

I've had a look at groupby and it confused me a lot. Can you give a basic example? What I tried instead was creating a mask and using it to make a new DataFrame. Since that new DataFrame only contains the rows where the flag is True, I can then sum its BasePay column.
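
As a basic illustration, here is a sketch that puts the mask approach next to groupby on made-up data. The Dept column and its labels are hypothetical names introduced only for this example.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
sf_sal = pd.DataFrame({
    "JobTitle": ["POLICE OFFICER", "FIREFIGHTER", "POLICE SERGEANT", "NURSE"],
    "BasePay": [100.0, 80.0, 120.0, 90.0],
})
sf_sal["isPolice"] = sf_sal["JobTitle"].str.contains("POLICE")

# Mask approach: filter the rows first, then take the mean of what is left
police_mean_mask = sf_sal.loc[sf_sal["isPolice"], "BasePay"].mean()

# groupby approach: label every row, then compute one mean per label
sf_sal["Dept"] = sf_sal["isPolice"].map({True: "police", False: "other"})
means = sf_sal.groupby("Dept")["BasePay"].mean()

print(police_mean_mask)
print(means)
```

Roughly speaking, groupby does the mask-and-aggregate step once for every distinct value in the grouping column, so a single call replaces one mask per group.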
