I have a task to create DataFrames based on conditions within other DataFrames.
I've been doing it the same way for about a week now, but I was curious whether there is a better way. I stumbled across this example. I know the example creates a separate column based on conditions, but it made me wonder if my code could be improved.
Here is a shortened version of the code from the link, for ease of use:
import pandas as pd
import numpy as np
raw_data = {'student_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
            'test_score': [76, 88, 84, 67, 53, 96, 64, 91, 77, 73, 52, np.nan]}  # np.nan: the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame(raw_data, columns = ['student_name', 'test_score'])
print(df)
grades = []
for row in df['test_score']:
    if row > 59:
        grades.append('Pass')
    else:
        grades.append('fail')
df['grades'] = grades
print(df)
    student_name  test_score grades
0         Miller        76.0   Pass
1       Jacobson        88.0   Pass
2            Ali        84.0   Pass
3         Milner        67.0   Pass
4          Cooze        53.0   fail
5          Jacon        96.0   Pass
6         Ryaner        64.0   Pass
7           Sone        91.0   Pass
8          Sloan        77.0   Pass
9          Piger        73.0   Pass
10         Riani        52.0   fail
11           Ali         NaN   fail
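For comparison, the loop from the linked example can probably be replaced by a single vectorized call. This is a minimal sketch using `np.where` on a three-row sample (the smaller frame is only an assumption to keep the example short); it is not taken from the linked example.

```python
import numpy as np
import pandas as pd

# Small sample with the same columns as the example above
df = pd.DataFrame({
    'student_name': ['Miller', 'Cooze', 'Ali'],
    'test_score': [76, 53, np.nan],
})

# The comparison yields a boolean Series. NaN > 59 evaluates to False,
# so a missing score is labeled 'fail', the same as in the loop version.
df['grades'] = np.where(df['test_score'] > 59, 'Pass', 'fail')
print(df)
```

The missing score still ends up as 'fail' here, so the result should match the loop row for row.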
Going along with the above example, suppose I did not want to make a "grades" column, but instead wanted a DataFrame of all the people who passed. I personally would do this:
pass_df = df[df['test_score'] > 59]
print(pass_df)
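For reference, the same filter can be spelled a few other ways. The sketch below uses a three-row sample (an assumption, to keep it short) and compares plain boolean indexing with `.loc` and `DataFrame.query`; as far as I can tell they select the same rows, and none of them is assumed to be faster.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'student_name': ['Miller', 'Cooze', 'Ali'],
    'test_score': [76, 53, np.nan],
})

# Boolean mask: True where the score is above 59 (NaN compares as False)
mask = df['test_score'] > 59

pass_df = df[mask]                        # plain boolean indexing, as above
pass_loc = df.loc[mask].copy()            # explicit .loc, copied so later edits do not touch df
pass_query = df.query('test_score > 59')  # the same condition written as a string expression

print(pass_df)
```

The `.copy()` is only there in case `pass_df` gets modified later, since assigning into a filtered frame can trigger pandas' `SettingWithCopyWarning`.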
Is there a better way of doing this?