
I have a task to create DataFrames based on conditions within other DataFrames.

I've been doing it the same way for about a week now, but I was curious whether there was a better way. I stumbled across this example. Now, I know the example is creating a separate column based on conditions, but it made me wonder if my code could be improved.

Here is a shortened version of the code from the link, for ease of use:

import pandas as pd
import numpy as np

# np.nan (lowercase) is used because the np.NaN alias was removed in NumPy 2.0
raw_data = {'student_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
            'test_score': [76, 88, 84, 67, 53, 96, 64, 91, 77, 73, 52, np.nan]}
df = pd.DataFrame(raw_data, columns=['student_name', 'test_score'])

print(df)

# Build the labels one row at a time; NaN > 59 is False, so the missing score becomes 'fail'
grades = []

for row in df['test_score']:
    if row > 59:
        grades.append('Pass')
    else:
        grades.append('fail')
print(df)

   student_name  test_score grades
0        Miller        76.0   Pass
1      Jacobson        88.0   Pass
2           Ali        84.0   Pass
3        Milner        67.0   Pass
4         Cooze        53.0   fail
5         Jacon        96.0   Pass
6        Ryaner        64.0   Pass
7          Sone        91.0   Pass
8         Sloan        77.0   Pass
9         Piger        73.0   Pass
10        Riani        52.0   fail
11          Ali         NaN   fail

Going along with the above example, suppose I did not want to make a "grades" column, but instead wanted a DataFrame of all the people who passed. I would personally do this:

pass_df = df[df['test_score'] > 59]
print(pass_df)

Is there a better way of doing this?


1 Answer


The new column can be assigned more concisely, without a Python-level loop, using np.where.

df['grades'] = np.where(df.test_score > 59, 'Pass', 'fail')
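A minimal sketch comparing the two approaches, assuming the question's sample data (with np.nan in place of np.NaN, which NumPy 2.0 removed). It checks that the np.where labels match the ones produced by the original loop; the name loop_grades is illustrative only.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question (np.nan marks the missing score)
df = pd.DataFrame({
    'student_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon',
                     'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'test_score': [76, 88, 84, 67, 53, 96, 64, 91, 77, 73, 52, np.nan],
})

# Loop version from the question, written compactly
loop_grades = []
for score in df['test_score']:
    loop_grades.append('Pass' if score > 59 else 'fail')

# Vectorized version: 'Pass' where the mask is True, 'fail' everywhere else
df['grades'] = np.where(df['test_score'] > 59, 'Pass', 'fail')

# Both give the same labels; NaN > 59 evaluates to False, so the missing score is 'fail'
print(df['grades'].tolist() == loop_grades)  # True
print(df.loc[11, 'grades'])  # fail
```

The mask is evaluated over the whole column at once, which is why np.where needs no explicit loop.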

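As for selecting the rows where the test score is greater than 59, your approach is standard. However, if you intend to treat the result as its own DataFrame (for example, to modify it), you will want to call .copy(); otherwise pandas may emit a SettingWithCopyWarning when you assign to it.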
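A minimal sketch of the filter-then-copy pattern, assuming the same sample data. The +5 adjustment is hypothetical and only there to show that changing pass_df leaves df untouched.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'student_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon',
                     'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'test_score': [76, 88, 84, 67, 53, 96, 64, 91, 77, 73, 52, np.nan],
})

# Boolean mask: True where the score is above 59 (NaN compares as False)
passed = df['test_score'] > 59

# Select the passing rows and take an independent copy of them
pass_df = df.loc[passed].copy()

# Hypothetical modification: only the copy changes, and no SettingWithCopyWarning is emitted
pass_df['test_score'] = pass_df['test_score'] + 5

print(len(pass_df))  # 9
print(df.loc[0, 'test_score'], pass_df.loc[0, 'test_score'])  # 76.0 81.0
```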

Demo

>>> df['grades'] = np.where(df.test_score > 59, 'Pass', 'fail')

>>> df
   student_name  test_score grades
0        Miller        76.0   Pass
1      Jacobson        88.0   Pass
2           Ali        84.0   Pass
3        Milner        67.0   Pass
4         Cooze        53.0   fail
5         Jacon        96.0   Pass
6        Ryaner        64.0   Pass
7          Sone        91.0   Pass
8         Sloan        77.0   Pass
9         Piger        73.0   Pass
10        Riani        52.0   fail
11          Ali         NaN   fail
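For completeness, the same filter can be written in a few equivalent ways. This sketch, again assuming the question's sample data, swaps in DataFrame.query as one alternative to plain boolean indexing; for a filter this simple the choice is mostly a matter of readability.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'student_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon',
                     'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'test_score': [76, 88, 84, 67, 53, 96, 64, 91, 77, 73, 52, np.nan],
})

mask = df['test_score'] > 59

# Three equivalent ways to select the passing rows
by_brackets = df[mask]
by_loc = df.loc[mask]
by_query = df.query('test_score > 59')

# Same rows, same index, same values
print(by_brackets.equals(by_loc) and by_loc.equals(by_query))  # True
print(by_query['student_name'].tolist())
# ['Miller', 'Jacobson', 'Ali', 'Milner', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger']
```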

2 Comments

I don't see it now, but your point about using .copy() was very helpful as well. Thanks for clearing this up for me.
@MattR Yeah, I didn't feel it was strictly necessary, but I had a feeling you might be trying to modify the result... glad I could help!
