Better way of creating Pandas Dataframe based on condition

Question

I have a task to create DataFrames based on conditions within other DataFrames.

I've been doing it the same way for about a week now, but I was curious whether there is a better way. I stumbled across this example. I know the example creates a separate column based on conditions, but it made me wonder whether my code could be improved.

Here is a shortened version of the code from the link, for ease of use:

import pandas as pd
import numpy as np

# np.nan marks a missing score (the np.NaN alias was removed in NumPy 2.0)
raw_data = {'student_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
        'test_score': [76, 88, 84, 67, 53, 96, 64, 91, 77, 73, 52, np.nan]}
df = pd.DataFrame(raw_data, columns=['student_name', 'test_score'])

print(df)

grades = []

for row in df['test_score']:
    if row > 59:
        grades.append('Pass')
    else:
        grades.append('fail')
df['grades'] = grades
print(df)

   student_name  test_score grades
0        Miller        76.0   Pass
1      Jacobson        88.0   Pass
2           Ali        84.0   Pass
3        Milner        67.0   Pass
4         Cooze        53.0   fail
5         Jacon        96.0   Pass
6        Ryaner        64.0   Pass
7          Sone        91.0   Pass
8         Sloan        77.0   Pass
9         Piger        73.0   Pass
10        Riani        52.0   fail
11          Ali         NaN   fail

Going along with the above example, suppose I did not want to make a "grades" column, but instead wanted a DataFrame of all the people who passed. I would personally do this:

pass_df = df[df['test_score'] > 59]
print(pass_df)

Is there a better way of doing this?

miradulo · Accepted Answer · 2017-02-23 16:55:16Z

The new column can be assigned more concisely, without an explicit Python loop, using np.where.

df['grades'] = np.where(df.test_score > 59, 'Pass', 'fail')

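As for selecting the rows where the test score is greater than 59, your approach is standard. However, if you intend to treat the result as its own DataFrame (for example, to modify it later), you will want to call .copy() on it; otherwise pandas may emit a SettingWithCopyWarning when you assign to it.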
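To illustrate the .copy() point, here is a minimal sketch on a smaller, made-up subset of the data. The curved_score column is hypothetical and only there to show a later modification of the filtered result; it is not part of the original question.

```python
import numpy as np
import pandas as pd

# Reduced, illustrative version of the question's data
df = pd.DataFrame({
    'student_name': ['Miller', 'Cooze', 'Jacon', 'Ali'],
    'test_score': [76, 53, 96, np.nan],
})

# Boolean indexing keeps only the rows where the condition is True.
# NaN > 59 evaluates to False, so the row with a missing score is dropped.
# .copy() makes pass_df an independent DataFrame.
pass_df = df[df['test_score'] > 59].copy()

# Assigning to the copy leaves df untouched and avoids the
# SettingWithCopyWarning that pandas may emit for an un-copied selection.
pass_df['curved_score'] = pass_df['test_score'] + 5  # hypothetical column

print(pass_df)
```

If the filtered result is only read and never assigned to, the .copy() call is not strictly needed.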

Demo

>>> df['grades'] = np.where(df.test_score > 59, 'Pass', 'fail')

>>> df
   student_name  test_score grades
0        Miller        76.0   Pass
1      Jacobson        88.0   Pass
2           Ali        84.0   Pass
3        Milner        67.0   Pass
4         Cooze        53.0   fail
5         Jacon        96.0   Pass
6        Ryaner        64.0   Pass
7          Sone        91.0   Pass
8         Sloan        77.0   Pass
9         Piger        73.0   Pass
10        Riani        52.0   fail
11          Ali         NaN   fail
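
Note that row 11 is labelled 'fail' because a comparison against NaN is False. If missing scores should be labelled separately, one possible sketch (an assumption on my part, not something the question asked for) is np.select, which takes the first matching condition and falls back to a default. The 'missing' label is a hypothetical choice.

```python
import numpy as np
import pandas as pd

# Reduced, illustrative version of the question's data
df = pd.DataFrame({
    'student_name': ['Miller', 'Cooze', 'Ali'],
    'test_score': [76, 53, np.nan],
})

# Conditions are checked in order; the first True one picks the label.
conditions = [
    df['test_score'] > 59,      # passing score
    df['test_score'].notna(),   # any other non-missing score
]
choices = ['Pass', 'fail']

# Rows matching neither condition (NaN scores) get the default label.
df['grades'] = np.select(conditions, choices, default='missing')

print(df)
```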

edited Feb 23, 2017 at 16:55

answered Feb 23, 2017 at 16:27

miradulo

2 Comments

MattR Over a year ago

I don't see it now, but your point about using .copy() was very helpful as well. Thanks for clearing this up for me.

miradulo Over a year ago

@MattR Yeah I didn't feel it was explicitly necessary, but I had the feeling you might be trying to modify the result... glad I could help!
