23

I have a dataframe like below.

import pandas as pd
import numpy as np
raw_data = {'student':['A','B','C','D','E'],
        'score': [100, 96, 80, 105,156], 
    'height': [7, 4,9,5,3],
    'trigger1' : [84,95,15,78,16],
    'trigger2' : [99,110,30,93,31],
    'trigger3' : [114,125,45,108,46]}

df2 = pd.DataFrame(raw_data, columns = ['student','score', 'height','trigger1','trigger2','trigger3'])

print(df2)

I need to derive Flag column based on multiple conditions.

i need to compare score and height columns with trigger 1 -3 columns.

Flag Column:

  1. if Score greater than equal trigger 1 and height less than 8 then Red --

  2. if Score greater than equal trigger 2 and height less than 8 then Yellow --

  3. if Score greater than equal trigger 3 and height less than 8 then Orange --

  4. if height greater than 8 then leave it as blank

How to write if else conditions in pandas dataframe and derive columns?

Expected Output

  student  score  height  trigger1  trigger2  trigger3    Flag
0       A    100       7        84        99       114  Yellow
1       B     96       4        95       110       125     Red
2       C     80       9        15        30        45     NaN
3       D    105       5        78        93       108  Yellow
4       E    156       3        16        31        46  Orange

For other column Text1 in my original question I have tried this one but the integer columns not converting the string when concatenation using astype(str) any other approach?

def text_df(df):

    if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return df['student'] + " score " + df['score'].astype(str) + " greater than " + df['trigger1'].astype(str) + " and less than height 5"
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return df['student'] + " score " + df['score'].astype(str) + " greater than " + df['trigger2'].astype(str) + " and less than height 5"
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return df['student'] + " score " + df['score'].astype(str) + " greater than " + df['trigger3'].astype(str) + " and less than height 5"
    elif (df['height'] > 8):
        return np.nan
0

3 Answers 3

47

You need chained comparison using upper and lower bound

def flag_df(df):
    
    if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return 'Red'
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return 'Yellow'
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return 'Orange'
    elif (df['height'] > 8):
        return np.nan
    
df2['Flag'] = df2.apply(flag_df, axis = 1)

    student score   height  trigger1    trigger2    trigger3    Flag
0   A       100     7       84          99          114         Yellow
1   B       96      4       95          110         125         Red
2   C       80      9       15          30          45          NaN
3   D       105     5       78          93          108         Yellow
4   E       156     3       16          31          46          Orange

Note: You can do this with a very nested np.where but I prefer to apply a function for multiple if-else

Edit: answering @Cecilia's questions

  1. what is the returned object is not strings but some calculations, for example, for the first condition, we want to return df['height']*2

Not sure what you tried but you can return a derived value instead of string using

def flag_df(df):

    if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return df['height']*2
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return df['height']*3
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return df['height']*4
    elif (df['height'] > 8):
        return np.nan
  1. what if there are 'NaN' values in osome columns and I want to use df['xxx'] is None as a condition, the code seems like not working

Again not sure what code did you try but using pandas isnull would do the trick

def flag_df(df):

    if pd.isnull(df['height']):
        return df['height']
    elif (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return df['height']*2
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return df['height']*3
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return df['height']*4
    elif (df['height'] > 8):
        return np.nan
Sign up to request clarification or add additional context in comments.

7 Comments

Thanks for the quick reply.. can you please help me to derive 2 more columns based on this condition. my original question was to derive 3 columns but edited because of confusing. can you able to see first post before editing?
Change your return statement to this: return '{} score {} greater than {} and less than height 5'.format(df['student'], df['score'], df['trigger1']). I have done it for first condition, use similar format for the rest of the conditions
Hi what is the returned object is not strings but some calculations, for example, for the first condition, we want to return df['height']*2?
and what if there'are 'NaN' values in osome columns and I want to use df['xxx'] is None as a condition, the code seems like not working
@Cecilia, please see the edits
|
18

Here is a way to use numpy.select() for doing this with neat code, scalable and faster:

conditions = [
    (df2['trigger1'] <= df2['score']) & (df2['score'] < df2['trigger2']) & (df2['height'] < 8),
    (df2['trigger2'] <= df2['score']) & (df2['score'] < df2['trigger3']) & (df2['height'] < 8),
    (df2['trigger3'] <= df2['score']) & (df2['height'] < 8),
    (df2['height'] > 8)
]

choices = ['Red','Yellow','Orange', np.nan]
df['Flag1'] = np.select(conditions, choices, default=np.nan)

Comments

3

you can use also apply with a custom function on axis 1 like this :

def color_selector(x):
    if (x['trigger1'] <= x['score'] < x['trigger2']) and (x['height'] < 8):
        return 'Red'
    elif (x['trigger2'] <= x['score'] < x['trigger3']) and (x['height'] < 8):
        return 'Yellow'
    elif (x['trigger3'] <= x['score']) and (x['height'] < 8):
        return 'Orange'
    elif (x['height'] > 8):
        return ''
df2 = df2.assign(flag=df2.apply(color_selector, axis=1))

you will get something like this : enter image description here

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.