IF ELSE using Numpy and Pandas

Question

After searching several forums on similar questions, it appears that one way to iterate a conditional statement quickly is using Numpy's np.where() function on Pandas. I am having trouble with the following task:

I have a dataset that looks like several rows of:

PatientID    Date1      Date2       ICD
1234         12/14/10   12/12/10    313.2, 414.2, 228.1
3213         8/2/10     9/5/12      232.1, 221.0

I am trying to create a conditional statement such that:

 1. if strings '313.2' or '414.2' exist in df['ICD'] return 1
 2. if strings '313.2' or '414.2' exist in df['ICD'] and Date1>Date2 return 2
 3. Else return 0

Given that Date1 and Date2 are in date-time format and my data frame is coded as df, I have the following code:

df['NewColumn'] = np.where(df.ICD.str.contains('313.2|414.2').astype(int), 1, np.where(((df.ICD.str.contains('313.2|414.2').astype(int))&(df['Date1']>df['Date2'])), 2, 0))

However this code only returns a series with 1's and 0's and does not include a 2. How else can I complete this task?

EdChum · Accepted Answer · 2016-01-05 15:31:47Z

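You almost had it. Note that str.contains already treats the pattern as a regex by default; passing a raw string (prefix it with r) is simply good practice, so that any backslashes reach the regex engine unchanged: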

In [115]:
df['NewColumn'] = np.where(df.ICD.str.contains(r'313.2|414.2').astype(int), 1, np.where(((df.ICD.str.contains(r'313.2|414.2').astype(int))&(df['Date1']>df['Date2'])), 2, 0))
df

Out[115]:
   PatientID      Date1      Date2                ICD  NewColumn
0       1234 2010-12-14 2010-12-12  313.2,414.2,228.1          1
1       3213 2010-08-02 2012-09-05        232.1,221.0          0
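
A side note on the pattern itself, as a minimal sketch using made-up sample strings (they are not from the question's data): because the pattern is interpreted as a regular expression, an unescaped '.' matches any character. Escaping the dots, for example with re.escape, restricts the match to a literal period.

```python
import re

import pandas as pd

# Hypothetical sample values, chosen only to show the difference.
icd = pd.Series(["313.2, 228.1", "31392, 228.1", "232.1, 221.0"])

# Unescaped: '.' matches any character, so '31392' also matches.
loose = icd.str.contains(r"313.2|414.2")

# Escaped: each '.' must be a literal period.
pattern = "|".join(re.escape(code) for code in ["313.2", "414.2"])
strict = icd.str.contains(pattern)

print(loose.tolist())   # [True, True, False]
print(strict.tolist())  # [True, False, False]
```

Even the escaped pattern is still a substring match, so it would also match a longer code such as '2313.25'.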

In the first version, row 0 gets 1 because the outer np.where tests the first condition first. That condition is met, so the result of the nested np.where is never used for that row. If you want 2 returned, you need to rearrange the order of evaluation:

In [122]:
df['NewColumn'] = np.where( (df.ICD.str.contains(r'313.2|414.2').astype(int)) & ( df['Date1'] > df['Date2'] ), 2 , 
                           np.where( df.ICD.str.contains(r'313.2|414.2').astype(int), 1, 0 ) )
df

Out[122]:
   PatientID      Date1      Date2                ICD  NewColumn
0       1234 2010-12-14 2010-12-12  313.2,414.2,228.1          2
1       3213 2010-08-02 2012-09-05        232.1,221.0          0
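
As an alternative to nesting np.where, the same ordering can be expressed with np.select, which takes a list of conditions and picks the first one that is True for each row. This is only a sketch: the frame below is rebuilt from the two sample rows in the question, and the variable names has_code and date1_later are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the two rows shown in the question.
df = pd.DataFrame({
    "PatientID": [1234, 3213],
    "Date1": pd.to_datetime(["2010-12-14", "2010-08-02"]),
    "Date2": pd.to_datetime(["2010-12-12", "2012-09-05"]),
    "ICD": ["313.2,414.2,228.1", "232.1,221.0"],
})

has_code = df["ICD"].str.contains(r"313\.2|414\.2")
date1_later = df["Date1"] > df["Date2"]

# Conditions are checked in order and the first True one wins,
# so the more specific condition has to come first.
df["NewColumn"] = np.select(
    [has_code & date1_later, has_code],
    [2, 1],
    default=0,
)
print(df["NewColumn"].tolist())  # [2, 0]
```

Listing the conditions in priority order keeps the logic flat, which tends to be easier to read than nested calls once there are more than two cases.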

edited Jan 5, 2016 at 15:31

answered Jan 5, 2016 at 15:14

EdChum


2 Comments

AMS Over a year ago

Thank you for the quick response! One additional problem I am having is that the array should return a '2' for row 0 instead of a '1' according to the conditional statement 'if strings '313.2' or '414.2' exist in df['ICD'] and Date1>Date2 return 2'. I tried to pass this as my third argument in the np.where() but it doesn't seem to catch?

EdChum Over a year ago

But you've nested your conditional statements in that order, so the first condition takes precedence whenever it is met. You'd have to rearrange the order if you want it to return 2 in that case.

Chris · Accepted Answer · 2016-01-05 15:53:29Z

It is much easier to use the pandas functionality itself. Using numpy to do something that pandas already does is a good way to get unexpected behaviour.

Assuming you want to check whether a cell value is exactly 313.2 (so that 2313.25 returns False), compare the column as a string:

df['ICD'].astype(str) == '313.2'

This returns a boolean Series, with True or False for each index entry.

So:

def classify(df):
    # classify is a hypothetical wrapper so that the return statements are valid
    icd = df['ICD'].astype(str)
    boolean = (icd == '313.2') | (icd == '414.2')

    # Test the stricter condition first, otherwise 2 can never be returned
    boolean2 = boolean & (df['Date1'] > df['Date2'])
    if boolean2.any():
        return 2

    if boolean.any():
        # do something
        return 1

etc

Pandas also has the function isin() which can simplify things further.

The docs are here: http://pandas.pydata.org/pandas-docs/stable/indexing.html
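
A minimal sketch of how isin() might be used here, with made-up sample values. The first case assumes one code per cell; the second assumes comma-separated codes as in the question's sample and relies on Series.explode (available in pandas 0.25 and later). The names targets, single, multi and has_code are illustrative.

```python
import pandas as pd

targets = ["313.2", "414.2"]

# Case 1: one code per cell, so isin() can be applied directly.
single = pd.Series(["313.2", "2313.25", "414.2"])
print(single.isin(targets).tolist())  # [True, False, True]

# Case 2: comma-separated codes per cell, as in the question's sample.
multi = pd.Series(["313.2, 414.2, 228.1", "232.1, 221.0"])
codes = multi.str.split(",").explode().str.strip()
# explode() repeats the original index, so group on it to get one flag per row.
has_code = codes.isin(targets).groupby(level=0).any()
print(has_code.tolist())  # [True, False]
```

Because isin() compares whole values, '2313.25' is not mistaken for '313.2', which a substring search would not guarantee.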

Also, you never get 2 because of the order in which you evaluate the conditions. Whenever condition 2 is true, condition 1 must be true as well. Since you test condition 1 first, every such row is assigned 1 before condition 2 is ever considered.

In short, you need to test condition 2 first, as there is no circumstance where 1 can be false and 2 can be true.
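
The implication between the two conditions can be checked directly, and the per-row result built with pandas alone. This is a sketch under assumptions: the first two rows follow the question's sample, the third row is invented to exercise the "return 1" case, and cond1 and cond2 are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Date1": pd.to_datetime(["2010-12-14", "2010-08-02", "2011-01-01"]),
    "Date2": pd.to_datetime(["2010-12-12", "2012-09-05", "2011-06-01"]),
    "ICD": ["313.2,414.2,228.1", "232.1,221.0", "414.2"],
})

cond1 = df["ICD"].str.contains(r"313\.2|414\.2")
cond2 = cond1 & (df["Date1"] > df["Date2"])

# Condition 2 can never hold where condition 1 fails.
print((cond2 & ~cond1).any())  # False

# Start from condition 1 (1 or 0), then overwrite with 2 where condition 2 holds.
df["NewColumn"] = cond1.astype(int).mask(cond2, 2)
print(df["NewColumn"].tolist())  # [2, 0, 1]
```

Applying the broader condition first and then overwriting with the stricter one has the same effect as testing the stricter one first.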
