After searching several forums on similar questions, it appears that one way to iterate a conditional statement quickly is using Numpy's np.where() function on Pandas. I am having trouble with the following task:

I have a dataset that looks like several rows of:

PatientID    Date1      Date2       ICD
1234         12/14/10   12/12/10    313.2, 414.2, 228.1
3213         8/2/10     9/5/12      232.1, 221.0

I am trying to create a conditional statement such that:

 1. if strings '313.2' or '414.2' exist in df['ICD'] return 1
 2. if strings '313.2' or '414.2' exist in df['ICD'] and Date1>Date2 return 2
 3. Else return 0

Given that Date1 and Date2 are in date-time format and my data frame is coded as df, I have the following code:

df['NewColumn'] = np.where(df.ICD.str.contains('313.2|414.2').astype(int), 1, np.where(((df.ICD.str.contains('313.2|414.2').astype(int))&(df['Date1']>df['Date2'])), 2, 0))

However this code only returns a series with 1's and 0's and does not include a 2. How else can I complete this task?

2 Answers

1

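You almost had it. Note that str.contains treats the pattern as a regular expression by default; the raw-string prefix r is good practice for regex patterns, although it changes nothing for this particular pattern (and the unescaped . matches any character, not just a literal dot). The code runs as follows: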

In [115]:
df['NewColumn'] = np.where(df.ICD.str.contains(r'313.2|414.2').astype(int), 1, np.where(((df.ICD.str.contains(r'313.2|414.2').astype(int))&(df['Date1']>df['Date2'])), 2, 0))
df

Out[115]:
   PatientID      Date1      Date2                ICD  NewColumn
0       1234 2010-12-14 2010-12-12  313.2,414.2,228.1          1
1       3213 2010-08-02 2012-09-05        232.1,221.0          0

You get 1 returned because the outer np.where checks the first condition, and that condition is met for row 0, so the result of the nested np.where is never used for that row. If you want 2 returned, you need to rearrange the order of evaluation:

In [122]:
df['NewColumn'] = np.where( (df.ICD.str.contains(r'313.2|414.2').astype(int)) & ( df['Date1'] > df['Date2'] ), 2 , 
                           np.where( df.ICD.str.contains(r'313.2|414.2').astype(int), 1, 0 ) )
df

Out[122]:
   PatientID      Date1      Date2                ICD  NewColumn
0       1234 2010-12-14 2010-12-12  313.2,414.2,228.1          2
1       3213 2010-08-02 2012-09-05        232.1,221.0          0
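
As an alternative to nesting np.where, the same priority order can be written with np.select, which picks the first matching condition per row. This is only a sketch under a few assumptions: the third row is a made-up example added to show the 1 case, and the dots in the pattern are escaped so they match a literal period.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question; the third row is hypothetical.
df = pd.DataFrame({
    "PatientID": [1234, 3213, 5555],
    "Date1": pd.to_datetime(["2010-12-14", "2010-08-02", "2011-01-01"]),
    "Date2": pd.to_datetime(["2010-12-12", "2012-09-05", "2011-06-01"]),
    "ICD": ["313.2, 414.2, 228.1", "232.1, 221.0", "414.2"],
})

# Escape the dots so "." matches a literal period, not any character.
has_code = df["ICD"].str.contains(r"313\.2|414\.2")
date1_later = df["Date1"] > df["Date2"]

# np.select uses the first condition that is True for each row,
# so the most specific condition is listed first.
df["NewColumn"] = np.select(
    [has_code & date1_later, has_code],
    [2, 1],
    default=0,
)
print(df["NewColumn"].tolist())
```

Listing the conditions from most to least specific keeps the priority visible in one place, which may be easier to extend than deeper nesting.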

2 Comments

Thank you for the quick response! One additional problem I am having is that the array should return a '2' for row 0 instead of a '1' according to the conditional statement 'if strings '313.2' or '414.2' exist in df['ICD'] and Date1>Date2 return 2'. I tried to pass this as my third argument in the np.where() but it doesn't seem to catch?
But you've nested your conditional statements in that order, so the first condition wins whenever it is met. You'd have to rearrange the order if you want it to return 2 in that case.
0

It is much easier to use the pandas functionality itself. Using numpy to do something that pandas already does is a good way to get unexpected behaviour.

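Assuming you want to match a cell whose value is exactly 313.2 (so 2313.25 returns False), compare with ==:

df['ICD'].astype(str) == '313.2'

This returns a boolean Series with True or False for each index entry. Note that a cell holding several comma-separated codes, as in the question, will not match an exact comparison.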

so

# Boolean masks: one True/False value per row.
icd = df['ICD'].astype(str)
boolean = (icd == '313.2') | (icd == '414.2')
boolean2 = boolean & (df['Date1'] > df['Date2'])

# Assign per row; the more specific condition is written last so it wins.
df['NewColumn'] = 0
df.loc[boolean, 'NewColumn'] = 1
df.loc[boolean2, 'NewColumn'] = 2

Pandas also has the function isin() which can simplify things further.

The docs are here: http://pandas.pydata.org/pandas-docs/stable/indexing.html
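
One possible way to use isin() with cells that hold several comma-separated codes is to split each cell first. The sketch below assumes pandas 0.25 or later (for explode); the third row, using the 2313.25 value mentioned above, is hypothetical and only there to show that the match is exact.

```python
import pandas as pd

# Illustrative data; the third row is a made-up near-miss value.
df = pd.DataFrame({
    "PatientID": [1234, 3213, 7777],
    "ICD": ["313.2, 414.2, 228.1", "232.1, 221.0", "2313.25"],
})

targets = ["313.2", "414.2"]

# Split each cell into individual codes, one per row; the original
# index is repeated for every code that came from the same cell.
codes = df["ICD"].str.split(",").explode().str.strip()

# isin() matches whole values; grouping by the original index collapses
# the per-code results back to one True/False per patient row.
has_code = codes.isin(targets).groupby(level=0).any()
print(has_code.tolist())
```

Because isin() compares whole values, a substring such as the 313.2 inside 2313.25 does not count as a match, unlike a str.contains search.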

Also, you never get 2 because of the order in which you evaluate the conditional statements. Whenever condition 2 is true, condition 1 must also be true. Since you test condition 1 first, those rows are always assigned 1 and condition 2 is never reached.

In short, you need to test condition 2 first, as there is no circumstance where 1 can be false and 2 can be true.
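
The point about ordering can be checked directly. This is a small sketch using the two rows from the question; the names cond1 and cond2 are just illustrative labels for the two conditions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date1": pd.to_datetime(["2010-12-14", "2010-08-02"]),
    "Date2": pd.to_datetime(["2010-12-12", "2012-09-05"]),
    "ICD": ["313.2, 414.2, 228.1", "232.1, 221.0"],
})

cond1 = df["ICD"].str.contains(r"313\.2|414\.2")
cond2 = cond1 & (df["Date1"] > df["Date2"])

# cond2 is never True on a row where cond1 is False.
assert not (cond2 & ~cond1).any()

# Testing cond1 first: rows satisfying cond2 are already labelled 1.
cond1_first = np.where(cond1, 1, np.where(cond2, 2, 0))
# Testing cond2 first: the more specific condition gets its own label.
cond2_first = np.where(cond2, 2, np.where(cond1, 1, 0))
print(cond1_first.tolist(), cond2_first.tolist())
```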
