0

I have the following df:

days    days_1    days_2    period    percent_1    percent_2    amount
3       5         4         1         0.2          0.1          100
2       1         3         4         0.3          0.1          500
9       8         10        6         0.4          0.2          600
10      7         8         11        0.5          0.3          700
10      5         6         7         0.7          0.4          800

I have the following logic, which applies to each row of the df:

for each row in df:
    if days < days_1:
        amount_missed = 0
        days_missed = 0
    elif days_1 < days < days_2:
        missed_percent = percent_1 - percent_2
        amount_missed = amount * (missed_percent / 100)
        days_missed = days - days_1    
    elif days_2 < days < period or days > period:    
        missed_percent = percent_2
        amount_missed = amount * (missed_percent / 100)
        days_missed = days - days_2
    else:
        amount_missed = 0
        days_missed = 0 

I am trying to translate the above logic using boolean masks and np.where, as follows:

cond1 = df['days_2'] < df['days']
cond2 = df['days'] < df['period']
cond3 = df['days'] > df['period']
cond4 = df['days'] >= df['days_1']
cond5 = df['days'] < df['days_2']
cond6 = df['days'] > df['days_1']

mask = ((cond1 & cond2) | cond3) & cond4
mask2 = cond5 & cond6

df['amount_missed'] = np.where(mask, df['amount'] * df['percent_2'] / 100, 0.0)
df['amount_missed'] = np.where(mask2, df['amount'] * (df['percent_1'] - df['percent_2']) / 100, 0.0)

df['days_missed'] = np.where(mask, df['days'] - df['days_2'], 0)
df['days_missed'] = np.where(mask2, df['days'] -df['days_1'], 0)

But the result of the above code is not the same as that of the row iteration, which should be:

{
 'amount_missed': {0: 0.0, 1: 1.0, 2: 1.2, 3: 2.1, 4: 3.2},
 'days_missed': {0: 0, 1: 1, 2: 1, 3: 2, 4: 4}
 }  

The boolean-mask version generates the following result instead:

{
 'amount_missed': {0: 0.0, 1: 0.9999999999999999, 2: 1.2, 3: 0.0, 4: 0.0},
 'days_missed': {0: 0, 1: 1, 2: 1, 3: 0, 4: 0}
 }

How can I fix this, and are there other ways to replace the row iteration here?

5
  • The code you provided with the explicit loop also does not provide the output which you say it should provide. I assume the 7th line should be changed to set the value of 'amount_missed' instead of 'amount', but even then results are still different Commented Jan 23, 2018 at 11:18
  • @DennisSoemers sry, I have corrected my op Commented Jan 23, 2018 at 11:24
  • Simplify! Show us just ONE output array that differs, with the code for just that one, and let's debug that. No need to show us 9 different arrays, some of which have no errors. Commented Jan 23, 2018 at 12:05
  • @JohnZwinck have much simplified the op, is it okay now? Commented Jan 23, 2018 at 12:17
  • @daiyue: That's better. Thank you. Commented Jan 23, 2018 at 12:23

2 Answers

2

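The root cause of the bug is that each new np.where() overwrites the target column and resets every non-matching row to 0, rather than cascading the where() expressions. But better than cascading where() expressions is np.select(), which takes the first matching condition for each row and defaults to 0 when none match: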

c0 = df.days < df.days_1
c1 = (df.days_1 < df.days) & (df.days < df.days_2)
c2 = ((df.days_2 < df.days) & (df.days < df.period)) | (df.days > df.period)

df['days_missed'] = np.select([c0, c1, c2], [0, df.days - df.days_1, df.days - df.days_2])
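
For comparison, the cascading np.where() fix mentioned above can be sketched with the masks from the question. This is only a minimal illustration, assuming the sample data shown in the question; the intermediate names `amount_missed` and `days_missed` are local variables introduced here. The one change from the question's code is that the second np.where() falls back to the first result instead of 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'days': [3, 2, 9, 10, 10],
    'days_1': [5, 1, 8, 7, 5],
    'days_2': [4, 3, 10, 8, 6],
    'period': [1, 4, 6, 11, 7],
    'percent_1': [0.2, 0.3, 0.4, 0.5, 0.7],
    'percent_2': [0.1, 0.1, 0.2, 0.3, 0.4],
    'amount': [100, 500, 600, 700, 800],
})

# Masks as defined in the question.
mask = ((((df['days_2'] < df['days']) & (df['days'] < df['period']))
         | (df['days'] > df['period']))
        & (df['days'] >= df['days_1']))
mask2 = (df['days'] < df['days_2']) & (df['days'] > df['days_1'])

# First pass: the percent_2 branch, with 0 everywhere else.
amount_missed = np.where(mask, df['amount'] * df['percent_2'] / 100, 0.0)
days_missed = np.where(mask, df['days'] - df['days_2'], 0)

# Second pass: fall back to the first-pass result instead of 0,
# so rows already filled by `mask` are not wiped out.
amount_missed = np.where(
    mask2,
    df['amount'] * (df['percent_1'] - df['percent_2']) / 100,
    amount_missed,
)
days_missed = np.where(mask2, df['days'] - df['days_1'], days_missed)

df['amount_missed'] = amount_missed
df['days_missed'] = days_missed
```

Because `mask2` is applied last, it wins on rows where both masks are true (row 2 in the sample data), which matches the order of the if/elif chain. The amounts agree with the expected values only up to floating-point rounding, so a comparison is safer with np.allclose() than with ==.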

2

Code used to generate the original dataframe (from the original, unedited question):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'days': [3, 2, 9, 10, 10],
    'days_1': [5, 1, 8, 7, 5],
    'days_2': [4, 3, 10, 8, 6],
    'period': [1, 4, 6, 11, 7],
    'percent_1': [0.2, 0.3, 0.4, 0.5, 0.7],
    'percent_2': [0.1, 0.1, 0.2, 0.3, 0.4],
    'amount': [100, 500, 600, 700, 800]
}, columns=['days', 'days_1', 'days_2', 'period', 'percent_1', 'percent_2', 'amount'])

The following code provides the results you wanted in your original question (not updated for the simplified case you created after being asked to do so in comments):

df['amount_missed'] = np.where((df['days_1'] < df['days']) & (df['days'] < df['days_2']),
                               df['amount'] * (df['percent_1'] - df['percent_2']) / 100,
                               np.where((df['days_2'] < df['days']) & (df['days'] < df['period']),
                                        df['amount'] * (df['percent_2']) / 100,
                                        0.0))

df['days_missed'] = np.where((df['days_1'] < df['days']) & (df['days'] < df['days_2']),
                             df['days'] - df['days_1'],
                             np.where((df['days_2'] < df['days']) & (df['days'] < df['period']),
                                      df['days'] - df['days_2'],
                                      0))

Output:

   days  days_1  days_2  period  percent_1  percent_2  amount  amount_missed  \
0     3       5       4       1        0.2        0.1     100            0.0   
1     2       1       3       4        0.3        0.1     500            1.0   
2     9       8      10       6        0.4        0.2     600            1.2   
3    10       7       8      11        0.5        0.3     700            2.1   
4    10       5       6       7        0.7        0.4     800            0.0   

   days_missed  
0            0  
1            1  
2            1  
3            2  
4            0  

EDIT:

Same answer with numpy.select:

m1 = (df['days_1'] < df['days']) & (df['days'] < df['days_2'])
s1 = df['amount'] * (df['percent_1'] - df['percent_2']) / 100
s11 = df['days'] - df['days_1']

m2 = (df['days_2'] < df['days']) & (df['days'] < df['period'])
s2 = df['amount'] * (df['percent_2']) / 100
s22 = df['days'] - df['days_2']

df['amount_missed'] = np.select([m1, m2], [s1, s2], default=0)
df['days_missed'] = np.select([m1, m2], [s11, s22], default=0)
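
The conditions above follow the original question, so they leave out the `days > period` branch, and row 4 stays at zero. Below is a rough sketch of how the same np.select pattern might cover the updated rules, checked against a plain row-wise loop. It assumes the sample data above; `missed_rowwise`, `m0` and `expected` are hypothetical names introduced for this illustration. The extra `m0` condition goes first because the if/elif chain tests `days < days_1` before anything else, and without it row 0 would wrongly match `days > period`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'days': [3, 2, 9, 10, 10],
    'days_1': [5, 1, 8, 7, 5],
    'days_2': [4, 3, 10, 8, 6],
    'period': [1, 4, 6, 11, 7],
    'percent_1': [0.2, 0.3, 0.4, 0.5, 0.7],
    'percent_2': [0.1, 0.1, 0.2, 0.3, 0.4],
    'amount': [100, 500, 600, 700, 800],
})


def missed_rowwise(row):
    """Row-by-row rules from the updated question (hypothetical helper)."""
    if row.days < row.days_1:
        return 0.0, 0
    if row.days_1 < row.days < row.days_2:
        pct = row.percent_1 - row.percent_2
        return row.amount * pct / 100, row.days - row.days_1
    if row.days_2 < row.days < row.period or row.days > row.period:
        return row.amount * row.percent_2 / 100, row.days - row.days_2
    return 0.0, 0


# Reference result from the explicit loop.
expected = pd.DataFrame(
    [missed_rowwise(r) for r in df.itertuples(index=False)],
    columns=['amount_missed', 'days_missed'],
    index=df.index,
)

# Vectorized version; np.select uses the first condition that matches.
m0 = df['days'] < df['days_1']
m1 = (df['days_1'] < df['days']) & (df['days'] < df['days_2'])
m2 = (((df['days_2'] < df['days']) & (df['days'] < df['period']))
      | (df['days'] > df['period']))

s1 = df['amount'] * (df['percent_1'] - df['percent_2']) / 100
s2 = df['amount'] * df['percent_2'] / 100

df['amount_missed'] = np.select([m0, m1, m2], [0.0, s1, s2], default=0.0)
df['days_missed'] = np.select(
    [m0, m1, m2],
    [0, df['days'] - df['days_1'], df['days'] - df['days_2']],
    default=0,
)

# Compare floats with a tolerance and integers exactly.
assert np.allclose(df['amount_missed'], expected['amount_missed'])
assert (df['days_missed'] == expected['days_missed']).all()
```

Keeping the loop around as a reference is slow on large frames, but it is a cheap way to confirm that a vectorized rewrite preserves the branch order.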

4 Comments

hmmm, np.select here should be nicer ;)
@jezrael I agree this may not necessarily be the cleanest solution. I partially decided to figure out how to answer the question because I was interested in learning how to work with np.where myself, but would definitely also be interested in cleaner solutions! I edited the code used to generate the original dataframe into my answer, maybe that'll be useful if you decide to also write an answer with a potentially cleaner solution
Can I rewrite your solution and add it to your answer? :)
@jezrael Sure. I see John also just posted a solution with np.select already though
