4

I'm looking to adjust values of one column based on a conditional in another column.

I'm using np.busday_count, but I don't want weekend values to behave like a Monday: Saturday to Tuesday is counted as 1 working day, and I'd like that to be 2.

dispdf = df[(df.dispatched_at.isnull()==False) & (df.sold_at.isnull()==False)]

dispdf["dispatch_working_days"] = np.busday_count(dispdf.sold_at.tolist(), dispdf.dispatched_at.tolist())

for i in range(len(dispdf)):
    if dispdf.dayofweek.iloc[i] == 5 or dispdf.dayofweek.iloc[i] == 6:
        dispdf.dispatch_working_days.iloc[i] +=1
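
The weekend behaviour described above can be reproduced in isolation. This is only a small sketch with dates picked for illustration (2018-02-17 is a Saturday); it assumes np.busday_count is called with its default Monday to Friday week mask, as in the question.

```python
import numpy as np

# 2018-02-17 is a Saturday, 2018-02-19 a Monday, 2018-02-20 a Tuesday.
sat_to_tue = np.busday_count("2018-02-17", "2018-02-20")
mon_to_tue = np.busday_count("2018-02-19", "2018-02-20")

# busday_count counts business days in the half-open range [begin, end),
# so a weekend start date contributes nothing until the following Monday.
print(sat_to_tue, mon_to_tue)  # prints: 1 1
```

Both calls return 1, which is why a Saturday or Sunday start needs the extra day added afterwards.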

Sample:

           dayofweek  dispatch_working_days
    43159        1.0                      3
    48144        3.0                      3
    45251        6.0                      1
    49193        3.0                      0
    42470        3.0                      1
    47874        6.0                      1
    44500        3.0                      1
    43031        6.0                      3
    43193        0.0                      4
    43591        6.0                      3

Expected Results:

           dayofweek  dispatch_working_days
    43159        1.0                      3
    48144        3.0                      3
    45251        6.0                      2
    49193        3.0                      0
    42470        3.0                      1
    47874        6.0                      2
    44500        3.0                      1
    43031        6.0                      4
    43193        0.0                      4
    43591        6.0                      4

At the moment I'm using this for loop to add a working day to Saturday and Sunday values. It's slow!

Can I use vectorization instead to speed this up? I tried using .apply, but to no avail.

2
  • Could you post the results you want to see? Commented Feb 19, 2018 at 15:43
  • Yep, added it in. Basically, any row where dayofweek equals 5 or 6 needs dispatch_working_days increased by 1. Commented Feb 19, 2018 at 15:47

2 Answers

3

Pretty sure this works, but there are more optimized implementations:

def adjust_dispatch(df_line):
    # Saturday is 5 and Sunday is 6, so >= 5 covers the weekend
    if df_line['dayofweek'] >= 5:
        return df_line['dispatch_working_days'] + 1
    else:
        return df_line['dispatch_working_days']

# axis=1 calls the function once per row
df['dispatch_working_days'] = df.apply(adjust_dispatch, axis=1)
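
One of the more optimized variants could look like the sketch below. It is not part of the original answer: it assumes the same two column names and rebuilds the sample rows from the question, then adds the weekend flag to the whole column at once instead of calling a Python function per row.

```python
import pandas as pd

# Sample rows from the question (index values are the original row labels).
dispdf = pd.DataFrame(
    {
        "dayofweek": [1.0, 3.0, 6.0, 3.0, 3.0, 6.0, 3.0, 6.0, 0.0, 6.0],
        "dispatch_working_days": [3, 3, 1, 0, 1, 1, 1, 3, 4, 3],
    },
    index=[43159, 48144, 45251, 49193, 42470, 47874, 44500, 43031, 43193, 43591],
)

# True for Saturday (5) and Sunday (6); casting to int gives 1 or 0,
# which is then added to the whole column in a single step.
is_weekend = dispdf["dayofweek"] >= 5
dispdf["dispatch_working_days"] = (
    dispdf["dispatch_working_days"] + is_weekend.astype(int)
)

print(dispdf)
```

Every row with dayofweek 5 or 6 ends up one higher, and the other rows are unchanged.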

2

The for loop in your code could be replaced by this line:

dispdf.loc[dispdf.dayofweek >= 5, 'dispatch_working_days'] += 1

Alternatively, you could use numpy.where:

https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html
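
A minimal sketch of the numpy.where variant, assuming the column names from the question and using four of its sample rows. np.where(condition, x, y) picks from x where the condition holds and from y elsewhere.

```python
import numpy as np
import pandas as pd

# Four of the sample rows from the question.
dispdf = pd.DataFrame(
    {
        "dayofweek": [1.0, 6.0, 3.0, 6.0],
        "dispatch_working_days": [3, 1, 0, 3],
    },
    index=[43159, 45251, 49193, 43591],
)

# Element-wise choice: add one day on Saturday (5) or Sunday (6),
# otherwise keep the existing value.
dispdf["dispatch_working_days"] = np.where(
    dispdf["dayofweek"] >= 5,
    dispdf["dispatch_working_days"] + 1,
    dispdf["dispatch_working_days"],
)

print(dispdf)
```

This builds a whole new column rather than updating rows in place, which can be handy if you would rather write the result to a separate column.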

1 Comment

I think these lines could give you some speedup:

dispdf = df.dropna(subset=['dispatched_at', 'sold_at'])
dispdf["dispatch_working_days"] = np.busday_count(dispdf.sold_at.values.astype('datetime64[D]'), dispdf.dispatched_at.values.astype('datetime64[D]'))
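
Here is a self-contained sketch of that suggestion. The three rows are made up for illustration (only the column names come from the question), and the .copy() is an addition so that the new column is written to an independent frame rather than a slice of df.

```python
import numpy as np
import pandas as pd

# Hypothetical data: only the column names come from the question.
df = pd.DataFrame(
    {
        "sold_at": pd.to_datetime(["2018-02-17 10:30", "2018-02-19 09:00", None]),
        "dispatched_at": pd.to_datetime(["2018-02-20 16:00", None, "2018-02-21 12:00"]),
    }
)

# Keep only rows where both timestamps are present.
dispdf = df.dropna(subset=["dispatched_at", "sold_at"]).copy()

# busday_count works on day-resolution dates, so truncate the
# timestamps to datetime64[D] before passing the arrays in.
dispdf["dispatch_working_days"] = np.busday_count(
    dispdf["sold_at"].values.astype("datetime64[D]"),
    dispdf["dispatched_at"].values.astype("datetime64[D]"),
)

print(dispdf)
```

Passing NumPy arrays avoids building Python lists with .tolist() first. Only the first row survives the dropna, and it still needs the weekend adjustment since it starts on a Saturday.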
