Pandas apply based on conditional from another column

Question

I'm looking to adjust values of one column based on a conditional in another column.

I'm using np.busday_count, but I don't want weekend dates to behave like a Monday: Saturday to Tuesday is counted as 1 working day, and I'd like that to be 2.
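A minimal sketch of the behaviour described above, assuming the default Monday to Friday week mask. The dates are illustrative and not taken from the asker's data. np.busday_count counts business days in the half-open interval from the start date up to but not including the end date, so a Saturday start yields the same count as the following Monday.

```python
import numpy as np

# Illustrative dates: 2018-02-17 is a Saturday, 2018-02-19 a Monday,
# 2018-02-20 a Tuesday.
saturday = np.datetime64("2018-02-17")
monday = np.datetime64("2018-02-19")
tuesday = np.datetime64("2018-02-20")

# Only Monday falls in [Saturday, Tuesday), so both calls count one day.
sat_to_tue = np.busday_count(saturday, tuesday)
mon_to_tue = np.busday_count(monday, tuesday)

print(sat_to_tue, mon_to_tue)
```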

dispdf = df[(df.dispatched_at.isnull()==False) & (df.sold_at.isnull()==False)]

dispdf["dispatch_working_days"] = np.busday_count(dispdf.sold_at.tolist(), dispdf.dispatched_at.tolist())

for i in range(len(dispdf)):
    if dispdf.dayofweek.iloc[i] == 5 or dispdf.dayofweek.iloc[i] == 6:
        dispdf.dispatch_working_days.iloc[i] +=1

Sample:

           dayofweek  dispatch_working_days
    43159        1.0                      3
    48144        3.0                      3
    45251        6.0                      1
    49193        3.0                      0
    42470        3.0                      1
    47874        6.0                      1
    44500        3.0                      1
    43031        6.0                      3
    43193        0.0                      4
    43591        6.0                      3

Expected Results:

       dayofweek  dispatch_working_days
43159        1.0                      3
48144        3.0                      3
45251        6.0                      2
49193        3.0                      0
42470        3.0                      1
47874        6.0                      2
44500        3.0                      1
43031        6.0                      2
43193        0.0                      4
43591        6.0                      4

At the moment I'm using this for loop to add a working day to Saturday and Sunday values. It's slow!

Can I use a vectorized operation instead to speed this up? I tried using .apply, but to no avail.

Yep, added it in. Basically, any row where dayofweek equals 5 or 6 needs its dispatch_working_days value increased by 1.
– Leon Kyriacou, Commented Feb 19, 2018 at 15:47

joaoavf · Accepted Answer · 2018-02-19 15:51:00Z

3

Pretty sure this works, but there are more optimized implementations:

def adjust_dispatch(df_line):
    # dayofweek: Monday=0 ... Saturday=5, Sunday=6
    if df_line['dayofweek'] >= 5:
        return df_line['dispatch_working_days'] + 1
    else:
        return df_line['dispatch_working_days']

# axis=1 passes each row to adjust_dispatch
dispdf['dispatch_working_days'] = dispdf.apply(adjust_dispatch, axis=1)

ilia timofeev · Accepted Answer · 2018-02-19 19:42:35Z

2

The for loop in your code could be replaced by this line:

dispdf.loc[dispdf.dayofweek>=5,'dispatch_working_days']+=1
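As a rough, self-contained illustration of the boolean-mask approach: the frame below is hypothetical data modelled on the question's sample, not the asker's real data. The mask uses >= 5 so that both Saturday (5) and Sunday (6) are selected, matching the rule stated in the question.

```python
import pandas as pd

# Hypothetical rows modelled on the sample in the question.
dispdf = pd.DataFrame(
    {
        "dayofweek": [1.0, 3.0, 6.0, 5.0, 0.0],
        "dispatch_working_days": [3, 3, 1, 0, 4],
    }
)

# Boolean mask: True for Saturday (5) and Sunday (6).
weekend = dispdf["dayofweek"] >= 5

# Add one working day to the masked rows only, without a Python-level loop.
dispdf.loc[weekend, "dispatch_working_days"] += 1

print(dispdf)
```

Using .loc with a mask and a column label updates the frame in a single indexing step, which also avoids the chained assignment in the original loop.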

Alternatively, you could use numpy.where:

https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html
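A sketch of what the numpy.where variant might look like, again on hypothetical data modelled on the question's sample. numpy.where(condition, x, y) picks from x where the condition is True and from y elsewhere, so the whole column is rebuilt in one call.

```python
import numpy as np
import pandas as pd

# Hypothetical rows modelled on the sample in the question.
dispdf = pd.DataFrame(
    {
        "dayofweek": [1.0, 3.0, 6.0, 5.0, 0.0],
        "dispatch_working_days": [3, 3, 1, 0, 4],
    }
)

# Where dayofweek is Saturday (5) or Sunday (6), take the count plus one;
# otherwise keep the existing count.
dispdf["dispatch_working_days"] = np.where(
    dispdf["dayofweek"] >= 5,
    dispdf["dispatch_working_days"] + 1,
    dispdf["dispatch_working_days"],
)

print(dispdf)
```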

1 Comment

ilia timofeev Over a year ago

I think these lines could give you some speedup:

dispdf = df.dropna(subset=['dispatched_at','sold_at'])

dispdf["dispatch_working_days"] = np.busday_count(dispdf.sold_at.values.astype('datetime64[D]'),dispdf.dispatched_at.values.astype('datetime64[D]'))
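The suggestion in this comment can be tried end to end with a small frame. The dates below are made up for illustration, and the .copy() call is an addition not present in the comment, included on the assumption that it helps avoid pandas' SettingWithCopyWarning when a column is added to a filtered frame.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row is missing sold_at and should be dropped.
df = pd.DataFrame(
    {
        "sold_at": pd.to_datetime(["2018-02-17", "2018-02-19", None]),
        "dispatched_at": pd.to_datetime(["2018-02-20", "2018-02-20", "2018-02-21"]),
    }
)

# Keep only rows where both timestamps are present.
dispdf = df.dropna(subset=["dispatched_at", "sold_at"]).copy()

# Cast to day precision so np.busday_count receives datetime64[D] arrays
# directly instead of Python lists.
dispdf["dispatch_working_days"] = np.busday_count(
    dispdf["sold_at"].values.astype("datetime64[D]"),
    dispdf["dispatched_at"].values.astype("datetime64[D]"),
)

print(dispdf)
```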
