
I have the following DataFrame, to which I want to add a column named 'Value'. Each row's Value depends on the previous row's Value in the same column, and I want to use NumPy so that it runs fast instead of looping row by row.

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        "Product": ["A", "A", "A", "A", "B", "B", "B", "C", "C"],
        "Inbound": [115, 220, 200, 402, 313, 434, 321, 343, 120],
        "Outbound": [10, 20, 24, 52, 40, 12, 43, 23, 16],
        "Is First?": ["Yes", "No", "No", "No", "Yes", "No", "No", "Yes", "No"],
    }
)

Desired output:

  Product  Inbound  Outbound Is First?  Value
0       A      115        10       Yes    125
1       A      220        20        No    105
2       A      200        24        No     81
3       A      402        52        No     29
4       B      313        40       Yes    353
5       B      434        12        No    341
6       B      321        43        No    298
7       C      343        23       Yes    366
8       C      120        16        No    350

The formula for the Value column, in pseudocode, is:

if ['Is First?'] = 'Yes' then [Value] = [Inbound] + [Outbound]
else [Value] = [Previous Value] - [Outbound]
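Unrolling this recurrence gives a closed form, which is what makes a vectorized solution possible: each row only needs the start value of its block and a running total of Outbound within that block. Here f(i) is notation introduced for illustration (not from the question): the index of the most recent row at or before i whose 'Is First?' is 'Yes'. The sum is empty when row i is itself a 'Yes' row.

```latex
V_i = \mathrm{In}_{f(i)} + \mathrm{Out}_{f(i)} - \sum_{j = f(i) + 1}^{i} \mathrm{Out}_j
% worked example, row 3 (product A, f(3) = 0):
% V_3 = (115 + 10) - (20 + 24 + 52) = 125 - 96 = 29
```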

My current approach is a for loop that uses shift to refer to the previous row, which I have not been able to get working. Since I will be applying this to a very large dataset, I would like a vectorized NumPy approach instead.

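for i in range(len(df)):
    if df.loc[i, "Is First?"] == "Yes":
        # first row of a block: Value = Inbound + Outbound
        df.loc[i, "Value"] = df.loc[i, "Inbound"] + df.loc[i, "Outbound"]
    else:
        # df.loc[i, "Value"] is a scalar, so .shift() is not available there;
        # read the previous row's Value directly and subtract this row's Outbound
        df.loc[i, "Value"] = df.loc[i - 1, "Value"] - df.loc[i, "Outbound"]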
  • Does a "Yes" always go together with a different product name? Commented Aug 30, 2019 at 23:13

4 Answers


One way:
You can use np.subtract.accumulate with groupby and transform:

s = df['Is First?'].eq('Yes').cumsum()
df['value'] = ((df.Inbound + df.Outbound).where(df['Is First?'].eq('Yes'), df.Outbound)
                                         .groupby(s)
                                         .transform(np.subtract.accumulate))

Out[1749]:
  Product  Inbound  Outbound Is First?  value
0       A      115        10       Yes    125
1       A      220        20        No    105
2       A      200        24        No     81
3       A      402        52        No     29
4       B      313        40       Yes    353
5       B      434        12        No    341
6       B      321        43        No    298
7       C      343        23       Yes    366
8       C      120        16        No    350
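Within each group, np.subtract.accumulate turns [start, out1, out2, ...] into [start, start - out1, start - out1 - out2, ...], which is exactly the recurrence from the question; groupby(s) makes it restart at every 'Yes' row. A minimal check on the values of block A, copied from the example above:

```python
import numpy as np

# Block A from the example: the first element is Inbound + Outbound of the 'Yes' row (125),
# the rest are the Outbound values of the 'No' rows that follow it.
block_a = np.array([125, 20, 24, 52])

# subtract.accumulate computes [a0, a0 - a1, (a0 - a1) - a2, ...]
running = np.subtract.accumulate(block_a)
print(running)  # [125 105  81  29]
```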

Another way:
Assign the value for the 'Yes' rows. Create a group id s to use with groupby. Within each group, take the cumulative sum of the shifted Outbound and subtract it from the group's 'Yes' value. Finally, use the result to fill the NaNs in the remaining rows.

df['value'] = (df.Inbound + df.Outbound).where(df['Is First?'].eq('Yes'))
s = df['Is First?'].eq('Yes').cumsum()
s1 = df.value.ffill() - df.Outbound.shift(-1).groupby(s).cumsum().shift()
df['value'] = df.value.fillna(s1)

Out[1671]:
  Product  Inbound  Outbound Is First?  value
0       A      115        10       Yes  125.0
1       A      220        20        No  105.0
2       A      200        24        No   81.0
3       A      402        52        No   29.0
4       B      313        40       Yes  353.0
5       B      434        12        No  341.0
6       B      321        43        No  298.0
7       C      343        23       Yes  366.0
8       C      120        16        No  350.0
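The shift(-1) ... cumsum().shift() chain is the subtle part of this answer. A small sketch that prints the intermediate Series, using only the Outbound and 'Is First?' values from the question, shows which totals end up being subtracted (the variable name to_subtract is introduced here for illustration):

```python
import pandas as pd

# Outbound and 'Is First?' columns from the question's example.
outbound = pd.Series([10, 20, 24, 52, 40, 12, 43, 23, 16])
is_first = pd.Series(["Yes", "No", "No", "No", "Yes", "No", "No", "Yes", "No"])

# Group id: increments at every 'Yes' row.
s = is_first.eq("Yes").cumsum()

# shift(-1) pulls each Outbound up one row, groupby(s).cumsum() totals them within
# the block, and the final shift() moves each total down onto the row it belongs to.
# On a 'No' row i this equals Outbound summed from the row after the block's 'Yes'
# row up to and including row i. The values on 'Yes' rows (NaN, 136.0, 78.0) are
# meaningless, but fillna in the answer only fills the NaN ('No') rows of 'value'.
to_subtract = outbound.shift(-1).groupby(s).cumsum().shift()

print(s.tolist())            # [1, 1, 1, 1, 2, 2, 2, 3, 3]
print(to_subtract.tolist())  # [nan, 20.0, 44.0, 96.0, 136.0, 12.0, 55.0, 78.0, 16.0]
```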

2 Comments

Could make it even shorter if a new product always corresponds to a Yes:
df.loc[df['Is First?'].eq('Yes'), 'Value'] = df.Inbound + df.Outbound
df.loc[df['Is First?'].eq('No'), 'Value'] = df.Value.ffill() - df.Outbound.shift(-1).groupby(df.Product).cumsum().shift()
Ah, I see what you mean. I did consider grouping by df.Product, but decided against it because the OP's logic never mentions it. It refers only to the values of 'Is First?', so I created s to use for the groupby.
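The difference only matters if a product can contain more than one 'Yes' row, which the question does not rule out. A small sketch with hypothetical data (one product, two 'Yes' rows), reusing the np.subtract.accumulate approach from this answer, shows what grouping by Product would do in that case:

```python
import numpy as np
import pandas as pd

# Hypothetical data: a single product whose rows contain two 'Yes' rows.
df = pd.DataFrame({
    "Product": ["A", "A", "A", "A"],
    "Inbound": [100, 0, 50, 0],
    "Outbound": [10, 5, 20, 7],
    "Is First?": ["Yes", "No", "Yes", "No"],
})
is_yes = df["Is First?"].eq("Yes")
# Inbound + Outbound on 'Yes' rows, Outbound on 'No' rows
start_or_out = (df.Inbound + df.Outbound).where(is_yes, df.Outbound)

# Grouping by the cumulative count of 'Yes' rows restarts at every 'Yes'
by_flag = start_or_out.groupby(is_yes.cumsum()).transform(np.subtract.accumulate)
# Grouping by Product treats all four rows as one block
by_product = start_or_out.groupby(df.Product).transform(np.subtract.accumulate)

print(by_flag.tolist())     # [110, 105, 70, 63]  matches the pseudocode
print(by_product.tolist())  # [110, 105, 35, 28]  the second 'Yes' row never resets
```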

This is not a trivial task; the difficulty lies in the runs of consecutive 'No' rows. Each run has to be grouped together with the 'Yes' row that precedes it, and the code below does that:

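# starting value of each block: Inbound + Outbound
col_sum = df.Inbound + df.Outbound

mask_no = df['Is First?'].eq('No')
mask_yes = df['Is First?'].eq('Yes')

# block id: increments at every 'Yes' row, so each 'Yes' row and the 'No' rows after it share an id
consec_no = mask_yes.cumsum()

# first value of each block minus the running total of Outbound over the block's 'No' rows
result = col_sum.groupby(consec_no).transform('first') - df['Outbound'].where(mask_no, 0).groupby(consec_no).cumsum()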
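The answer stops at result. A self-contained sketch that wraps the same steps in a function (the name value_first_minus_cumsum is hypothetical) and writes the result back to the question's frame:

```python
import pandas as pd


def value_first_minus_cumsum(df: pd.DataFrame) -> pd.Series:
    """Hypothetical wrapper around the groupby / transform('first') answer."""
    block = df["Is First?"].eq("Yes").cumsum()  # block id, increments at each 'Yes'
    # Inbound + Outbound of the first row of each block, broadcast to the whole block
    start = (df["Inbound"] + df["Outbound"]).groupby(block).transform("first")
    # running total of Outbound over the block's 'No' rows (0 on the 'Yes' row)
    spent = df["Outbound"].where(df["Is First?"].eq("No"), 0).groupby(block).cumsum()
    return start - spent


df = pd.DataFrame(
    {
        "Product": ["A", "A", "A", "A", "B", "B", "B", "C", "C"],
        "Inbound": [115, 220, 200, 402, 313, 434, 321, 343, 120],
        "Outbound": [10, 20, 24, 52, 40, 12, 43, 23, 16],
        "Is First?": ["Yes", "No", "No", "No", "Yes", "No", "No", "Yes", "No"],
    }
)
df["Value"] = value_first_minus_cumsum(df)
print(df["Value"].tolist())  # [125, 105, 81, 29, 353, 341, 298, 366, 350]
```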



Use:

df.loc[df['Is First?'].eq('Yes'),'Value']=df['Inbound']+df['Outbound']
df.loc[~df['Is First?'].eq('Yes'),'Value']=df['Value'].fillna(0).shift().cumsum()-df.loc[~df['Is First?'].eq('Yes'),'Outbound'].cumsum()

1 Comment

This is wrong because the first cumsum runs across the 'Yes' groups instead of restarting at each one; for example, row 5 comes out as 370 instead of 341.
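Reproducing the answer on the question's data makes the problem visible: the 'No' rows of product A come out right, but from row 5 on, the running total still carries the earlier blocks.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Product": ["A", "A", "A", "A", "B", "B", "B", "C", "C"],
        "Inbound": [115, 220, 200, 402, 313, 434, 321, 343, 120],
        "Outbound": [10, 20, 24, 52, 40, 12, 43, 23, 16],
        "Is First?": ["Yes", "No", "No", "No", "Yes", "No", "No", "Yes", "No"],
    }
)
yes = df["Is First?"].eq("Yes")

# The answer's two statements, unchanged apart from naming the mask
df.loc[yes, "Value"] = df["Inbound"] + df["Outbound"]
df.loc[~yes, "Value"] = (
    df["Value"].fillna(0).shift().cumsum() - df.loc[~yes, "Outbound"].cumsum()
)

# Rows 5, 6 and 8 should be 341, 298 and 350
print(df["Value"].tolist())
# [125.0, 105.0, 81.0, 29.0, 353.0, 370.0, 327.0, 366.0, 677.0]
```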

Annotated numpy code:

## 1. line up values to sum

ob = -df["Outbound"].values
# get yes indices
fi, = np.where(df["Is First?"].values == "Yes")
# insert yes formula at yes positions
ob[fi] = df["Inbound"].values[fi] - ob[fi]

## 2. calculate block sums and subtract each from the
## first element of the **next** block

ob[fi[1:]] -= np.add.reduceat(ob,fi)[:-1]
# now simply taking the cumsum will reset after each block
df["Value"] = ob.cumsum()

Result:

  Product  Inbound  Outbound Is First?  Value
0       A      115        10       Yes    125
1       A      220        20        No    105
2       A      200        24        No     81
3       A      402        52        No     29
4       B      313        40       Yes    353
5       B      434        12        No    341
6       B      321        43        No    298
7       C      343        23       Yes    366
8       C      120        16        No    350
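One way to gain some confidence in the reduceat trick beyond the nine sample rows is to compare it with a straightforward loop on random data. A minimal sketch; the function names are hypothetical, and it assumes the first row is a 'Yes' row, since the reduceat version does not reset the running total for rows that come before the first 'Yes'.

```python
import numpy as np


def value_reduceat(inbound, outbound, is_first):
    """The reduceat approach above, as a function over NumPy arrays.
    Assumes is_first[0] is True (the data starts with a 'Yes' row)."""
    ob = -np.asarray(outbound, dtype=np.int64)  # new array, inputs are not modified
    fi = np.flatnonzero(is_first)               # positions of the 'Yes' rows
    ob[fi] = np.asarray(inbound)[fi] - ob[fi]   # Inbound + Outbound at 'Yes' rows
    # subtract each block's sum from the first element of the next block
    ob[fi[1:]] -= np.add.reduceat(ob, fi)[:-1]
    return ob.cumsum()


def value_loop(inbound, outbound, is_first):
    """Direct translation of the pseudocode, used as a reference."""
    out = np.empty(len(outbound), dtype=np.int64)
    prev = 0
    for i in range(len(outbound)):
        prev = inbound[i] + outbound[i] if is_first[i] else prev - outbound[i]
        out[i] = prev
    return out


rng = np.random.default_rng(0)
n = 1_000
inbound = rng.integers(0, 500, n)
outbound = rng.integers(0, 60, n)
is_first = rng.random(n) < 0.2
is_first[0] = True  # the reduceat version assumes the data starts with a 'Yes' row

assert np.array_equal(value_reduceat(inbound, outbound, is_first),
                      value_loop(inbound, outbound, is_first))
```

The loop stays useful as a readable specification; the reduceat version replaces it with a handful of whole-array operations.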
