I have the following DataFrame and want to add a column named 'Value' to it. I would like to compute it with NumPy for speed, but each row needs to refer to the previous row's value in the same column. The DataFrame and the expected result are shown below.
import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        "Product": ["A", "A", "A", "A", "B", "B", "B", "C", "C"],
        "Inbound": [115, 220, 200, 402, 313, 434, 321, 343, 120],
        "Outbound": [10, 20, 24, 52, 40, 12, 43, 23, 16],
        "Is First?": ["Yes", "No", "No", "No", "Yes", "No", "No", "Yes", "No"],
    }
)
  Product  Inbound  Outbound Is First?  Value
0       A      115        10       Yes    125
1       A      220        20        No    105
2       A      200        24        No     81
3       A      402        52        No     29
4       B      313        40       Yes    353
5       B      434        12        No    341
6       B      321        43        No    298
7       C      343        23       Yes    366
8       C      120        16        No    350
The formula for Value column in pseudocode is:
if ['Is First?'] = 'Yes' then [Value] = [Inbound] + [Outbound]
else [Value] = [Previous Value] - [Outbound]
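To make the formula concrete, here is product A from the table traced by hand. This is only an arithmetic check of the expected values, not the approach I am after; the name `values` is just for illustration.

```python
# Product A: the first row starts at Inbound + Outbound,
# every later row subtracts its own Outbound from the previous Value.
values = [115 + 10]
for outbound in (20, 24, 52):
    values.append(values[-1] - outbound)

print(values)  # [125, 105, 81, 29]
```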
The straightforward way to create the Value column is a for loop that refers to the previous row's value, and I tried to do that with shift (which I somehow cannot get to work; my attempt is below). But since I will be applying this to a very large dataset, I would like to use a vectorized NumPy approach instead.
for i in range(len(df)):
    if df.loc[i, "Is First?"] == "Yes":
        df.loc[i, "Value"] = df.loc[i, "Inbound"] + df.loc[i, "Outbound"]
    else:
        df.loc[i, "Value"] = df.loc[i, "Value"].shift(-1) + df.loc[i, "Outbound"]
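For reference, the loop above fails because `df.loc[i, "Value"]` returns a single scalar, which has no `.shift` method, and it also adds Outbound where the formula subtracts it. Below is a minimal sketch of what I think the intended computation looks like, first as a plain loop that carries the previous value in a variable, then as a cumulative-sum version without a Python loop. The function names `value_loop` and `value_vectorized` are hypothetical, and the sketch assumes the rows are already in order and that the very first row has 'Is First?' equal to 'Yes'.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Product": ["A", "A", "A", "A", "B", "B", "B", "C", "C"],
        "Inbound": [115, 220, 200, 402, 313, 434, 321, 343, 120],
        "Outbound": [10, 20, 24, 52, 40, 12, 43, 23, 16],
        "Is First?": ["Yes", "No", "No", "No", "Yes", "No", "No", "Yes", "No"],
    }
)


def value_loop(df):
    """Reference version: direct translation of the pseudocode."""
    is_first = df["Is First?"].to_numpy() == "Yes"
    inbound = df["Inbound"].to_numpy()
    outbound = df["Outbound"].to_numpy()
    result = np.empty(len(df), dtype=inbound.dtype)
    prev = 0  # overwritten on the first row (assumed to be a 'Yes' row)
    for i in range(len(df)):
        if is_first[i]:
            prev = inbound[i] + outbound[i]
        else:
            prev = prev - outbound[i]
        result[i] = prev
    return result


def value_vectorized(df):
    """Same result via cumulative sums (assumes row 0 is a 'Yes' row)."""
    is_first = df["Is First?"].to_numpy() == "Yes"
    inbound = df["Inbound"].to_numpy()
    outbound = df["Outbound"].to_numpy()
    # Each row's contribution: a fresh start on 'Yes' rows, -Outbound otherwise.
    contrib = np.where(is_first, inbound + outbound, -outbound)
    running = np.cumsum(contrib)
    # Running total accumulated before each block starts, one entry per block.
    before_start = (running - contrib)[is_first]
    # 0-based block index for every row.
    block = np.cumsum(is_first) - 1
    # Remove what earlier blocks contributed to the running total.
    return running - before_start[block]


df["Value"] = value_vectorized(df)
assert (df["Value"].to_numpy() == value_loop(df)).all()
print(df)
```

On the sample data both functions reproduce the Value column from the expected output. Whether the cumulative-sum version is fast enough on the real dataset is something I have not measured.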