Suppose we are given the following function:
def f(x, y):
    y = x + y
    return y
The function f(x,y) sums two numbers, but it could be any function of two arguments, simple or complicated. Now consider the following:
import pandas as pd
import random
import numpy as np
random.seed(1234)
# np.nan (lowercase): the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame({'first': random.sample(range(0, 9), 5),
                   'second': np.nan}, index=None)
y = 1
df
first second
0 7 NaN
1 1 NaN
2 0 NaN
3 6 NaN
4 4 NaN
For the scope of the question the second column of the data frame is irrelevant, so without loss of generality we can assume it is NaN. Let us apply f(x,y) to each row of the data frame, with the variable y initialised to 1. The first iteration returns 7+1 = 8; when the function is applied to the second row, we want y to be updated to the previously computed 8, so that the result is 1+8 = 9, and so on.
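To pin down the intended semantics, here is a plain-loop sketch of the recurrence. It hard-codes the `first` values from the printed frame above instead of regenerating them, so it does not depend on the random seed; the helper name `running_apply` is hypothetical and only used for illustration.

```python
def f(x, y):
    # the example function from the question: sum of two numbers
    return x + y


def running_apply(values, y0):
    """Hypothetical helper: apply f row by row, feeding each result back in as y."""
    results = []
    y = y0
    for x in values:
        y = f(x, y)  # y_i = f(x_i, y_{i-1})
        results.append(y)
    return results


first = [7, 1, 0, 6, 4]  # values of df['first'] as printed above
print(running_apply(first, 1))  # [8, 9, 9, 15, 19]
```

This is exactly the loop the question wants to avoid writing, but it gives a reference result to check any shortcut against.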
What is the pythonic way to handle this? I want to avoid looping and re-assigning the variable inside the loop, so my guess would be something along the lines of:
def apply_to_df(df, y):
    result = df['first'].apply(lambda s: f(s, y))
    return result
However, it is easy to see that this does not take the updated values into account: every call uses the original value y=1.
print(apply_to_df(df,y))
0 8
1 2
2 1
3 7
4 5
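Written as a recurrence, the desired result and what `apply` actually computes differ as follows, numbering the rows of `first` as x_1, ..., x_n and writing y_0 for the seed. When f is addition, the desired recurrence unrolls into a running sum plus the seed:

```latex
\begin{aligned}
\text{desired:}\quad & y_i = f(x_i,\, y_{i-1}), \qquad i = 1, \dots, n \\
\text{apply:}\quad & \tilde{y}_i = f(x_i,\, y_0) \\
f(x, y) = x + y:\quad & y_i = y_0 + \sum_{j=1}^{i} x_j
\end{aligned}
```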
Because f here is addition, the running result is just a cumulative sum plus the seed: df['first'].cumsum() + y0, where y0 is the initial seed value for y (1 in this example).
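A sketch that checks the cumulative-sum answer against the desired running values, and generalises it to an arbitrary two-argument f with the standard library's `itertools.accumulate` (the `initial` argument needs Python 3.8+). The generalisation is a suggestion layered on top of the answer, not part of it. Note that `accumulate` calls its function as `func(accumulated, element)`, so the lambda swaps the arguments to match `f(x, y)`.

```python
from itertools import accumulate

import pandas as pd


def f(x, y):
    # any two-argument function; here the question's sum
    return x + y


df = pd.DataFrame({'first': [7, 1, 0, 6, 4]})  # values from the printed frame
y0 = 1

# Special case: when f is addition, the running result is a cumulative sum.
via_cumsum = df['first'].cumsum() + y0

# General case: fold f over the column. accumulate passes (acc, x), f expects (x, acc).
# With initial=y0 the first emitted value is y0 itself, so drop it with [1:].
via_accumulate = pd.Series(
    list(accumulate(df['first'], lambda acc, x: f(x, acc), initial=y0))[1:],
    index=df.index,
)

print(via_cumsum.tolist())      # [8, 9, 9, 15, 19]
print(via_accumulate.tolist())  # [8, 9, 9, 15, 19]
```

The `accumulate` version still makes one Python-level call per row, so it hides the loop rather than vectorising it; a vectorised shortcut such as `cumsum` is only available when f has a matching built-in (here, addition).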