Applying lambda functions with updated values

Question

Let us assume we are given the below function:

def f(x,y):
    y = x + y
    return y

The function f(x,y) sums two numbers (but it could be any more or less complicated function of two arguments). Let us now consider the following:

import pandas as pd
import random
import numpy as np

random.seed(1234)
df = pd.DataFrame({'first': random.sample(range(0, 9), 5),
                   'second': np.nan}, index = None)  # np.nan: the np.NaN alias was removed in NumPy 2.0
y = 1

df
   first  second
0      7     NaN
1      1     NaN
2      0     NaN
3      6     NaN
4      4     NaN

For the purposes of this question the second column of the data frame is irrelevant, so we can assume without loss of generality that it is NaN. Let us apply f(x,y) to each row of the data frame, with the variable y initialised to 1. The first iteration returns 7+1 = 8. When the function is applied to the second row, we want y to be updated to the previously calculated 8, so that the result is 1+8 = 9, and so on.

What is the pythonic way to handle this? I want to avoid looping and re-assigning the variables inside the loop, thus my guess would be something along the lines of

def apply_to_df(df, y):
    result = df['first'].apply(lambda s: f(s,y))
    return result

However, one may easily see that the above does not use the updated values: it performs every calculation with the initial value y=1.

print(apply_to_df(df,y))
0    8
1    2
2    1
3    7
4    5
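A minimal sketch of why this happens, using a plain list with the same values as the `first` column instead of the data frame: the assignment inside f rebinds only the local name y, so the caller's y is never changed and every call starts from 1 again.

```python
def f(x, y):
    y = x + y  # rebinds only the local name `y` inside f
    return y


y = 1
# Same values as the `first` column in the question.
results = [f(x, y) for x in [7, 1, 0, 6, 4]]

print(results)  # [8, 2, 1, 7, 5]
print(y)        # 1, the outer variable is untouched
```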

What you're describing is a recurrence relation. There is no built-in way to do this in pandas, although it's been discussed and there is an open (but old) issue about it. Right now, you have to loop in order to do it.
– BrenBarn, Commented Jun 22, 2017 at 18:42
In this case, it appears to be: df['first'].cumsum() + y0 where y0 is the initial seed value for y (1 in this example).
– Alexander, Commented Jun 22, 2017 at 18:49
The example may be a little unfortunate as it can actually be solved using a cumulative sum; however, the actual question regards any more or less complicated function of two (or any) variables.
– gented, Commented Jun 22, 2017 at 19:19

juanpa.arrivillaga · Accepted Answer · 2017-06-22 18:50:50Z

2

Note, you can probably solve this specific case with an existing cumulative function. However, in the general case, you could just hack it by relying on global state:

In [7]: y = 1

In [8]: def f(x):
   ...:     global y
   ...:     y = x + y
   ...:     return y
   ...:

In [9]: df['first'].apply(lambda s: f(s))
Out[9]:
0     8
1     9
2     9
3    15
4    19
Name: first, dtype: int64
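If the module-level global is the main objection, the same trick can be sketched with a closure that keeps the running value private. The names make_running and step are hypothetical, and the example uses a plain list with the question's values rather than the data frame. Passing step to df['first'].apply should behave the same way, assuming the function is called once per element and in order.

```python
def make_running(func, seed):
    """Return a one-argument callable that remembers the last result of `func`."""
    state = seed

    def step(x):
        nonlocal state          # update the enclosing variable, not a global
        state = func(x, state)  # feed the previous result back in as `y`
        return state

    return step


def f(x, y):
    return x + y


step = make_running(f, 1)
running = [step(x) for x in [7, 1, 0, 6, 4]]
print(running)  # [8, 9, 9, 15, 19]
```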

"I want to avoid looping and re-assigning the variables inside the loop"

Note that pd.DataFrame.apply is a vanilla Python loop under the hood, and it's actually less efficient than a plain loop because it does a lot of checking/validation of inputs. It is meant to be convenient, not efficient. So if you care about performance, you've already given up if you are relying on .apply.

Honestly, I think I would rather write the explicit loop over the rows inside of a function than rely on global state.
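A minimal sketch of what such an explicit loop could look like. The helper name running_apply is hypothetical, and the example runs on a plain list holding the question's values; a Series such as df['first'] can be iterated the same way, and the returned list can be assigned to a new column.

```python
def running_apply(values, func, seed):
    """Apply func(x, y) to each x, feeding each result back in as the next y."""
    results = []
    y = seed
    for x in values:
        y = func(x, y)
        results.append(y)
    return results


def f(x, y):
    return x + y


new_col = running_apply([7, 1, 0, 6, 4], f, seed=1)
print(new_col)  # [8, 9, 9, 15, 19]
```

All state lives inside the function call, so repeated calls with the same seed give the same result, which is not true of the global-variable version.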

answered Jun 22, 2017 at 18:50

juanpa.arrivillaga


1 Comment

gented Over a year ago

The trick of the global variable actually works, although as you mentioned I would then just consider making the whole loop more explicit.

Alexander · Accepted Answer · 2017-06-22 19:44:20Z

0

You could use a generator function to remember the prior calculation result:

def my_generator(series, foo, y_seed=0):
    y = y_seed  # Seed value for `y`.
    # Iterate over the series directly: in Python 3, s.next() does not exist, and a
    # StopIteration escaping a `while True` generator raises RuntimeError (PEP 479).
    for x in series:
        # Call the function on the next `x` value together with the most recent `y` value.
        y = foo(x=x, y=y)
        yield y

df = df.assign(new_col=list(my_generator(series=df['first'], foo=f, y_seed=1)))
>>> df
   first  second  new_col
0      8     NaN        9
1      3     NaN       12
2      0     NaN       12
3      5     NaN       17
4      4     NaN       21
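As a possible alternative to a hand-written generator, the standard library's itertools.accumulate expresses the same recurrence. This is only a sketch: it assumes Python 3.8+ (for the initial keyword) and uses a plain list holding the `first` values from the output above.

```python
from itertools import accumulate


def f(x, y):
    return x + y


values = [8, 3, 0, 5, 4]  # the `first` column shown in the output above

# accumulate calls func(accumulated, element), so the arguments are swapped
# to match f(x, y). With `initial`, the seed itself is emitted first, so the
# leading element is dropped to keep one result per input value.
running = list(accumulate(values, lambda acc, x: f(x, acc), initial=1))[1:]
print(running)  # [9, 12, 12, 17, 21]
```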

edited Jun 22, 2017 at 19:44

answered Jun 22, 2017 at 19:31

Alexander
