Avoid for loop in Python DataFrame

Question

Problem 1.

Suppose I have n years of annual returns r and my initial wealth is 100. Every year I have a fixed expense of 6. I want to compute my wealth at the end of each year. I can do it with a for loop, but that is too slow for my purposes. How can I do it with DataFrame operations?

import pandas as pd
wealth = pd.Series(index=range(n+1), dtype=float)   # one slot per year, plus the starting value
wealth.iloc[0] = 100
for i in range(n):
    wealth.iloc[i+1] = wealth.iloc[i]*(1+r.iloc[i]) - 6

Initially I thought

wealth = ((1 + r - 0.06).cumprod()).multiply(other = 100)

to be the solution, but it is not: the expense is a fixed amount of 6, not 6% of wealth.

Problem 2.

I want to do the above N times. In each case I generate r by sampling n returns with replacement.

r = returnY.sample(n,replace=True).reset_index(drop=True)

Then, for that return series, I create the wealth path described above and collect the paths in an n×N DataFrame. I can do this with a for loop, but for large N and n it takes a long time to run. Is there an efficient and elegant way to do this?

Problem 3.

Suppose allWealth is the DataFrame holding all wealth paths. For each row, I want the percentage of columns whose value is below 0. This is how I solved it:

yy = allWealth.copy()
yy[yy>0] = 1
yy[yy<=0] = 0
yy.sum(axis = 1)/N

Any better, more elegant solution?

chi · Accepted Answer · 2021-03-31 00:53:57Z

1

Problem 1: It looks like you want to apply the "reduce" pattern. You can use the reduce function from functools.

import numpy as np
from functools import reduce
rs = np.random.random(50)*0.3   #sequence of annual returns
result = reduce(lambda w,r: w*(1+r)-6, rs, 100)

If you want to keep all the intermediate values, use itertools.accumulate() instead (its initial keyword requires Python 3.8 or later). For example, import itertools and replace the last line with the following:

import itertools
ts_iter = itertools.accumulate(rs, lambda w,r: w*(1+r)-6, initial=100)
ts = list(ts_iter)     # itertools.accumulate returns an iterator, so materialize it
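The same recurrence also has a closed form that avoids the Python-level loop. Dividing w[t+1] = w[t]*(1+r[t]) - c by the cumulative growth factor turns it into a plain cumulative sum. What follows is a minimal sketch, assuming 1 + r is never zero and the horizon is short enough that the growth factors stay well within floating-point range. The name wealth_path and its parameters are illustrative and not from the original post.

```python
import numpy as np

def wealth_path(r, w0=100.0, expense=6.0):
    """Return wealth at t = 0..n for w[t+1] = w[t]*(1 + r[t]) - expense."""
    r = np.asarray(r, dtype=float)
    growth = np.cumprod(1.0 + r)            # cumulative growth factor up to each year
    discounted = np.cumsum(1.0 / growth)    # unit expenses discounted back to t = 0
    later = growth * (w0 - expense * discounted)
    return np.concatenate(([w0], later))    # prepend the initial wealth

# Check against the explicit loop on a small random example.
rng = np.random.default_rng(0)
rs = rng.random(50) * 0.3
path = wealth_path(rs)

check = [100.0]
for ret in rs:
    check.append(check[-1] * (1 + ret) - 6)
assert np.allclose(path, check)
```

Because everything is expressed with cumprod and cumsum, the same function body could be applied column-wise to a 2-D array by passing axis=0 to both calls.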

Problem 2: You can first generate an n×N matrix of returns by sampling with replacement. Then you can use np.apply_along_axis to run the simulation on each column.

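import numpy as np
from functools import reduce
rm = np.random.random((n,N))   # placeholder returns; one column per simulated path
def sim(rs):
    # final wealth after applying every return in rs
    return reduce(lambda w,r: w * (1+r) - 6, rs, 100)
result = np.apply_along_axis(sim, 0, rm)   # axis 0: apply sim down each column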
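np.apply_along_axis still calls sim once per column in Python, so for large N it may be faster to loop over the n years and update all N paths at once. The following is a sketch under stated assumptions: returnY is the question's Series of historical returns (replaced here by made-up numbers), and simulate_paths is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def simulate_paths(returns, n, N, w0=100.0, expense=6.0, seed=None):
    """Bootstrap N return sequences of length n and build all wealth paths."""
    rng = np.random.default_rng(seed)
    # Sample with replacement: one column per simulated path.
    rm = rng.choice(np.asarray(returns, dtype=float), size=(n, N), replace=True)
    wealth = np.empty((n + 1, N))
    wealth[0] = w0
    for t in range(n):                      # loop over years only
        wealth[t + 1] = wealth[t] * (1.0 + rm[t]) - expense
    return pd.DataFrame(wealth), rm

# Made-up historical returns standing in for returnY.
returnY = pd.Series([0.05, -0.10, 0.12, 0.07, -0.03, 0.20])
allWealth, rm = simulate_paths(returnY, n=30, N=1000, seed=42)
print(allWealth.shape)   # (31, 1000)
```

The loop runs n times regardless of N, and each iteration is a single vectorized update across all paths. Row 0 of the result holds the initial wealth, so the frame has n + 1 rows.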

Problem 3: You don't need to assign ones and zeros to your original DataFrame. A boolean mask of True and False values implicitly acts as a DataFrame of ones and zeros in this case.

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.random((50,30)))
mask = df < 0.5
mask.sum(axis=1)/30
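Applied to the question's data, the row-wise mean of the boolean mask gives the share directly, without dividing by N. The sketch below uses a small made-up allWealth frame. Note that the comparison is < 0, which matches the stated goal, whereas the snippet in the question counts values above 0.

```python
import pandas as pd

# Made-up wealth paths: rows are years, columns are simulated paths.
allWealth = pd.DataFrame({
    0: [100.0, 50.0, -5.0],
    1: [100.0, -10.0, -20.0],
    2: [100.0, 80.0, 60.0],
    3: [100.0, 20.0, -1.0],
})

# True counts as 1, so the row-wise mean is the fraction of paths below zero.
share_below_zero = (allWealth < 0).mean(axis=1)
print(share_below_zero.tolist())   # [0.0, 0.25, 0.75]
```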

edited Mar 31, 2021 at 0:53

answered Mar 30, 2021 at 3:24


2 Comments

deb Over a year ago

Thanks for your answer. sim, as defined above, gives me only the final wealth, but I need to track the whole time series. In your first example you generated 50 returns; I need all 50 resulting wealth values.

deb Over a year ago

TypeError                                 Traceback (most recent call last)
<ipython-input-22-429c23746661> in <module>
----> 1 ts_iter= accumulate(rm, lambda w,r: w*(1+r)-6, initial=100)
      2 # list(ts_iter)
      3
      4 print(ts_iter)

TypeError: accumulate() takes at most 2 arguments (3 given)

deb · Accepted Answer · 2021-04-09 20:46:09Z

0

I used @chi's solution with a small edit.

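import numpy as np
import itertools

rm = np.random.random((n,N))           # n annual returns for each of N paths
rm0 = np.insert(rm, 0, 100, axis=0)    # prepend the initial wealth as the first row

def wealth(rs):
    # the first element of rs is the initial wealth; the rest are returns
    return list(itertools.accumulate(rs, lambda w,r: w*(1+r)-6))

result = np.apply_along_axis(wealth, 0, rm0)   # shape (n+1, N): one column per path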

itertools.accumulate does not accept the initial keyword in my Python version (it was added in Python 3.8), so I inserted the initial wealth at the front of the return array.

answered Apr 9, 2021 at 20:46
