Optimisation of your code
Keeping the exact same idea, including the iteration, your code can be partly vectorized, losing at least the inner for loop:
    import numpy as np

    def classOpt(df, profit_pct=1.01, loss_pct=0.99):
        vals=df.close.values
        res=[]
        for i in range(len(df)):
            win=vals[i]*profit_pct
            loss=vals[i]*loss_pct
            # Index of the first future win/loss in vals[i:], or 0 if there is none
            futw=np.argmax(vals[i:]>=win)
            futl=np.argmax(vals[i:]<=loss)
            if (futw>0) and (futl==0 or futl>futw):
                res.append(1)
            elif (futl>0) and (futw==0 or futw>futl):
                res.append(-1)
            else:
                res.append(0)
        df['opt']=res
The idea is to work, at each step, in a "vectorized" way on the array of future values. So at step i, on vals[i:].
vals[i:]>=win is a boolean array saying which future values are wins.
vals[i:]<=loss says which ones are losses.
np.argmax returns the index of the first True (and 0 if there is no True at all), so it easily tells us when this win or loss will occur, if it occurs:
np.argmax(vals[i:]>=win).
Note that since element i itself is included in the future values (as a sentinel, in fact), the first boolean has to be False, as long as profit_pct>1 and prices are positive. So if np.argmax(vals[i:]>=win) is 0, there is no future win to come. If it is non-zero, it is the number of days until the first future win.
Likewise for the future loss.
So the result is 1 if futw is non-zero and futl is either 0 or bigger than futw. That is, if there is a win to come, and either no loss, or a loss further in the future than that win (again, I find it a strange rule, but it is the one in your code).
The symmetric situation is a -1. Otherwise, 0.
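To make the sentinel trick concrete, here is a tiny sketch on made-up prices. The numbers and the helper name first_hit are mine, for illustration only; they are not from your code.

```python
import numpy as np

def first_hit(mask):
    """Index of the first True in a boolean array, or 0 if there is none."""
    return int(np.argmax(mask))

# Made-up prices, for illustration only
vals = np.array([100.0, 100.5, 99.5, 101.2, 98.0])

i = 0
win = vals[i] * 1.01    # about 101
loss = vals[i] * 0.99   # about 99

futw = first_hit(vals[i:] >= win)    # 101.2 is the first win, 3 days ahead
futl = first_hit(vals[i:] <= loss)   # 98.0 is the first loss, 4 days ahead

# Win comes first, so day 0 would be labelled 1
label = 1 if futw > 0 and (futl == 0 or futl > futw) else 0

# On the last day there is no future at all: only the sentinel, which is False
no_future = first_hit(vals[4:] >= vals[4] * 1.01)
```

Here futw is 3, futl is 4, and no_future is 0, which is why 0 can safely be read as "never happens".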
Sliding window method
(Note: this is the fourth time in only a week or two that I have used this function in SO answers. A bit of recycling :-). Each time in the accepted answer, by the way, so until now it has been really efficient. I fear that this time, if mozway manages to fix the result differences, it will not go as well.)
This method is based on the np.lib.stride_tricks.sliding_window_view function.
If M is [1,2,3,10,20,30,40], then sliding_window_view(M, (3,)) is
[[1, 2, 3],
[2, 3, 10],
[3, 10, 20],
[10, 20, 30],
[20, 30, 40]]
I think you see how it can be useful for computing with future values.
And the beauty of it is that it is just a view, so no memory is really allocated for this (otherwise potentially huge) array.
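A quick way to convince yourself that it is only a view (a small sketch on the same M as above; nothing here is specific to your data):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

M = np.array([1, 2, 3, 10, 20, 30, 40])
W = sliding_window_view(M, 3)

print(W.shape)                  # (5, 3)
print(np.shares_memory(M, W))   # True: W reuses M's buffer
print(W.flags.writeable)        # False: the view is read-only by default
```

Keep in mind that anything computed from the view, such as a comparison W>=3, is a regular full-size array. The saving applies to the view itself, not to the results derived from it.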
In your case, because we want all future values, we need len(df) columns.
And since we want that even for the last row, we first need to pad the values with NaN: len(df)-1 of them precisely, so that the last row has exactly as many (empty) future values as the first row.
Then we have a len(df)×len(df) view, with the first column being the actual values, and each other column being the future values at D+1, D+2, ...
From there, we just have to do the exact same thing as before, with argmax(...>=win). Comparisons with NaN are always False, so the padding can never count as a win or a loss.
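On a toy series, the padding and the row-wise argmax look like this (a sketch with made-up prices; only the win side is shown, the loss side is symmetric):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

vals = np.array([100.0, 102.0, 98.0, 101.0])   # made-up prices
n = len(vals)

# Pad with n-1 NaN so that even the last value gets a full row
padded = np.concatenate([vals, np.full(n - 1, np.nan)])
view = sliding_window_view(padded, n)          # shape (n, n)
# view is
# [[100., 102.,  98., 101.],
#  [102.,  98., 101.,  nan],
#  [ 98., 101.,  nan,  nan],
#  [101.,  nan,  nan,  nan]]

win = view[:, [0]] * 1.01                      # column of thresholds, shape (n, 1)
futw = np.argmax(view >= win, axis=1)          # per row: days until first win, or 0
```

Row 0 finds its win one day ahead (102), row 1 never does, row 2 finds 101 one day ahead, and row 3 only sees NaN, so futw is [1, 0, 1, 0].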
Here is the code
    def slide(df, profit_pct=1.01, loss_pct=0.99):
        n=len(df)
        valswithnan=np.concatenate([df.close.values, [np.nan]*(n-1)])
        view=np.lib.stride_tricks.sliding_window_view(valswithnan, (n,))
        win=(view[:,0]*profit_pct).reshape(-1,1) # Column of win
        loss=(view[:,0]*loss_pct).reshape(-1,1) # of loss
        futw=np.argmax(view>=win, axis=1) # For each line, index of future win or 0
        futl=np.argmax(view<=loss, axis=1)
        res=(futw>0)*1 # res is 1 where there is a future win
        res[(futl>0) & ((futw>futl) | (futw==0))]=-1 # unless a future loss exists sooner
        df['slide']=res
Experimental setup
    import pandas as pd

    def gen():
        # Something that looks like random variations, with equal chances to win or lose
        return pd.DataFrame({'close':100+np.cumsum(np.random.normal(0, 1, (10000,)))})

    df=gen()

    # Verify column differences between all 4 methods
    # 'long_opportunities' for yours
    # opt for my 1st version
    # slide for my 2nd version with sliding_window_view
    # cmm for mozway's cuminmax (but the check fails for it. Pity, since timings rock)
    def check():
        df=gen()
        classify_long_opportunities(df)
        classOpt(df)
        slide(df)
        cuminmax(df)
        # Sums of squared differences: 0 means the two columns are identical
        return ((df['long_opportunities']-df.opt)**2).sum(), ((df.opt-df.slide)**2).sum(), ((df.opt-df.cmm)**2).sum()
I ran dozens of checks. The 3 methods (yours and my 2) always give exactly the same result.
But timings...
Timings
| Method | Timing |
|---|---|
| Your method | 14660 ms |
| My 1st | 240 ms |
| My 2nd | 152 ms |
| Mozway | 40 ms |
Note that sliding_window_view is not that impressive on this problem: the gain is far smaller than the 3000× it gave in my previous uses on other problems. This probably has to do with the large amount of useless computation it does (a triangle, half of the view, is full of NaNs). Yet it is still the fastest of the methods that give the same result as yours. Mozway's method is much faster, but its result differs so far.
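If you want to reproduce this kind of comparison, here is a minimal timing sketch. It assumes timeit and a hypothetical helper time_ms of my own naming, and it is not necessarily how the figures in the table were obtained. The dummy function is only a stand-in so that the sketch runs on its own; swap in classOpt, slide, etc.

```python
import timeit
import numpy as np
import pandas as pd

def time_ms(fn, df, repeat=3):
    """Best wall-clock time of fn(df.copy()) over `repeat` runs, in milliseconds."""
    runs = timeit.repeat(lambda: fn(df.copy()), number=1, repeat=repeat)
    return min(runs) * 1000

# Stand-in workload (hypothetical); replace with classOpt, slide, ...
def dummy(df):
    df['out'] = np.sign(df.close.diff().fillna(0))

df = pd.DataFrame({'close': 100 + np.cumsum(np.random.normal(0, 1, 1000))})
elapsed = time_ms(dummy, df)
print(f"dummy: {elapsed:.2f} ms")
```

Each run works on a copy of the DataFrame, so a method that adds a column cannot influence the next run. Taking the minimum over a few repeats reduces noise from whatever else the machine is doing.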