Conditional If Statement applied to multiple columns of dataframe

Question

I have a dataframe of minute stock returns and I would like to create a new column that is conditional on whether a return was exceeded (pos or negative), and if so that row is equal to the limit (pos or negative), otherwise equal to the last column that was checked. The example below illustrates this:

import pandas as pd
dict = [
        {'ticker':'jpm','date': '2016-11-28','returns1': 0.02,'returns2': 0.03,'limit': 0.1},
{ 'ticker':'ge','date': '2016-11-28','returns1': 0.2,'returns2': -0.3,'limit': 0.1},
{'ticker':'fb', 'date': '2016-11-28','returns1': -0.2,'returns2': 0.5,'limit': 0.1},
]
df = pd.DataFrame(dict)
df['date']      = pd.to_datetime(df['date'])
df=df.set_index(['date','ticker'], drop=True)

The target would be this:

                   fin_return  limit  returns1  returns2
date       ticker                                       
2016-11-28 jpm           0.03    0.1      0.02      0.03
           ge            0.10    0.1      0.20     -0.30
           fb           -0.10    0.1     -0.20      0.50

So in the first row, the returns never exceeded the limit, so the value becomes equal to the value in returns2 (0.03). In row 2, the returns were exceeded on the upside, so the value should be the positive limit. In row 3 the returns where exceeded on the downside first, so the value should be the negative limit.

My actual dataframe has a couple thousand columns, so I am not quite sure how to do this (maybe a loop?). I appreciate any suggestions.

The idea is to test a stop loss or limit trading algorithm. Whenever, the lower limit is triggered, it should replace the final column with the lower limit, same for the upper limit, whichever comes first for that row. So once either one is triggered, the next row should be tested.

I am adding a different example with one more column here to make this a bit clearer (the limit is +/- 0.1)

                   fin_return  limit  returns1  returns2 returns3
date       ticker                                       
2016-11-28 jpm           0.02    0.1      0.01      0.04    0.02
           ge            0.10    0.1      0.20     -0.30    0.6
           fb           -0.10    0.1     -0.02     -0.20    0.7

In the first row, the limit was never triggered to the final return is from returns3 (0.02). In row 2 the limit was triggered on the upside in returns 1 so the fin_return is equal to the upper limit (anything that happens in returns2 and returns 3 is irrelevant for this row). In row 3, the limited was exceeded on the downside in returns 2, so the fin_return becomes -0.1, and anything in returns3 is irrelevant.

Not 100% sure if understand question - So if all columns returns under limit, it is returned maximum value? Or last column value? Also second and third condition is for test first return column (if first condition is False)? — jezrael
– jezrael, Commented Jun 26, 2019 at 6:02
If all columns are under limit, the returned value should be the return from the last column — Tartaglia
– Tartaglia, Commented Jun 26, 2019 at 6:02
It seems question is more complicated, can you create minimal, complete, and verifiable example ? — jezrael
– jezrael, Commented Jun 26, 2019 at 6:12
Your suggested solution works I think, I am going to test it a bit more quick — Tartaglia
– Tartaglia, Commented Jun 26, 2019 at 6:17
I am going to add more information to my example above, your solution does not quite work. — Tartaglia
– Tartaglia, Commented Jun 26, 2019 at 6:25

jezrael · Accepted Answer · 2019-06-26 07:00:29Z

Use:

dict = [
        {'ticker':'jpm','date': '2016-11-28','returns1': 0.02,'returns2': 0.03,'limit': 0.1,'returns3':0.02},
{ 'ticker':'ge','date': '2016-11-28','returns1': 0.2,'returns2': -0.3,'limit': 0.1,'returns3':0.6},
{'ticker':'fb', 'date': '2016-11-28','returns1': -0.02,'returns2': -0.2,'limit': 0.1,'returns3':0.7},
]
df = pd.DataFrame(dict)
df['date']      = pd.to_datetime(df['date'])
df=df.set_index(['date','ticker'], drop=True)

#select all columns without first (here limit column)
df1 = df.iloc[:, 1:]

#comapre if all columns under +-limit
mask = df1.lt(df['limit'], axis=0) & df1.gt(-df['limit'], axis=0) 
m1 = mask.all(axis=1)
print (m1)
date        ticker
2016-11-28  jpm        True
            ge        False
            fb        False
dtype: bool

#replace first columns in limit with NaNs and back filling missing values, seelct first col
m2 = df1.mask(mask).bfill(axis=1).iloc[:, 0].gt(df['limit'])
print (m2)
date        ticker
2016-11-28  jpm       False
            ge         True
            fb        False
dtype: bool

arr = np.select([m1,m2, ~m2], [df1.iloc[:, -1], df['limit'], -df['limit']])
#set first column in DataFrame by insert
df.insert(0, 'fin_return', arr)
print (df)
                   fin_return  limit  returns1  returns2  returns3
date       ticker                                                 
2016-11-28 jpm           0.02    0.1      0.02      0.03      0.02
           ge            0.10    0.1      0.20     -0.30      0.60
           fb           -0.10    0.1     -0.02     -0.20      0.70

Collectives™ on Stack Overflow

Conditional If Statement applied to multiple columns of dataframe

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related