1

I have a dataframe of minute stock returns and I would like to create a new column that is conditional on whether a return was exceeded (pos or negative), and if so that row is equal to the limit (pos or negative), otherwise equal to the last column that was checked. The example below illustrates this:

import pandas as pd
dict = [
        {'ticker':'jpm','date': '2016-11-28','returns1': 0.02,'returns2': 0.03,'limit': 0.1},
{ 'ticker':'ge','date': '2016-11-28','returns1': 0.2,'returns2': -0.3,'limit': 0.1},
{'ticker':'fb', 'date': '2016-11-28','returns1': -0.2,'returns2': 0.5,'limit': 0.1},
]
df = pd.DataFrame(dict)
df['date']      = pd.to_datetime(df['date'])
df=df.set_index(['date','ticker'], drop=True)  

The target would be this:

                   fin_return  limit  returns1  returns2
date       ticker                                       
2016-11-28 jpm           0.03    0.1      0.02      0.03
           ge            0.10    0.1      0.20     -0.30
           fb           -0.10    0.1     -0.20      0.50

So in the first row, the returns never exceeded the limit, so the value becomes equal to the value in returns2 (0.03). In row 2, the returns were exceeded on the upside, so the value should be the positive limit. In row 3 the returns where exceeded on the downside first, so the value should be the negative limit.

My actual dataframe has a couple thousand columns, so I am not quite sure how to do this (maybe a loop?). I appreciate any suggestions.

The idea is to test a stop loss or limit trading algorithm. Whenever, the lower limit is triggered, it should replace the final column with the lower limit, same for the upper limit, whichever comes first for that row. So once either one is triggered, the next row should be tested.

I am adding a different example with one more column here to make this a bit clearer (the limit is +/- 0.1)

                   fin_return  limit  returns1  returns2 returns3
date       ticker                                       
2016-11-28 jpm           0.02    0.1      0.01      0.04    0.02
           ge            0.10    0.1      0.20     -0.30    0.6
           fb           -0.10    0.1     -0.02     -0.20    0.7

In the first row, the limit was never triggered to the final return is from returns3 (0.02). In row 2 the limit was triggered on the upside in returns 1 so the fin_return is equal to the upper limit (anything that happens in returns2 and returns 3 is irrelevant for this row). In row 3, the limited was exceeded on the downside in returns 2, so the fin_return becomes -0.1, and anything in returns3 is irrelevant.

7
  • Not 100% sure if understand question - So if all columns returns under limit, it is returned maximum value? Or last column value? Also second and third condition is for test first return column (if first condition is False)? Commented Jun 26, 2019 at 6:02
  • 1
    If all columns are under limit, the returned value should be the return from the last column Commented Jun 26, 2019 at 6:02
  • It seems question is more complicated, can you create minimal, complete, and verifiable example ? Commented Jun 26, 2019 at 6:12
  • 1
    Your suggested solution works I think, I am going to test it a bit more quick Commented Jun 26, 2019 at 6:17
  • 1
    I am going to add more information to my example above, your solution does not quite work. Commented Jun 26, 2019 at 6:25

1 Answer 1

2

Use:

dict = [
        {'ticker':'jpm','date': '2016-11-28','returns1': 0.02,'returns2': 0.03,'limit': 0.1,'returns3':0.02},
{ 'ticker':'ge','date': '2016-11-28','returns1': 0.2,'returns2': -0.3,'limit': 0.1,'returns3':0.6},
{'ticker':'fb', 'date': '2016-11-28','returns1': -0.02,'returns2': -0.2,'limit': 0.1,'returns3':0.7},
]
df = pd.DataFrame(dict)
df['date']      = pd.to_datetime(df['date'])
df=df.set_index(['date','ticker'], drop=True)  

#select all columns without first (here limit column)
df1 = df.iloc[:, 1:]

#comapre if all columns under +-limit
mask = df1.lt(df['limit'], axis=0) & df1.gt(-df['limit'], axis=0) 
m1 = mask.all(axis=1)
print (m1)
date        ticker
2016-11-28  jpm        True
            ge        False
            fb        False
dtype: bool

#replace first columns in limit with NaNs and back filling missing values, seelct first col
m2 = df1.mask(mask).bfill(axis=1).iloc[:, 0].gt(df['limit'])
print (m2)
date        ticker
2016-11-28  jpm       False
            ge         True
            fb        False
dtype: bool

arr = np.select([m1,m2, ~m2], [df1.iloc[:, -1], df['limit'], -df['limit']])
#set first column in DataFrame by insert
df.insert(0, 'fin_return', arr)
print (df)
                   fin_return  limit  returns1  returns2  returns3
date       ticker                                                 
2016-11-28 jpm           0.02    0.1      0.02      0.03      0.02
           ge            0.10    0.1      0.20     -0.30      0.60
           fb           -0.10    0.1     -0.02     -0.20      0.70
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.