I have the following function, which should return a list of dataframes. No dataframe may contain rows that already appear in an earlier one.
idx is a list of the positions where the condition is met (event equals dummy, here 1). Every row within n positions of such an event, including the event row itself, is then dropped.
My output should be a list of dataframes holding the rows that were not dropped and no others, so each dataframe should only contain the rows between two dummies. The first dataframe is OK. For the rest, I count the elements and use the for loop to try to collect the other slices; however, those slices do not stay within the wanted limits.
import numpy as np
import pandas as pd

data = pd.DataFrame(data={"A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                          "B": [1, 3, 3, 4, 5, 6, 7, 8, 9, 10],
                          "event": [0, 0, 0, 0, 1, 0, 0, 0, 1, 0]})
def EstimationWindow(data, n=3, dummy=1):
    '''
    data....data. Contains ALL data - returns, and event dummies = event column
    dummy...event=1
    n.......days before/after
    '''
    idx = data.index.get_indexer_for(data[data.event==dummy].index)
    # Drop event window
    estwin = data.drop((np.unique(np.concatenate([np.arange(max(i-n,0), min(i+n+1, len(data))) for i in idx]))))
    # estwin = [estwin.iloc[0:i-n] for i in idx]
    output = [estwin.iloc[0:idx[0]-n]]
    for i in idx[1:]:
        out = pd.DataFrame(estwin.loc[len(output):i-n])
        output.append(out)
    return output
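To see what the loop actually slices, the steps of the function can be replayed by hand on the sample data. This is only a diagnostic sketch: it assumes n=1 (the call is not shown in the post, but n=1 is the value that reproduces the outputs listed further down) and a default integer index. The variable names dropped, kept_labels and second_slice_labels are made up for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                     "B": [1, 3, 3, 4, 5, 6, 7, 8, 9, 10],
                     "event": [0, 0, 0, 0, 1, 0, 0, 0, 1, 0]})
n = 1  # assumption: the value that matches the outputs in the post

# Positions of the event rows (4 and 8 for the sample data).
idx = np.flatnonzero(data["event"].to_numpy() == 1)

# Positions removed by the drop step: n rows on each side of every event.
dropped = np.unique(np.concatenate(
    [np.arange(max(i - n, 0), min(i + n + 1, len(data))) for i in idx]))
estwin = data.drop(data.index[dropped])
kept_labels = estwin.index.tolist()

# In the loop, len(output) is 1 when the second event (i = 8) is handled,
# so the slice is estwin.loc[1:7]. .loc slices by label and includes both
# ends, so it starts again at label 1 instead of after the first window.
second_slice_labels = estwin.loc[1:8 - n].index.tolist()

print(dropped.tolist(), kept_labels, second_slice_labels)
```

The lower bound len(output) counts dataframes collected so far, not rows, which would explain why labels 1 and 2 show up a second time.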
The function should return a list: output = [df1, df2]
Wanted:
[   A  B  event
0  1  1      0
1  2  3      0
2  3  3      0,    A  B  event
6  7  7      0]
Result:
[   A  B  event
0  1  1      0
1  2  3      0
2  3  3      0,    A  B  event
1  2  3      0
2  3  3      0
6  7  7      0]
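One possible way to get the wanted output is to avoid slicing by counters altogether and instead label each run of surviving rows, then group by that label. The sketch below is an assumption-laden illustration, not a drop-in fix: estimation_windows is a hypothetical name, n=1 is assumed for the sample call, and rows left over after the last event window (there are none in the sample) would be returned as one more dataframe, which may or may not be wanted.

```python
import numpy as np
import pandas as pd

def estimation_windows(data, n=3, dummy=1):
    """Hypothetical helper: return one DataFrame per run of rows that lies
    outside every event window (n rows before/after each event)."""
    event_pos = np.flatnonzero(data["event"].to_numpy() == dummy)

    # Boolean mask of rows to keep; numpy clips the slice at the array end.
    keep = np.ones(len(data), dtype=bool)
    for i in event_pos:
        keep[max(i - n, 0):i + n + 1] = False

    # The running count of dropped rows is constant inside a run of kept
    # rows and grows across every dropped row, so it identifies each run.
    block_id = np.cumsum(~keep)

    kept = data[keep]
    return [group for _, group in kept.groupby(block_id[keep])]

data = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                     "B": [1, 3, 3, 4, 5, 6, 7, 8, 9, 10],
                     "event": [0, 0, 0, 0, 1, 0, 0, 0, 1, 0]})

result = estimation_windows(data, n=1)  # n=1 is an assumption
print(result)
```

For the sample data this gives two dataframes, one with index labels 0, 1, 2 and one with label 6. Because the mask works on positions, the sketch does not depend on the index labels being 0..len(data)-1.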