List of dataframes: Slicing a dataframe into a list of dataframes

Question

I have the following function that should return a list of DataFrames. None of these DataFrames may contain rows that are already included in a previous one.

idx is a list of indices where the condition is met (event == dummy, i.e. 1). Everything within n rows around each dummy is then dropped.

My output should be a list of DataFrames containing the rows that were not dropped, with each DataFrame holding only the rows between two dummies. The first DataFrame is correct. I count the elements and use the for loop to collect the remaining slices; however, these slices do not stay within the wanted limits.

import numpy as np
import pandas as pd

data = pd.DataFrame(data={"A":[1,2,3,4,5,6,7,8,9,10],
                          "B":[1,3,3,4,5,6,7,8,9,10],
                          "event":[0,0,0,0,1,0,0,0,1,0]})

def EstimationWindow (data, n=3, dummy=1):
    '''
    data....data. Contains ALL data - returns, and event dummies = event column
    dummy...event=1
    n.......days before/after
    '''    
    idx = data.index.get_indexer_for(data[data.event==dummy].index)
    # Drop event window
    estwin = data.drop((np.unique(np.concatenate([np.arange(max(i-n,0), min(i+n+1, len(data))) for i in idx]))))    
#    estwin = [estwin.iloc[0:i-n] for i in idx]
    output = [estwin.iloc[0:idx[0]-n]]
    for i in idx[1:]:
        out = pd.DataFrame(estwin.loc[len(output):i-n])
        output.append(out)
    return(output)

The function should return a list: output = [df1, df2]. The outputs below correspond to n=1.

Wanted:

[   A  B  event
 0  1  1      0
 1  2  3      0
 2  3  3      0,    A  B  event
 6  7  7      0]

Result:

 [   A  B  event
 0  1  1      0
 1  2  3      0
 2  3  3      0,    A  B  event
 1  2  3      0
 2  3  3      0
 6  7  7      0]
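The extra rows in the second DataFrame appear to come from the start label of the loop slice. The sketch below reproduces the question's steps with n=1 (the value that matches the outputs shown), assuming the default RangeIndex: len(output) is 1 on the first pass, so estwin.loc[1:7] returns every surviving label from 1 to 7. Starting the slice just after the previous event's window (previous event + n + 1) is one possible fix under the same assumption; the names buggy and fixed are illustrative only.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(data={"A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                          "B": [1, 3, 3, 4, 5, 6, 7, 8, 9, 10],
                          "event": [0, 0, 0, 0, 1, 0, 0, 0, 1, 0]})
n = 1  # the outputs in the question correspond to n=1

# Positions of the event rows: [4, 8]
idx = data.index.get_indexer_for(data[data.event == 1].index)

# Labels within n rows of an event: [3, 4, 5, 7, 8, 9]
window = np.unique(np.concatenate(
    [np.arange(max(i - n, 0), min(i + n + 1, len(data))) for i in idx]))
estwin = data.drop(window)  # surviving labels: [0, 1, 2, 6]

# The question's loop uses len(output) (== 1 here) as the start label,
# so .loc (inclusive, label-based) returns labels 1, 2 and 6.
buggy = estwin.loc[1:idx[1] - n]

# Starting just after the previous event's window keeps only label 6.
fixed = estwin.loc[idx[0] + n + 1:idx[1] - n]

print(buggy)
print(fixed)
```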

rafaelc · Accepted Answer · 2019-01-20 14:36:58Z


There is no need for a for loop to build your list of split DataFrames. Find the dummy rows, use Index.union to build the labels to drop, and then use a straightforward groupby:

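df = data  # the DataFrame from the question

s = df.event.eq(1)   # True on the event rows
dummies = s[s].index # labels of the event rows

ind_to_drop = (dummies + 1).union(dummies).union(dummies - 1)  # each event and its neighbours
c = df.event.cumsum().drop(ind_to_drop)  # running event count labels each gap between events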

Then

for _, g in df.drop(ind_to_drop).groupby(c):
    print(g)

Yields

   A  B  event
0  1  1      0
1  2  3      0
2  3  3      0

   A  B  event
6  7  7      0
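The loop above prints the groups; to get the list the question asks for, the groups can be collected with a list comprehension. The sketch below wraps the same approach in a hypothetical helper, estimation_windows, that generalizes the ±1 window to ±n. It assumes integer row labels one apart (the default RangeIndex) and intersects the labels to drop with df.index, since an event in the first or last row would otherwise produce labels that do not exist and make drop raise a KeyError.

```python
import pandas as pd


def estimation_windows(df, n=1, dummy=1):
    """Hypothetical helper: split df into the runs of rows lying more than
    n rows away from every event row (event == dummy).

    Assumes integer row labels one apart, e.g. the default RangeIndex.
    """
    s = df["event"].eq(dummy)
    events = s[s].index  # labels of the event rows

    # Labels event-n .. event+n for every event.
    to_drop = events
    for k in range(1, n + 1):
        to_drop = to_drop.union(events + k).union(events - k)
    # Keep only labels that exist, so events near the edges do not raise.
    to_drop = to_drop.intersection(df.index)

    # The running count of events seen so far is constant within each gap.
    group = s.astype(int).cumsum().drop(to_drop)
    return [g for _, g in df.drop(to_drop).groupby(group)]


data = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                     "B": [1, 3, 3, 4, 5, 6, 7, 8, 9, 10],
                     "event": [0, 0, 0, 0, 1, 0, 0, 0, 1, 0]})

output = estimation_windows(data, n=1)
for frame in output:
    print(frame)
```

With n=1 this yields the same two DataFrames as the groupby loop above, now returned as output = [df1, df2].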



Comment by r357: Works! Thank you very much!
