
I have a DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame({'user': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'attritube1': [0, 1, 1, 1, 0, 2, 9],
                   'attritube2': [1, 2, 3, 3, 0, 0, 1]})
print(df)

     attritube1  attritube2 user
0           0           1    A
1           1           2    A
2           1           3    A
3           1           3    B
4           0           0    B
5           2           0    B
6           9           1    B

I would like to slice the data with a rolling window of length K for every user and create a new data set. For example, if K = 2, then I would like to get

   attritube1  attritube2 user
0           0           1    A
1           1           2    A
---------------------------------
2           1           2    A
3           1           3    A
---------------------------------
4           1           3    B
5           0           0    B
---------------------------------
6           0           0    B
7           2           0    B
--------------------------------
8           2           0    B
9           9           1    B

Similarly, if K = 3, then the new data frame should be

    attritube1  attritube2 user
0           0           1    A
1           1           2    A
2           1           3    A
--------------------------------
3           1           3    B
4           0           0    B
5           2           0    B
--------------------------------
6           0           0    B
7           2           0    B
8           9           1    B

We can assume that every user has at least K rows. Thanks!

Edit: Want to clarify that I want to repeat the rolling window procedure for every user (A,B in the toy example).

2 Answers

Try:

k = 3
# For each user, stack every window of k consecutive rows.
df.groupby('user').apply(lambda x: pd.concat([x.iloc[i:i + k] for i in range(len(x) - k + 1)]))


        attritube1  attritube2 user
user                               
A    0           0           1    A
     1           1           2    A
     2           1           3    A
B    3           1           3    B
     4           0           0    B
     5           2           0    B
     4           0           0    B
     5           2           0    B
     6           9           1    B
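
The output above keeps a (user, original row) MultiIndex, while the sample output in the question is numbered 0..N-1. A minimal sketch of the same per-user windowing with a flat index follows; the function name `rolling_slices` and its `key` parameter are hypothetical, introduced only for illustration, and the sketch assumes every user has at least k rows, as the question states.

```python
import pandas as pd


def rolling_slices(df, k, key="user"):
    """Stack every window of k consecutive rows, one group at a time."""
    pieces = []
    # groupby sorts the keys by default, so users are visited as A, then B.
    for _, group in df.groupby(key):
        # A group with n rows has n - k + 1 windows of length k.
        for start in range(len(group) - k + 1):
            pieces.append(group.iloc[start:start + k])
    # ignore_index=True renumbers the stacked rows 0..N-1.
    # Assumption: at least one window exists, otherwise pd.concat raises.
    return pd.concat(pieces, ignore_index=True)


df = pd.DataFrame({'user': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'attritube1': [0, 1, 1, 1, 0, 2, 9],
                   'attritube2': [1, 2, 3, 3, 0, 0, 1]})

result_k2 = rolling_slices(df, 2)  # 10 rows: 2 windows for A, 3 for B
result_k3 = rolling_slices(df, 3)  # 9 rows: 1 window for A, 2 for B
print(result_k2)
```

An explicit loop is used here instead of `groupby(...).apply(...)` so that the result index does not pick up the extra `user` level.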

3 Comments

I want to clarify that I want to repeat the rolling window procedure for every user (see "user" column). Sorry if it wasn't clear in my original post.
Hi Stefan, thanks for your help! But the output is different from the sample output for k = 3 which is shown in the posting.
See the update; I think the new version gives you what you are looking for.
import pandas as pd

df = pd.DataFrame({'user': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'A', 'B', 'B', 'C', 'B', 'A', 'C', 'C', 'B', 'B', 'B', 'B'],
                   'attritube1': [0, 1, 1, 1, 0, 2, 9, 0, 1, 1, 1, 0, 2, 9, 0, 1, 1, 1, 0, 2, 9],
                   'attritube2': [1, 2, 3, 3, 0, 0, 1, 0, 1, 1, 1, 0, 2, 9, 0, 1, 1, 1, 0, 2, 9]})

# Create a MultiIndex DataFrame with levels (user, original row number)
m_df = df.set_index(df["user"], append=True)
m_df = m_df.swaplevel(0, 1, axis=0)

k = 2

# Note: .iloc[:k] keeps only the first k rows of each user
final_df = pd.concat([m_df.loc[item].iloc[:k] for item in sorted(set(df["user"]))])
final_df.index = range(final_df.shape[0])  # renumber the index 0..N-1

print(final_df)

This answer uses a MultiIndex DataFrame and works step by step, which (at least for me) is a little easier to read.
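
Since the snippet above takes only the first k rows per user, here is a rough step-by-step sketch that covers every window for each user, which is what the comment below asks for. It uses the 7-row toy DataFrame from the question so the result is easy to check by hand. The `window` column is a hypothetical helper, not part of the original data; it stands in for the dashed separators in the question's sample output.

```python
import pandas as pd

df = pd.DataFrame({'user': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'attritube1': [0, 1, 1, 1, 0, 2, 9],
                   'attritube2': [1, 2, 3, 3, 0, 0, 1]})

k = 2

windows = []
window_id = 0
for user in sorted(df["user"].unique()):
    # Step 1: pick this user's rows, in their original order.
    rows = df[df["user"] == user]
    # Step 2: slide a window of length k over those rows.
    for start in range(len(rows) - k + 1):
        piece = rows.iloc[start:start + k].copy()
        # Step 3: tag the slice so windows can be told apart later
        # ("window" is a hypothetical helper column).
        piece["window"] = window_id
        windows.append(piece)
        window_id += 1

# Step 4: stack all slices and renumber the rows 0..N-1.
labelled = pd.concat(windows, ignore_index=True)
print(labelled)
```

With the label in place, something like `labelled.groupby("window")` should give back the individual slices if they need to be processed one at a time.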

1 Comment

I want to clarify that I want to repeat the rolling window procedure for every user (see "user" column). Sorry if it wasn't clear in my original post.
