I'm working with some data, and I want to get the ranks (finishing positions) of each horse in its recent runs (up to 6 runs before the current run). The date of the run is given by 'race_id'.

Is there a way to use groupby and agg so that only the previous 6 values are aggregated?

The data frame is as follows:

finishing_position  horse_id    race_id
 1                  K01         2014011
 2                  K02         2014011
 3                  M01         2014011
 4                  K01         2014012
 2                  K01         2014021
 3                  K01         2014031
 1                  M01         2015011
 2                  K01         2016012
 1                  K02         2016012
 3                  M01         2016012
 4                  J01         2016012 

I want the result to be

finishing_position  horse_id    race_id     recent
 1                  K01         2014011
 2                  K02         2014011
 3                  M01         2014011
 4                  K01         2014012     1
 2                  K01         2014021     1/4
 3                  K01         2014031     1/4/2
 1                  M01         2015011     3
 2                  K01         2016012     1/4/2/3
 1                  K02         2016012     2
 3                  M01         2016012     3/1
 4                  J01         2016012   

2 Answers

We can use cumsum with groupby:

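# Append a separator so the per-horse cumulative "sum" concatenates the ranks.
df['recent'] = df.finishing_position.astype(str) + '/'
# group_keys=False keeps the original row index so the result aligns with df.
df['recent'] = df.groupby('horse_id', group_keys=False).recent.apply(lambda x: x.cumsum().shift().str[:-1].fillna(''))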
df
Out[140]: 
    finishing_position horse_id  race_id   recent
0                    1      K01  2014011         
1                    2      K02  2014011         
2                    3      M01  2014011         
3                    4      K01  2014012        1
4                    2      K01  2014021      1/4
5                    3      K01  2014031    1/4/2
6                    1      M01  2015011        3
7                    2      K01  2016012  1/4/2/3
8                    1      K02  2016012        2
9                    3      M01  2016012      3/1
10                   4      J01  2016012         

4 Comments

Thanks, but how do I make the cumsum aggregate only up to the 6 previous records?
Use this: select *, row_number() over(partition by horse_id order by race_id desc) racesback, and then filter on racesback however you want.
@goodBOB It seems like you need a rolling sum, but that cannot match your expected output.
@Wen Yes. I tried a rolling sum, but it sums up all 6 values. Here's my solution: I can get rid of the extra ranks in the new df after the cumsum.

Revised from @Wen's answer to aggregate only up to the N previous records.

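# Build the full per-horse history string, as in @Wen's answer.
df['recent'] = df.finishing_position.astype(str) + '/'
# group_keys=False keeps the original row index so the result aligns with df.
df['recent'] = df.groupby('horse_id', group_keys=False).recent.apply(lambda x: x.cumsum().shift().str[:-1].fillna(''))

def last_n_record(string, recent_no):
    # Keep only the last `recent_no` entries of a '/'-separated string.
    count = string.count('/')
    if count + 1 >= recent_no:
        return string.split('/', count - recent_no + 1)[-1]
    else:
        return string

recent_no = 3  # Let's take 3 recent records as a demo
df['recent'] = df.recent.apply(lambda x: last_n_record(x, recent_no))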
df
    finishing_position horse_id  race_id recent
0                    1      K01  2014011       
1                    2      K02  2014011       
2                    3      M01  2014011       
3                    4      K01  2014012      1
4                    2      K01  2014021    1/4
5                    3      K01  2014031  1/4/2
6                    1      M01  2015011      3
7                    2      K01  2016012  4/2/3
8                    1      K02  2016012      2
9                    3      M01  2016012    3/1
10                   4      J01  2016012       
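As a possible alternative, here is a minimal sketch that builds the last-N string directly instead of trimming the full cumulative string afterwards. It uses only the standard library and keeps one bounded deque per horse. The function name `recent_runs` and its parameters are made up for this illustration, and it assumes the rows are already in chronological order (sort by race_id first if they are not).

```python
from collections import defaultdict, deque

def recent_runs(horse_ids, positions, n=6):
    """For each row, join up to n earlier positions of the same horse with '/'.

    Assumes the rows are already in chronological order.
    """
    # One bounded window per horse; deque(maxlen=n) drops the oldest entry itself.
    history = defaultdict(lambda: deque(maxlen=n))
    out = []
    for horse, pos in zip(horse_ids, positions):
        out.append("/".join(history[horse]))  # results seen before this run
        history[horse].append(str(pos))       # then record this run
    return out

# The sample data from the question, in its original row order.
horses = ["K01", "K02", "M01", "K01", "K01", "K01", "M01", "K01", "K02", "M01", "J01"]
ranks = [1, 2, 3, 4, 2, 3, 1, 2, 1, 3, 4]

recent_6 = recent_runs(horses, ranks, n=6)
recent_3 = recent_runs(horses, ranks, n=3)
```

With a data frame, the result could be assigned back as df['recent'] = recent_runs(df['horse_id'], df['finishing_position'], n=6). This is a plain Python loop, so it may be slower than a vectorized approach on very large frames, but the window size is enforced in one place.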
