
Currently I have a table that looks like this:

ID       Previous_Injuries    Currently_Injured      Injury_Type
1            Nan                      0                  Nan
1            Nan                      1                  Ankle
1            Nan                      0                  Nan
1            Nan                      1                  Wrist
1            Nan                      0                  Nan
1            Nan                      1                  Leg
1            Nan                      0                  Nan
2            Nan                      1                  Leg
2            Nan                      0                  Nan

I would like to fill in the Previous_Injuries column so that my table looks like this:

ID       Previous_Injuries    Currently_Injured      Injury_Type
1            Nan                      0                  Nan
1            Nan                      1                  Ankle
1            [Ankle]                  0                  Nan
1            [Ankle]                  1                  Wrist
1            [Ankle,Wrist]            0                  Nan
1            [Ankle,Wrist]            1                  Leg
1            [Ankle,Wrist,Leg]        0                  Nan
2            Nan                      1                  Leg
2            [Leg]                    0                  Nan

How can I build this sort of column in pandas? And is a list the best way to store it?

Thanks!

Usually storing lists (or other objects) in a DataFrame is inefficient and makes other manipulations much more complicated, though it can be fine if your data aren't huge. What do you need to do with this information afterwards? Commented Nov 1, 2019 at 17:29

2 Answers


We can use shift with cumsum, then split the string. Notice that your missing values are the string 'Nan', not np.nan, so the code fills with 'Nan' and filters it out afterwards:

s=df.Injury_Type.shift().fillna('Nan').add(',').cumsum().str[:-1].str.split(',')
df['new']=[[y  for y in x if y != 'Nan'] for x in s ]
df
Out[322]: 
   ID Previous_Injuries  Currently_Injured Injury_Type                  new
0   1               Nan                  0         Nan                   []
1   1               Nan                  1       Ankle                   []
2   1               Nan                  0         Nan              [Ankle]
3   1               Nan                  1       Wrist              [Ankle]
4   1               Nan                  0         Nan       [Ankle, Wrist]
5   1               Nan                  1         Leg       [Ankle, Wrist]
6   1               Nan                  0         Nan  [Ankle, Wrist, Leg]
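The point that 'Nan' here is a string rather than np.nan can be checked directly. This is a small sketch, assuming the column holds the literal text 'Nan' as shown in the question:

```python
import numpy as np
import pandas as pd

# The literal text 'Nan' is an ordinary string, not a missing value.
print(pd.isna('Nan'))   # False
print(pd.isna(np.nan))  # True

col = pd.Series(['Nan', 'Ankle'], dtype=object)
print(col.isna().tolist())  # [False, False]

# Converting the string to a real NaN makes pandas treat it as missing.
print(col.replace('Nan', np.nan).isna().tolist())  # [True, False]
```

This is why fillna alone would not touch the original 'Nan' entries; they either have to be filtered by value (as above) or converted with replace first.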

Since the question now includes several IDs, apply the same logic to each ID group and concatenate the results:

l = []
for name, dfx in df.groupby('ID'):
    dfx = dfx.copy()  # work on a copy to avoid SettingWithCopyWarning
    s = dfx.Injury_Type.shift().fillna('Nan').add(',').cumsum().str[:-1].str.split(',')
    dfx['new'] = [[y for y in x if y != 'Nan'] for x in s]
    l.append(dfx)

df = pd.concat(l)
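As an alternative to splitting the frame and concatenating it back, the same column can be built in one pass with a plain dict keyed by ID. This is a sketch that swaps in an explicit Python loop instead of string cumsum; it assumes rows are already in chronological order within each ID (the answers here assume the same), and the sample frame is rebuilt from the question's table:

```python
import pandas as pd

# Sample data rebuilt from the question's table (Previous_Injuries omitted,
# since it is recomputed below).
df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 1, 1, 1, 2, 2],
    'Currently_Injured': [0, 1, 0, 1, 0, 1, 0, 1, 0],
    'Injury_Type': ['Nan', 'Ankle', 'Nan', 'Wrist', 'Nan', 'Leg', 'Nan', 'Leg', 'Nan'],
})

seen_by_id = {}  # ID -> injuries recorded so far for that ID
previous = []
for pid, injury in zip(df['ID'], df['Injury_Type']):
    seen = seen_by_id.setdefault(pid, [])
    previous.append(list(seen))  # copy, so each row keeps its own snapshot
    # Skip both the string 'Nan' and a real NaN.
    if injury != 'Nan' and pd.notna(injury):
        seen.append(injury)

# Wrapping in pd.Series stores each list as a single object per row.
df['Previous_Injuries'] = pd.Series(previous, index=df.index)
print(df)
```

Each row only sees injuries from earlier rows of the same ID, which gives the same lists as the shift/cumsum versions without building intermediate strings.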

2 Comments

Thanks, made an edit to the post to include different IDs. Would the code change in that case?
@soccer_analytics_fan use groupby: run the same code on each group, then concat the results.

Use:

import numpy as np

df['Previous_Injuries']=( df['Injury_Type'].replace('Nan',np.nan).fillna(' ')
                                          .cumsum().shift(fill_value='')
                                          .str.split() )
print(df)

replace('Nan', np.nan) can be omitted if the missing values are real np.nan rather than the string 'Nan'.


   ID    Previous_Injuries  Currently_Injured Injury_Type
0   1                   []                  0         Nan
1   1                   []                  1       Ankle
2   1              [Ankle]                  0         Nan
3   1              [Ankle]                  1       Wrist
4   1       [Ankle, Wrist]                  0         Nan
5   1       [Ankle, Wrist]                  1         Leg
6   1  [Ankle, Wrist, Leg]                  0         Nan
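The whitespace trick in this answer (filling missing values with ' ' and calling .str.split() with no argument) works because a separator-less split ignores runs of whitespace. A minimal sketch of that behaviour, using accumulated strings of the kind the cumsum/shift step produces (the values are illustrative, taken from the ID 1 rows):

```python
import pandas as pd

# Accumulated strings of the kind the fillna(' ') + cumsum + shift step
# produces: each padding row contributes a single space.
# dtype=object keeps these as plain Python strings, like the question's column.
acc = pd.Series(['', ' ', ' Ankle', ' Ankle ', ' Ankle Wrist'], dtype=object)

# With no separator, str.split() splits on runs of whitespace and ignores
# leading/trailing whitespace, so the padding never yields empty items.
print(acc.str.split().tolist())
# [[], [], ['Ankle'], ['Ankle'], ['Ankle', 'Wrist']]

# An explicit ' ' separator would keep the empty pieces instead.
print(acc.str.split(' ').tolist()[1])
# ['', '']
```

One consequence: because split() breaks on any whitespace, an injury type that itself contains a space (a hypothetical "Lower Back", say) would come out as two items. The comma-based approach in the other answer avoids that as long as no value contains a comma.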

For different IDs, use DataFrame.groupby:

# group_keys=False keeps the original row index so the result aligns with df
df['Previous_Injuries']=( df.groupby('ID', group_keys=False)['Injury_Type']
                            .apply(lambda x: x.replace('Nan',np.nan).fillna(' ')
                                              .cumsum().shift(fill_value='')
                                              .str.split()) )
print(df)

   ID    Previous_Injuries  Currently_Injured Injury_Type
0   1                   []                  0         Nan
1   1                   []                  1       Ankle
2   1              [Ankle]                  0         Nan
3   1              [Ankle]                  1       Wrist
4   1       [Ankle, Wrist]                  0         Nan
5   1       [Ankle, Wrist]                  1         Leg
6   1  [Ankle, Wrist, Leg]                  0         Nan
7   2                   []                  1         Leg
8   2                [Leg]                  0         Nan
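Whether the grouped result can be assigned straight back to df depends on its index. In pandas 2.0 and later, groupby(...).apply prepends the group keys to the index by default, even when the function returns a Series indexed like its input; passing group_keys=False keeps the original index. A small sketch of that alignment, using a hypothetical helper previous_per_group (a plain loop rather than the string cumsum above) and a reduced sample frame:

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2, 2],
                   'Injury_Type': ['Ankle', 'Nan', 'Leg', 'Nan']})


def previous_per_group(s):
    # Hypothetical helper: lists of earlier injuries within one ID group,
    # returned with the group's own index so the pieces align with df.
    seen, out = [], []
    for value in s:
        out.append(list(seen))
        if value != 'Nan' and pd.notna(value):
            seen.append(value)
    return pd.Series(out, index=s.index)


res = df.groupby('ID', group_keys=False)['Injury_Type'].apply(previous_per_group)
print(res.index.equals(df.index))  # True
df['Previous_Injuries'] = res
print(df['Previous_Injuries'].tolist())  # [[], ['Ankle'], [], ['Leg']]
```

Without group_keys=False on a recent pandas, res would carry an (ID, row) MultiIndex and the column assignment would not line up with df's index.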

5 Comments

Good, but why not just: df['Injury_Type'].replace('Nan', ' ').cumsum().shift().str.split().bfill()?
Thanks, made an edit to the post to include different IDs. Would the code change in that case?
You're right, the double use of replace is because I don't know if OP has Nan or np.nan values @Anton vBR
using np.nan @ansev
I added a solution for different IDs @soccer_analytics_fan
