How can I add to a pandas column based on another column

Question

Currently I have a table that looks like this

ID       Previous_Injuries    Currently_Injured      Injury_Type
1            Nan                      0                  Nan
1            Nan                      1                  Ankle
1            Nan                      0                  Nan
1            Nan                      1                  Wrist
1            Nan                      0                  Nan
1            Nan                      1                  Leg
1            Nan                      0                  Nan
2            Nan                      1                  Leg
2            Nan                      0                  Nan

I would like to fill in the Previous_Injuries column so that my table looks like this:

ID       Previous_Injuries    Currently_Injured      Injury_Type
1            Nan                      0                  Nan
1            Nan                      1                  Ankle
1            [Ankle]                  0                  Nan
1            [Ankle]                  1                  Wrist
1            [Ankle,Wrist]            0                  Nan
1            [Ankle,Wrist]            1                  Leg
1            [Ankle,Wrist,Leg]        0                  Nan
2            Nan                      1                  Leg
2            [Leg]                    0                  Nan

How can I achieve this sort of a column in pandas? And is it best to do it in the form of a list?

Thanks!

Usually storing lists (or other objects) in a DataFrame is inefficient and makes other manipulations much more complicated, though it can be fine if your data aren't huge. What do you need to do with this information afterwards?
– ALollz, Commented Nov 1, 2019 at 17:29
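
If the list itself is not needed later, one list-free option is to store only how many injuries came before each row. The sketch below is a minimal illustration under that assumption; the column name `Previous_Injury_Count` is hypothetical and does not appear in the original question.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 1, 1, 2, 2],
    "Currently_Injured": [0, 1, 0, 1, 0, 1, 0, 1, 0],
})

# Running total of injuries per ID, minus the current row's own flag,
# leaves the number of injuries recorded on earlier rows.
running_total = df.groupby("ID")["Currently_Injured"].cumsum()
df["Previous_Injury_Count"] = running_total - df["Currently_Injured"]  # hypothetical column name
```

A numeric column like this stays vectorised and is easier to filter or aggregate than a column of Python lists.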

BENY · Accepted Answer · 2019-11-01 18:21:25Z

4

We can use shift with cumsum and then split the resulting string. Note that your data uses the string 'Nan', which is not the same as np.nan.

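# Shift so each row only sees earlier rows, build a running comma-joined string, then split it
s = df.Injury_Type.shift().fillna('Nan').add(',').cumsum().str[:-1].str.split(',')
# Drop the 'Nan' placeholders from each list
df['new'] = [[y for y in x if y != 'Nan'] for x in s]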
df
Out[322]: 
   ID Previous_Injuries  Currently_Injured Injury_Type                  new
0   1               Nan                  0         Nan                   []
1   1               Nan                  1       Ankle                   []
2   1               Nan                  0         Nan              [Ankle]
3   1               Nan                  1       Wrist              [Ankle]
4   1               Nan                  0         Nan       [Ankle, Wrist]
5   1               Nan                  1         Leg       [Ankle, Wrist]
6   1               Nan                  0         Nan  [Ankle, Wrist, Leg]

The question was edited again. For multiple IDs, apply the same steps to each group and concatenate the results:

l = []
for name, dfx in df.groupby('ID'):
    dfx = dfx.copy()  # work on a copy to avoid SettingWithCopyWarning
    s = dfx.Injury_Type.shift().fillna('Nan').add(',').cumsum().str[:-1].str.split(',')
    dfx['new'] = [[y for y in x if y != 'Nan'] for x in s]
    l.append(dfx)

pd.concat(l)
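
The cumsum-and-split approach joins names into one string, so it depends on the injury names never containing the separator. As an alternative, the per-ID history can be sketched in plain Python without any string concatenation. This is only a minimal sketch: the helper `previous_injuries` is a hypothetical name, and the sample frame assumes missing values are np.nan (the string 'Nan' is skipped as well).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 1, 1, 2, 2],
    "Currently_Injured": [0, 1, 0, 1, 0, 1, 0, 1, 0],
    "Injury_Type": [np.nan, "Ankle", np.nan, "Wrist", np.nan,
                    "Leg", np.nan, "Leg", np.nan],
})

def previous_injuries(ids, injuries):
    """For each row, return the injuries seen on earlier rows with the same ID."""
    seen = {}  # ID -> list of injuries recorded so far
    out = []
    for id_, injury in zip(ids, injuries):
        history = seen.setdefault(id_, [])
        out.append(list(history))  # snapshot taken before the current row
        if pd.notna(injury) and injury != "Nan":
            history.append(injury)
    return out

df["Previous_Injuries"] = previous_injuries(df["ID"], df["Injury_Type"])
```

Because each ID keeps its own running list, the rows do not need to be grouped or concatenated afterwards, and multi-word injury names are kept intact.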

edited Nov 1, 2019 at 18:21

answered Nov 1, 2019 at 17:41

BENY


2 Comments

soccer_analytics_fan Over a year ago

Thanks, made an edit to the post to include different IDs. Would the code change in that case?

BENY Over a year ago

@soccer_analytics_fan Use groupby, apply the same steps to each group, then concat.

ansev · Accepted Answer · 2019-11-01 18:24:56Z

3

Use:

df['Previous_Injuries']=( df['Injury_Type'].replace('Nan',np.nan).fillna(' ')
                                          .cumsum().shift(fill_value='')
                                          .str.split() )
print(df)

replace('Nan', np.nan) can be omitted if the missing values are real np.nan rather than the string 'Nan'.

   ID    Previous_Injuries  Currently_Injured Injury_Type
0   1                   []                  0         Nan
1   1                   []                  1       Ankle
2   1              [Ankle]                  0         Nan
3   1              [Ankle]                  1       Wrist
4   1       [Ankle, Wrist]                  0         Nan
5   1       [Ankle, Wrist]                  1         Leg
6   1  [Ankle, Wrist, Leg]                  0         Nan
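
The note about replace('Nan', np.nan) matters because pandas treats the string 'Nan' as ordinary text, not as a missing value. A small sketch on a made-up three-element Series (not the asker's data) shows the difference:

```python
import numpy as np
import pandas as pd

s = pd.Series(["Nan", "Ankle", np.nan])

string_mask = s.eq("Nan")           # True only for the literal string 'Nan'
missing_mask = s.isna()             # True only for the real missing value
cleaned = s.replace("Nan", np.nan)  # after this, both kinds count as missing
```

Here `fillna` would only touch the last element of `s`, while on `cleaned` it would fill both the first and the last.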

Use DataFrame.groupby to handle different IDs:

# group_keys=False keeps the original row index so the result aligns with df
df['Previous_Injuries']=( df.groupby('ID', group_keys=False)['Injury_Type']
                            .apply(lambda x: x.replace('Nan',np.nan).fillna(' ')
                                              .cumsum().shift(fill_value='')
                                              .str.split()) )
print(df)

   ID    Previous_Injuries  Currently_Injured Injury_Type
0   1                   []                  0         Nan
1   1                   []                  1       Ankle
2   1              [Ankle]                  0         Nan
3   1              [Ankle]                  1       Wrist
4   1       [Ankle, Wrist]                  0         Nan
5   1       [Ankle, Wrist]                  1         Leg
6   1  [Ankle, Wrist, Leg]                  0         Nan
7   2                   []                  1         Leg
8   2                [Leg]                  0         Nan

edited Nov 1, 2019 at 18:24

answered Nov 1, 2019 at 17:44

ansev


5 Comments

Anton vBR Over a year ago

Good but why not just: df['Injury_Type'].replace('Nan', ' ').cumsum().shift().str.split().bfill()?

soccer_analytics_fan Over a year ago

Thanks, made an edit to the post to include different IDs. Would the code change in that case?

ansev Over a year ago

You're right, the double use of replace is because I don't know if OP has Nan or np.nan values @Anton vBR

soccer_analytics_fan Over a year ago

using np.nan @ansev

ansev Over a year ago

I added a solution for different IDs @soccer_analytics_fan
