Replace values based on index pandas

Question

I'm working with a dataset in which a subset of observations has both an initial value and a final value. I created an id that lets me identify those observations, so after applying this:

df['aux']=df.duplicated(subset=['id'], keep=False)

df_dup=df[df.aux==True]
df_dup=df_dup.sort_values(by='id').reset_index()

I get something like this:

index     id    status    value
  88      1    'initial'    8
  95      1    'final'      12
  63      2    'initial'    9
  52      2    'final'      13

What I want to achieve is to replace the initial value with the final value for each id:

index     id    status    value
  88      1    'initial'    12
  95      1    'final'      12
  63      2    'initial'    13
  52      2    'final'      13

I tried several things, my last attempt was this:

df_dup[df_dup.status=='initial'].reset_index().value= \
df_dup[df_dup.status=='final'].reset_index().value

But that fills initial values with nan:

index     id    status    value
  88      1    'initial'    nan
  95      1    'final'      12
  63      2    'initial'    nan
  52      2    'final'      13

What am I missing? Thanks

jezrael · Accepted Answer · 2018-10-17 14:11:58Z

2

Use GroupBy.transform with 'last'. It also overwrites the rows whose id is unique, but those rows simply get their own value back:

df['value'] = df.groupby('id')['value'].transform('last')
print (df)
   index  id     status  value
0     88   1  'initial'     12
1     95   1    'final'     12
2     63   2  'initial'     13
3     52   2    'final'     13
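
For reference, a minimal self-contained sketch of the same call, assuming the sample frame from the question and assuming the 'final' row comes after the 'initial' row within each id (transform('last') picks by position, not by the status label):

```python
import pandas as pd

# Sample data rebuilt from the question (column names taken from the post)
df = pd.DataFrame({
    "index": [88, 95, 63, 52],
    "id": [1, 1, 2, 2],
    "status": ["initial", "final", "initial", "final"],
    "value": [8, 12, 9, 13],
})

# Broadcast the last value of each id group back onto every row of that group
df["value"] = df.groupby("id")["value"].transform("last")
print(df)
```

If the rows are not already ordered so that 'final' is last, sorting first or selecting by status would be needed.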

If you want to replace only the rows with a duplicated id (better performance when there are many unique ids):

mask = df.duplicated(subset=['id'], keep=False)
df.loc[mask, 'value'] = df[mask].groupby('id')['value'].transform('last')
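
A small sketch of the masked variant, assuming a hypothetical extra row with id 3 that has no duplicate, to show that unique ids are left untouched:

```python
import pandas as pd

# Hypothetical sample: ids 1 and 2 are duplicated, id 3 appears only once
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3],
    "status": ["initial", "final", "initial", "final", "initial"],
    "value": [8, 12, 9, 13, 5],
})

# True for every row whose id occurs more than once
mask = df.duplicated(subset=["id"], keep=False)

# transform returns a Series with the same index labels as df[mask],
# so the .loc assignment lines up row by row
last_per_row = df[mask].groupby("id")["value"].transform("last")
df.loc[mask, "value"] = last_per_row
print(df)
```

The assignment works because pandas aligns the right-hand Series on index labels. Assigning the 'final' rows directly onto the 'initial' rows does not line up this way, since the two sets of rows carry different labels, which is a likely source of the NaN values in the question.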


1 Comment

Juan C Over a year ago

Thank you a lot, both your answer and Wen's worked, but I find yours a little bit more readable.

BENY · Accepted Answer · 2018-10-17 14:16:29Z

2

Without groupby, based on drop_duplicates:

df['value'] = df['id'].map(df.drop_duplicates('id', keep='last').set_index('id')['value'])
df
Out[436]: 
   index  id     status  value
0     88   1  'initial'     12
1     95   1    'final'     12
2     63   2  'initial'     13
3     52   2    'final'     13
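
As a variation on the same map idea, here is a sketch that builds the lookup from the rows labelled 'final' rather than from row order. This is not what the answer above does; it assumes each id has at most one 'final' row, and the sample data is deliberately shuffled for id 1:

```python
import pandas as pd

# Hypothetical sample where the 'final' row of id 1 comes first
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "status": ["final", "initial", "initial", "final"],
    "value": [12, 8, 9, 13],
})

# Lookup table: id -> final value, taken only from rows with status 'final'
final_by_id = df.loc[df["status"] == "final"].set_index("id")["value"]

# Map every row's id to its final value
df["value"] = df["id"].map(final_by_id)
print(df)
```

An id without any 'final' row would map to NaN here, so that case would need separate handling.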
