Remove duplicate rows in pandas dataframe based on condition

Question

            is_avail   valu data_source
2015-08-07     False  0.282    source_a
2015-08-07     False  0.582    source_b
2015-08-23     False  0.296    source_a
2015-09-08     False  0.433    source_a
2015-10-01      True  0.169    source_b

In the dataframe above, I want to remove the duplicate rows (i.e. rows where the index is repeated), retaining the row with the highest value in the valu column.

I can remove rows with duplicate indexes like this:

df = df[~df.index.duplicated()]

But how do I remove them based on the condition specified above?
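
To make the problem concrete, here is a minimal sketch that rebuilds the sample frame and applies the snippet above. The construction code is an assumption (the question only shows the printed frame). With the default keep='first', the row that survives for 2015-08-07 is whichever comes first, not the one with the larger valu.

```python
import pandas as pd

# Rebuild the sample frame; this construction is an assumption, since the
# question only shows the printed output.
df = pd.DataFrame(
    {
        "is_avail": [False, False, False, False, True],
        "valu": [0.282, 0.582, 0.296, 0.433, 0.169],
        "data_source": ["source_a", "source_b", "source_a", "source_a", "source_b"],
    },
    index=pd.to_datetime(
        ["2015-08-07", "2015-08-07", "2015-08-23", "2015-09-08", "2015-10-01"]
    ),
)

# Index.duplicated() defaults to keep="first": the first row seen for each
# index label survives, regardless of its valu.
deduped = df[~df.index.duplicated()]
print(deduped)
```

For 2015-08-07 this keeps the source_a row (valu 0.282) and drops the source_b row (valu 0.582), which is the opposite of what is wanted.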

this might help: stackoverflow.com/questions/13035764/…

– Paul H, Commented May 5, 2017 at 22:21

Allen Qin · Accepted Answer · 2017-05-05 22:21:09Z

8

You can use groupby on index after sorting the df by valu.

df.sort_values(by='valu', ascending=False).groupby(level=0).first()
Out[1277]: 
           is_avail   valu data_source
2015-08-07    False  0.582    source_b
2015-08-23    False  0.296    source_a
2015-09-08    False  0.433    source_a
2015-10-01     True  0.169    source_b
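
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frame construction is an assumption (the question only shows the printed frame); the sort-then-groupby line is the same as above.

```python
import pandas as pd

# Rebuild the sample frame; this construction is an assumption, since the
# question only shows the printed output.
df = pd.DataFrame(
    {
        "is_avail": [False, False, False, False, True],
        "valu": [0.282, 0.582, 0.296, 0.433, 0.169],
        "data_source": ["source_a", "source_b", "source_a", "source_a", "source_b"],
    },
    index=pd.to_datetime(
        ["2015-08-07", "2015-08-07", "2015-08-23", "2015-09-08", "2015-10-01"]
    ),
)

# Sort so the largest valu comes first, then group on the index (level=0)
# and take the first entry of each group.
result = df.sort_values(by="valu", ascending=False).groupby(level=0).first()
print(result)
```

One caveat: GroupBy.first() picks the first non-null value per column, so if the frame contains NaN the output row can combine values from different input rows. On data like the sample, with no missing values, it returns the whole row with the largest valu.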

piRSquared Over a year ago

This is the better solution! However, you don't need ascending=False. I'd change it to df.sort_values('valu').groupby(level=0).tail(1)
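
A minimal sketch of that variant, assuming the same sample frame (rebuilt here, so the construction code is an assumption). tail(1) returns the rows in the order they have after sorting by valu, so the sketch adds a sort_index() to get back to date order; that extra step is not part of the comment.

```python
import pandas as pd

# Rebuild the sample frame; this construction is an assumption, since the
# question only shows the printed output.
df = pd.DataFrame(
    {
        "is_avail": [False, False, False, False, True],
        "valu": [0.282, 0.582, 0.296, 0.433, 0.169],
        "data_source": ["source_a", "source_b", "source_a", "source_a", "source_b"],
    },
    index=pd.to_datetime(
        ["2015-08-07", "2015-08-07", "2015-08-23", "2015-09-08", "2015-10-01"]
    ),
)

# Ascending sort puts the largest valu last within each index label, so
# tail(1) keeps that row. tail() returns whole rows and preserves the
# sorted-by-valu order, hence the final sort_index() (an addition here).
result = df.sort_values("valu").groupby(level=0).tail(1).sort_index()
print(result)
```

Because tail(1) selects whole rows rather than per-column values, this version does not mix values across rows when some columns contain NaN.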

piRSquared · Accepted Answer · 2017-05-05 22:21:12Z

5

Using drop_duplicates with keep='last'

df.rename_axis('date').reset_index() \
    .sort_values(['date', 'valu']) \
    .drop_duplicates('date', keep='last') \
    .set_index('date').rename_axis(df.index.name)

           is_avail   valu data_source
2015-08-07    False  0.582    source_b
2015-08-23    False  0.296    source_a
2015-09-08    False  0.433    source_a
2015-10-01     True  0.169    source_b
