3

This question is slightly more complicated than Remove duplicate rows in pandas dataframe based on condition:

Instead of one 'valu' column, I now have two columns 'valu1', 'valu2':

         t    valu1    valu2
2015-08-01        1       10
2015-08-01        2       11
2015-08-01        3       12
2015-09-31        4       15
2015-10-31        5       13

In the dataframe above, I want to remove the duplicate rows (i.e. row where the column 't' is repeated) by retaining the row with a higher value in the valu1 column and a lower value in the value2 column.

Expected outcome:

         t    valu1    valu2
2015-08-01        3       10
2015-09-31        4       15
2015-10-31        5       13

The df.sort_values() and drop_duplicates with keep='last' mentioned in the linked question obviously don't work.

Something I can think of now is:

#Let's call the dataframe df
dups = df[df['t'].duplicated()]['t'].drop_duplicates()  #get duplicated dates
for d in dups:
    max_v1 = df[df['t'] == d]['valu1'].max()  #find the max of valu1 on day d
    min_v2 = df[df['t'] == d]['valu2'].min()  #find the min of valu2 on day d
    df[df['t'] == d]['valu1'] = max_v1        #set valu1 of day d to max_v1
    df[df['t'] == d]['valu2'] = min_v2        #set valu2 of day d to min_v2

df = df[~df.index.duplicated()]               #drop everything duplicated

I think this should work, but it really seems unsophisticated, especially I actually need to do this for a large dataset. Any idea of how I should approach this problem?

1 Answer 1

5

I think you are looking for

df.groupby('t').agg({'valu1':'max','valu2':'min'}).reset_index()
            t  valu1  valu2
0  2015-08-01      3     10
1  2015-09-31      4     15
2  2015-10-31      5     13
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.