Python/Pandas filter out unique rows from DataFrames

Question

I have two or three DataFrames with some rows duplicated across them.

In [31]: df1
Out[31]: 
    member           time
0       0 2009-09-30 12:00:00
1       0 2009-09-30 18:00:00
2       0 2009-10-01 00:00:00
3       1 2009-09-30 12:00:00
4       1 2009-09-30 18:00:00
5       2 2009-09-30 12:00:00
6       3 2009-09-30 12:00:00
...

In [32]: df2
Out[32]: 
    member           time
0       0 2009-09-30 12:00:00
1       0 2009-09-30 18:00:00
3       1 2009-09-30 12:00:00
4       2 2009-09-30 12:00:00
5       2 2009-09-30 18:00:00
6       2 2009-10-01 00:00:00
...

I'd like to filter out the rows whose combination of 'member' and 'time' appears in only one of df1 and df2, and get a DataFrame containing only the rows whose 'member' and 'time' values are common to both df1 and df2, that is:

In [33]: df_duplicated_1_and_2
Out[33]: 
    member           time
0       0 2009-09-30 12:00:00
1       0 2009-09-30 18:00:00
3       1 2009-09-30 12:00:00
4       2 2009-09-30 12:00:00
...

Is there an efficient and elegant way to do this?

Update: If possible, I'd like to get a filtered DataFrame rather than a new merged DataFrame, e.g.,

In [34]: df1
Out[34]: 
    member           time           value
0       0 2009-09-30 12:00:00  a
1       0 2009-09-30 18:00:00  b
2       0 2009-10-01 00:00:00  c
3       1 2009-09-30 12:00:00  d
4       1 2009-09-30 18:00:00  e
5       2 2009-09-30 12:00:00  f
6       3 2009-09-30 12:00:00  g
...

In [35]: df1_filtered_out
Out[35]: 
    member           time           value
0       0 2009-09-30 12:00:00  a
1       0 2009-09-30 18:00:00  b
3       1 2009-09-30 12:00:00  d
5       2 2009-09-30 12:00:00  f
...

and also get filtered df2.

Viktor Kerkez · Accepted Answer · 2013-09-23 13:52:37Z

3

Do an inner join on the member and time columns:

>>> df1.merge(df2, on=['member', 'time'], how='inner')
   member                time
0       0 2009-09-30 12:00:00
1       0 2009-09-30 18:00:00
2       1 2009-09-30 12:00:00
3       2 2009-09-30 12:00:00

This will produce a result that has only the rows that have the same member and time values in both DataFrames.
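A self-contained sketch of that inner join, built only from the rows visible in the question (the elided "..." rows are left out, so this is sample data rather than the OP's full frames):

```python
import pandas as pd

# Sample data: only the rows shown in the question (the "..." rows are omitted).
df1 = pd.DataFrame({
    "member": [0, 0, 0, 1, 1, 2, 3],
    "time": pd.to_datetime([
        "2009-09-30 12:00", "2009-09-30 18:00", "2009-10-01 00:00",
        "2009-09-30 12:00", "2009-09-30 18:00", "2009-09-30 12:00",
        "2009-09-30 12:00",
    ]),
})
df2 = pd.DataFrame({
    "member": [0, 0, 1, 2, 2, 2],
    "time": pd.to_datetime([
        "2009-09-30 12:00", "2009-09-30 18:00", "2009-09-30 12:00",
        "2009-09-30 12:00", "2009-09-30 18:00", "2009-10-01 00:00",
    ]),
})

# Inner join: keep only (member, time) pairs present in both frames.
common = df1.merge(df2, on=["member", "time"], how="inner")
print(common)
```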

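Update: to get a filtered df1 instead, merge it against only the key columns of df2, so df1 keeps its own columns (such as value) and nothing else is added. Note that the result gets a fresh 0..n-1 index rather than df1's original labels: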

>>> df1.merge(df2[['member', 'time']])
   member                time value
0       0 2009-09-30 12:00:00     a
1       0 2009-09-30 18:00:00     b
2       1 2009-09-30 12:00:00     d
3       2 2009-09-30 12:00:00     f
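A sketch applying the same idea in both directions, since the question also asks for a filtered df2, again using only the rows visible in the question. The drop_duplicates() call is an addition not found in the original answer: if df2 contained the same (member, time) pair twice, the merge would repeat the matching df1 row, and de-duplicating the keys first avoids that.

```python
import pandas as pd

keys = ["member", "time"]

# Sample data: the rows shown in the question (the "..." rows are omitted).
df1 = pd.DataFrame({
    "member": [0, 0, 0, 1, 1, 2, 3],
    "time": pd.to_datetime([
        "2009-09-30 12:00", "2009-09-30 18:00", "2009-10-01 00:00",
        "2009-09-30 12:00", "2009-09-30 18:00", "2009-09-30 12:00",
        "2009-09-30 12:00",
    ]),
    "value": list("abcdefg"),
})
df2 = pd.DataFrame({
    "member": [0, 0, 1, 2, 2, 2],
    "time": pd.to_datetime([
        "2009-09-30 12:00", "2009-09-30 18:00", "2009-09-30 12:00",
        "2009-09-30 12:00", "2009-09-30 18:00", "2009-10-01 00:00",
    ]),
})

# Merge each frame against the de-duplicated key columns of the other one,
# so only the frame's own columns survive and no rows are repeated.
df1_filtered = df1.merge(df2[keys].drop_duplicates(), on=keys)
df2_filtered = df2.merge(df1[keys].drop_duplicates(), on=keys)

print(df1_filtered)
print(df2_filtered)
```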

edited Sep 23, 2013 at 13:52

answered Sep 23, 2013 at 8:41

Viktor Kerkez

7 Comments

EdChum Over a year ago

Merges are 'inner' by default so the how parameter is not necessary.

Viktor Kerkez Over a year ago

@EdChum I know, but I specified the how parameter explicitly to show the OP how he can change this behavior to right, left, or outer if he decides to do something different. But yes, this is a useful comment. +1.
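To make these two comments concrete, a small sketch with hypothetical left/right frames keyed on member only. It checks that the default matches how="inner", then uses merge's indicator=True option (not mentioned in the thread) to show which rows each how value keeps. Row order for "outer" has varied across pandas versions, so only the row counts are relied on here.

```python
import pandas as pd

# Hypothetical frames, keyed on "member" only to keep the output short.
left = pd.DataFrame({"member": [0, 1, 3], "value": ["a", "d", "g"]})
right = pd.DataFrame({"member": [0, 1, 2]})

# how defaults to "inner", so omitting it gives the same result.
assert left.merge(right, on="member").equals(
    left.merge(right, on="member", how="inner")
)

results = {}
for how in ["inner", "left", "right", "outer"]:
    # indicator=True adds a categorical "_merge" column that says whether
    # each output row matched in "both" frames or is "left_only"/"right_only".
    results[how] = left.merge(right, on="member", how=how, indicator=True)
    print(how, len(results[how]), results[how]["_merge"].tolist())
```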

user1979961 Over a year ago

Thanks for your answer and comments. Your answer is almost exactly what I'd like to do, but I'd like to get a 'filtered' DataFrame, not a 'merged' one. Could you tell me how to filter out the duplicated rows? (I've updated my question.)

Viktor Kerkez Over a year ago

@Tetsuro the answer is the same. Just select out the columns from the df2 frame: df1.merge(df2[['member', 'time']])

Viktor Kerkez Over a year ago

@Tetsuro Also, since this is boolean indexing, you cannot get a view; no matter what you do, you will get a copy.
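For a true boolean-indexing filter, here is an alternative that is not part of the accepted answer. It assumes pandas 0.24 or later (for MultiIndex.from_frame) and keeps each frame's original index labels, which is closer to the "filtered" output the question asks for. As the comment says, the result is a new object, not a view of df1.

```python
import pandas as pd

keys = ["member", "time"]

# Sample data: the rows shown in the question (the "..." rows are omitted).
df1 = pd.DataFrame({
    "member": [0, 0, 0, 1, 1, 2, 3],
    "time": pd.to_datetime([
        "2009-09-30 12:00", "2009-09-30 18:00", "2009-10-01 00:00",
        "2009-09-30 12:00", "2009-09-30 18:00", "2009-09-30 12:00",
        "2009-09-30 12:00",
    ]),
    "value": list("abcdefg"),
})
df2 = pd.DataFrame({
    "member": [0, 0, 1, 2, 2, 2],
    "time": pd.to_datetime([
        "2009-09-30 12:00", "2009-09-30 18:00", "2009-09-30 12:00",
        "2009-09-30 12:00", "2009-09-30 18:00", "2009-10-01 00:00",
    ]),
})
df2.index = [0, 1, 3, 4, 5, 6]  # match the index labels shown in the question

# One (member, time) tuple per row, stored as a MultiIndex.
pairs1 = pd.MultiIndex.from_frame(df1[keys])
pairs2 = pd.MultiIndex.from_frame(df2[keys])

# Boolean masks: True where a row's pair also occurs in the other frame.
df1_filtered = df1[pairs1.isin(pairs2)]
df2_filtered = df2[pairs2.isin(pairs1)]

print(df1_filtered)  # original labels 0, 1, 3, 5 are kept
print(df2_filtered)
```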
