Delete duplicate values in a row within a Pandas DataFrame (Python)

Question

What is the expression to delete rows whose values are all duplicates of one another in a pandas DataFrame such as the one below? (Note: the first column is the date index, followed by four columns of data.)

1983-02-16 512 517 510 514
1983-02-17 513 520 513 517
1983-02-18 500 500 500 500   <-- duplicate values
1983-02-21 505 505 496 496

After deleting the row whose values are all the same, I want to end up with this:

1983-02-16 512 517 510 514
1983-02-17 513 520 513 517
1983-02-21 505 505 496 496

I could only find how to do this by columns, not by rows. Many thanks in advance,

Peter

Andy Hayden · Accepted Answer · 2013-05-24 12:42:24Z

1

A slightly more elegant/dynamic (but perhaps less performant) version:

In [11]: msk = df1.apply(lambda col: df1[1] != col).any(axis=1)  # compare every column with column 1

In [12]: msk  # True where at least one value in the row differs from column 1
Out[12]:
1983-02-16     True
1983-02-17     True
1983-02-18    False
1983-02-21     True
dtype: bool

In [13]: df1.loc[msk]
Out[13]:
              1    2    3    4
1983-02-16  512  517  510  514
1983-02-17  513  520  513  517
1983-02-21  505  505  496  496
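The apply/lambda step can also be written without apply: DataFrame.ne with axis=0 compares every column against column 1 row by row, and the resulting mask keeps df1's own index, so it can be passed straight to .loc. A minimal sketch, assuming df1 has a date index and integer column labels 1 to 4 as in the output above (the frame is rebuilt here so the snippet runs on its own):

```python
import pandas as pd

# Rebuild the example frame: date index, integer column labels 1..4
df1 = pd.DataFrame(
    [[512, 517, 510, 514],
     [513, 520, 513, 517],
     [500, 500, 500, 500],
     [505, 505, 496, 496]],
    index=pd.to_datetime(['1983-02-16', '1983-02-17', '1983-02-18', '1983-02-21']),
    columns=[1, 2, 3, 4],
)

# Compare each column with column 1, aligning on the row index (axis=0);
# a row is kept if at least one of its values differs from column 1
msk = df1.ne(df1[1], axis=0).any(axis=1)
result = df1.loc[msk]
print(result)
```

Because the comparison is vectorized, this avoids calling a Python lambda once per column, which should matter mainly for wide frames.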


unutbu · Accepted Answer · 2013-05-24 10:24:11Z

0

import io
import pandas as pd

content = '''\
1983-02-16 512 517 510 514
1983-02-17 513 520 513 517
1983-02-18 500 500 500 500
1983-02-21 505 505 496 496'''
# Read whitespace-separated text; column 0 (the dates) becomes the index
df = pd.read_csv(io.StringIO(content), sep=r'\s+', header=None,
                 parse_dates=[0], index_col=0)
# True for rows where all four values equal the value in column 1
index = (df[1] == df[2]) & (df[1] == df[3]) & (df[1] == df[4])
df = df.loc[~index]
print(df)

yields

              1    2    3    4
0                             
1983-02-16  512  517  510  514
1983-02-17  513  520  513  517
1983-02-21  505  505  496  496

Boolean indexing with df.loc[~index] selects all rows where index is False. Older pandas also allowed df.ix[~index], but .ix was removed in pandas 1.0.
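The chained comparisons above are written out for exactly four columns. For a frame with any number of columns, one option is DataFrame.nunique(axis=1), which counts the distinct values in each row: a row whose values are all identical has exactly one. The sketch below rebuilds the question's data and keeps rows with more than one distinct value. It also shows why the "any repeated value" reading of the question would be different: nunique(axis=1) less than the number of columns would additionally drop 1983-02-17 (513 appears twice) and 1983-02-21, which the question wants to keep.

```python
import io
import pandas as pd

content = '''\
1983-02-16 512 517 510 514
1983-02-17 513 520 513 517
1983-02-18 500 500 500 500
1983-02-21 505 505 496 496'''
df = pd.read_csv(io.StringIO(content), sep=r'\s+', header=None,
                 parse_dates=[0], index_col=0)

# Number of distinct values in each row
distinct = df.nunique(axis=1)

# Rows whose values are all identical have exactly one distinct value
all_equal = distinct == 1
kept = df.loc[~all_equal]
print(kept)

# For contrast: rows that contain any repeated value at all
has_repeat = distinct < df.shape[1]
print(df.loc[has_repeat].index.tolist())
```

Note that nunique ignores NaN by default, so a row such as 500, 500, NaN, 500 would also count as "all equal"; pass dropna=False if missing values should count as distinct.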
