What is the expression to remove rows in which all the values are the same from a pandas DataFrame like the one below? (Note: the first column is the index (date), followed by four columns of data.)

1983-02-16 512 517 510 514
1983-02-17 513 520 513 517
1983-02-18 500 500 500 500   <-- duplicate values
1983-02-21 505 505 496 496

Deleting the row of duplicate values should leave this:

1983-02-16 512 517 510 514
1983-02-17 513 520 513 517
1983-02-21 505 505 496 496

I could only find how to do this by columns, not rows. Many thanks in advance,

Peter

2 Answers

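A slightly more elegant/dynamic (but perhaps less performant) version, assuming the question's data is in df1 with the dates as the index and the data columns labelled 1 to 4: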

In [11]: msk = df1.apply(lambda col: df1[1] != col).any(axis=1)

In [12]: msk  # True where at least one column differs from column 1
Out[12]:
1983-02-16     True
1983-02-17     True
1983-02-18    False
1983-02-21     True
dtype: bool

In [13]: df1.loc[msk]
Out[13]:
              1    2    3    4
1983-02-16  512  517  510  514
1983-02-17  513  520  513  517
1983-02-21  505  505  496  496
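The same row-wise test can also be phrased as "how many distinct values does this row have?". The sketch below uses DataFrame.nunique(axis=1), which counts distinct values per row; a row whose values are all identical has exactly one. This is an alternative to the answer's apply-based mask, not the answerer's code, and the df1 built here is a hypothetical reconstruction of the question's data.

```python
import pandas as pd

# Hypothetical reconstruction of the question's data: dates as index, columns 1-4
df1 = pd.DataFrame(
    [[512, 517, 510, 514],
     [513, 520, 513, 517],
     [500, 500, 500, 500],
     [505, 505, 496, 496]],
    index=pd.to_datetime(["1983-02-16", "1983-02-17", "1983-02-18", "1983-02-21"]),
    columns=[1, 2, 3, 4],
)

# nunique(axis=1) counts the distinct values in each row;
# a count of 1 means every value in that row is the same
keep = df1.nunique(axis=1) > 1
result = df1.loc[keep]
print(result)
```

Note that nunique ignores NaN by default, so a row such as 500, 500, NaN, 500 would count as a single distinct value and be dropped; pass dropna=False if missing values should count as different.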

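import io

import pandas as pd

content = '''\
1983-02-16 512 517 510 514
1983-02-17 513 520 513 517
1983-02-18 500 500 500 500
1983-02-21 505 505 496 496'''
# Parse the whitespace-separated text; column 0 (the dates) becomes the index
df = pd.read_table(io.StringIO(content), parse_dates=[0], header=None, sep=r'\s+',
                   index_col=0)
# True for rows where every data column equals column 1
all_equal = (df[1] == df[2]) & (df[1] == df[3]) & (df[1] == df[4])
df = df.loc[~all_equal]  # keep only the rows that are not all-equal
print(df)

yields

              1    2    3    4
0                             
1983-02-16  512  517  510  514
1983-02-17  513  520  513  517
1983-02-21  505  505  496  496

df.loc accepts a boolean mask to select rows, so df = df.loc[~all_equal] selects all rows where all_equal is False. (Older pandas versions also offered df.ix for this, but it was removed in pandas 1.0.)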
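The chained == comparisons above name each column explicitly. For a frame with any number of data columns, the same test can be written by comparing every column against the first one in a single call. This is a sketch using DataFrame.ne with axis=0 (which aligns the comparison Series on the row index), not code from either answer.

```python
import io

import pandas as pd

content = """\
1983-02-16 512 517 510 514
1983-02-17 513 520 513 517
1983-02-18 500 500 500 500
1983-02-21 505 505 496 496"""
df = pd.read_csv(io.StringIO(content), sep=r"\s+", header=None,
                 index_col=0, parse_dates=[0])

# Compare every column with the first data column, row by row
# (axis=0 matches the Series against the rows via the index).
# A row is kept if at least one of its values differs from the first one.
differs = df.ne(df.iloc[:, 0], axis=0).any(axis=1)
result = df.loc[differs]
print(result)
```

Because nothing here refers to columns 2, 3 or 4 by name, the same two lines work unchanged if the frame gains or loses data columns.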
