Python Dataframe conditionally drop_duplicates

Question

I want to drop duplicate rows from a DataFrame depending on the type of the values in one column. For example, my DataFrame is:

A    B
3    4
3    4
3    5
yes  8
no   8
yes  8

If df['A'] is a number, I want to drop_duplicates().

If df['A'] is a string, I want to keep the duplicates.

So the desired result would be:

A    B
3    4
3    5
yes  8
no   8
yes  8

Is there a Pythonic way to do this without using for loops? Thanks!

CT Zhu · Accepted Answer · 2015-10-20 15:17:48Z

3

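Create a new column C: if column A is numeric, assign a common value in C; otherwise, assign a unique value in C (such as the row index). Duplicated numeric rows then still match on every column, while string rows can never be exact duplicates of each other.

After that, just call drop_duplicates() as normal.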

Note: pandas has a handy str.isnumeric() method for testing whether a string cell is number-like (it assumes column A holds strings).
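To see which strings str.isnumeric() actually accepts, here is a small sketch; the sample values are hypothetical, not from the question. It compares str.isnumeric() with pd.to_numeric(..., errors='coerce'), swapped in here as an alternative test that also accepts signs and decimal points.

```python
import pandas as pd

# Hypothetical sample values, stored as strings as they would be after reading a CSV
s = pd.Series(['3', '-3', '3.5', 'yes', 'no'])

# str.isnumeric() is True only when every character is numeric,
# so a sign or a decimal point makes it return False
digits_only = s.str.isnumeric()

# pd.to_numeric with errors='coerce' turns unparseable values into NaN,
# so notna() flags every value that parses as a number
parses_as_number = pd.to_numeric(s, errors='coerce').notna()

print(pd.DataFrame({'value': s,
                    'isnumeric': digits_only,
                    'to_numeric': parses_as_number}))
```

If column A may contain negative or decimal numbers, the to_numeric mask can be used in place of str.isnumeric() in the np.where call.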

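In [47]:

import numpy as np
import pandas as pd

# Column A holds strings (e.g. as read from a CSV file)
df = pd.DataFrame({'A': ['3', '3', '3', 'yes', 'no', 'yes'],
                   'B': [4, 4, 5, 8, 8, 8]})

# Numeric rows share C == 1; every other row gets its own index label
df['C'] = np.where(df.A.str.isnumeric(), 1, df.index)
print(df)
     A  B  C
0    3  4  1
1    3  4  1
2    3  5  1
3  yes  8  3
4   no  8  4
5  yes  8  5
In [48]:

print(df.drop_duplicates()[['A', 'B']])  # call reset_index() if needed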
     A  B
0    3  4
2    3  5
3  yes  8
4   no  8
5  yes  8
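The same idea can also be sketched without a helper column, as an alternative that is not part of the original answer: build a boolean mask from DataFrame.duplicated() and drop a row only when it is both a repeat and numeric. As above, this assumes column A holds strings.

```python
import pandas as pd

# Example data from the question, with column A stored as strings
df = pd.DataFrame({'A': ['3', '3', '3', 'yes', 'no', 'yes'],
                   'B': [4, 4, 5, 8, 8, 8]})

# True for rows whose A value consists only of numeric characters
numeric = df['A'].str.isnumeric()

# duplicated() marks every repeat of an earlier identical row (keep='first');
# a row is dropped only if it is a repeat AND numeric, so string rows stay
result = df[~(df.duplicated() & numeric)]
print(result)
```

Because duplicated() keeps the first occurrence by default, the surviving numeric row is the earliest one, matching drop_duplicates().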

answered Oct 20, 2015 at 15:05

CT Zhu

ojdo · Accepted Answer · 2015-10-20 15:15:38Z

1

This solution is more verbose, but it may be more flexible if you need a more involved test:

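import pandas as pd

def true_if_number(x):
    try:
        int(x)
        return True
    except (TypeError, ValueError):
        return False

rows_numeric = df['A'].apply(true_if_number)

# De-duplicate the numeric rows on all columns, keep the other rows as they are,
# then restore the original row order (Series/DataFrame.append were removed in pandas 2.0)
result = pd.concat([df[rows_numeric].drop_duplicates(),
                    df[~rows_numeric]]).sort_index()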
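To illustrate the flexibility point, the test can be swapped out for a different predicate. This is a sketch under assumptions, not ojdo's code: is_number and drop_duplicates_where are hypothetical names, is_number uses float() so that strings like '3.5' also count as numbers, and the sample frame is invented for the example.

```python
import pandas as pd

def is_number(x):
    """Hypothetical predicate: True if x parses as a float ('3', '-3', '3.5')."""
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        return False

def drop_duplicates_where(df, column, predicate):
    """Hypothetical helper: drop duplicate rows only where predicate(df[column]) holds."""
    mask = df[column].apply(predicate).astype(bool)
    kept = df[mask].drop_duplicates()
    # Put the untouched rows back and restore the original row order
    return pd.concat([kept, df[~mask]]).sort_index()

# Invented sample data: numeric duplicates should collapse, 'yes' rows should stay
df = pd.DataFrame({'A': ['3', '3', '3.5', '3.5', 'yes', 'yes'],
                   'B': [4, 4, 5, 5, 8, 8]})
result = drop_duplicates_where(df, 'A', is_number)
print(result)
```

Passing a different predicate (for example one that checks a date format) changes which rows are de-duplicated without touching the rest of the logic.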

answered Oct 20, 2015 at 15:15

ojdo
