I want to drop duplicate rows from a DataFrame, but only for rows where a given column holds a certain type of value. For example, my DataFrame is:

A    B
3    4
3    4
3    5
yes  8
no   8
yes  8

If the value in df['A'] is a number, I want duplicate rows removed, as drop_duplicates() would do.

If the value in df['A'] is a string, I want to keep the duplicates.

So the desired result would be:

A    B
3    4
3    5
yes  8
no   8
yes  8

Is there a Pythonic way to do this without using for loops? Thanks!

2 Answers

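Create a new column C: where column A is numeric, assign a common value in C; otherwise assign a unique value in C (the row index works well).

After that, just call drop_duplicates() as normal. Numeric rows share the same C, so they are compared on A and B alone, while every string row has its own C and is never treated as a duplicate.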

Note: the string method isnumeric(), available on a column as df.A.str.isnumeric(), is a convenient test for whether a cell is number-like.
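To make that note concrete, here is a small sketch of what the test accepts. The sample values are illustrative and not taken from the question. Signs and decimal points are not numeric characters, so this check really only fits non-negative integers stored as strings.

```python
import pandas as pd

# Illustrative values only; the column must hold strings for .str to work.
samples = pd.Series(["3", "42", "-3", "3.5", "yes"])

# isnumeric() is True only when every character is a numeric character,
# so "-3" and "3.5" are rejected along with "yes".
is_number_like = samples.str.isnumeric()
print(is_number_like.tolist())
```

If negative or decimal values can appear in the column, a different test is needed before the helper column is built.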

In [47]:

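import numpy as np

# Assumes column A holds strings. Numeric rows share the marker 1;
# every other row gets its own index value, so it can never match another row.
df['C'] = np.where(df.A.str.isnumeric(), 1, df.index)
print(df)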
     A  B  C
0    3  4  1
1    3  4  1
2    3  5  1
3  yes  8  3
4   no  8  4
5  yes  8  5
In [48]:

print(df.drop_duplicates()[['A', 'B']])  # reset the index if needed
     A  B
0    3  4
2    3  5
3  yes  8
4   no  8
5  yes  8
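The same result can probably be reached without a helper column by combining two boolean masks. This is a sketch of a variation, not the method used in the answer above: it assumes column A is stored as strings, rebuilds the question's example by hand, and uses the variable names numeric, repeated and result purely for illustration.

```python
import pandas as pd

# The question's example; column A is stored as strings here (an assumption).
df = pd.DataFrame({"A": ["3", "3", "3", "yes", "no", "yes"],
                   "B": [4, 4, 5, 8, 8, 8]})

numeric = df["A"].str.isnumeric()        # True where A looks like a number
repeated = df.duplicated(keep="first")   # True for every repeat of an earlier row

# Drop a row only if it is both numeric in A and a repeat; keep all others.
result = df[~(numeric & repeated)]
print(result)
```

Because the original frame is only filtered, the rows keep their original order and index labels.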

This solution is more verbose, but might be more flexible for more involved tests:

def true_if_number(x):
    try:
        int(x)
        return True
    except ValueError:
        return False

rows_numeric = df['A'].apply(true_if_number)

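# Deduplicate only the numeric rows, then add the non-numeric rows back unchanged.
# Whole rows are used so that column B is kept and compared as well.
# pd.concat replaces Series.append, which was removed in pandas 2.0.
pd.concat([df[rows_numeric].drop_duplicates(), df[~rows_numeric]])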
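For reference, a self-contained sketch of this split-and-recombine approach run against the question's data. It assumes column A holds strings, works on whole rows, and uses pd.concat because append was removed in pandas 2.0. The name deduped and the final sort_index() call are additions for illustration.

```python
import pandas as pd

def true_if_number(x):
    """Return True if x can be parsed as an integer."""
    try:
        int(x)
        return True
    except (ValueError, TypeError):
        return False

# The question's example, with column A stored as strings (an assumption).
df = pd.DataFrame({"A": ["3", "3", "3", "yes", "no", "yes"],
                   "B": [4, 4, 5, 8, 8, 8]})

rows_numeric = df["A"].apply(true_if_number)

# Deduplicate the numeric rows, append the untouched string rows,
# then sort by index to restore the original row order.
deduped = pd.concat([df[rows_numeric].drop_duplicates(),
                     df[~rows_numeric]]).sort_index()
print(deduped)
```

Unlike isnumeric(), int() also accepts strings such as "-3", which is one way the custom test can be more flexible; swapping int for float would admit decimals too.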
