Python Pandas: remove duplicate in csv file with no headings

Question

Sorry for the basic question; I am new to Python and pandas.

Imagine I've got a CSV file with some data in every row, for example:

data1, data2, data3, data4

There are no headings, just data, and I need to remove some rows from the file. If

row1.data3 == row2.data3 and row1.data4 == row2.data4

the entire duplicate row should be removed.

How can I achieve that?

I did try to use drop_duplicates, but without headings I don't know how to do it.

cheers

Just to make sure, you're reassigning the dataframe after drop_duplicates, right? drop_duplicates does not work in place unless you ask it to. Headings wouldn't matter much here. If a row is a duplicate of another row and they are the same data type, drop_duplicates should remove it.
– Quentin, Commented May 7, 2017 at 0:19
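
The comment's point about in-place behaviour can be checked with a small sketch. The frame below is made up for illustration and is not the asker's data.

```python
import pandas as pd

# Made-up frame with one repeated row (illustrative data only).
df = pd.DataFrame([[1, 2], [1, 2], [3, 4]])

df.drop_duplicates()        # result is discarded, so df is unchanged
rows_before = len(df)

df = df.drop_duplicates()   # result is assigned back, so the duplicate is gone
rows_after = len(df)

print(rows_before, rows_after)
```

The first call returns a new DataFrame that is thrown away, which is why the row count only changes after the assignment.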

Sergey Bushmanov · Accepted Answer · 2017-05-07 03:47:51Z

Let's say you have a CSV file without a header row:

import pandas as pd

df = pd.read_csv("./try.csv", header=None)
df
# With header=None, pandas labels the columns with integers (0, 1, 2)
    0   1   2
0   1   1   1
1   1   1   1
2   2   1   3
3   2   1   3
4   3   2   3
5   3   3   3

Then you can call drop_duplicates on a subset of columns:

df.drop_duplicates([0])
    0   1   2
0   1   1   1
2   2   1   3
4   3   2   3

or

df.drop_duplicates([0,1])

    0   1   2
0   1   1   1
2   2   1   3
4   3   2   3
5   3   3   3

Do not forget to assign the result to a variable or pass inplace=True.
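
Applied to the four-column layout in the question, the same idea might look like the sketch below. It assumes data3 and data4 land in columns 2 and 3 once the file is read with header=None; the CSV text is made up and read from a string so the example is self-contained.

```python
import io
import pandas as pd

# Stand-in for a headerless CSV file; the values are made up for illustration.
csv_text = "a,b,x,y\nc,d,x,y\ne,f,x,z\n"

# header=None makes pandas label the columns 0, 1, 2, 3.
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Columns 2 and 3 hold data3 and data4 (assumed positions).
# The default keep="first" retains the first row of each (data3, data4) pair.
deduped = df.drop_duplicates(subset=[2, 3])

# keep=False instead drops every row whose pair occurs more than once.
no_repeats = df.drop_duplicates(subset=[2, 3], keep=False)

# Write back without adding a header row or the index column.
# With no path argument, to_csv returns the CSV text as a string.
out_text = deduped.to_csv(header=False, index=False)
print(out_text)
```

Which keep setting fits depends on whether "the entire row gets removed" means removing only the later copies or every row involved in a match. To write to disk, pass a file path as the first argument to to_csv.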

1 Comment

Sergey Bushmanov Over a year ago

@user1583007 Why not accept the answer if it worked for you?
