Remove duplicate rows of a CSV file based on a single column

Question

I have a CSV file in which one column acts as a serial number. For various reasons the same serial number can appear on several rows, but I want every row removed except the most recent listing of each serial number.

I imagine this is possible with Python/pandas, since I already use pandas to remove rows when the entire row is a duplicate. That almost works for my needs, but it would be much better if I could match on only the serial number column.

Currently it looks like this:

import pandas as pd
df = pd.read_csv('c:/LOG/NEWlog.csv')
df.drop_duplicates(inplace=True)
df.to_csv('c:/PDWLOG/NONDUPES.csv', index=False)

Riccardo Bucco · Accepted Answer · 2022-01-02 23:24:17Z

Try this:

df.drop_duplicates(subset='serial column', inplace=True)
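
Note that `drop_duplicates` keeps the first occurrence of each value by default, while the question asks for the most recent listing. A minimal sketch of one way to handle that, assuming later rows in the file are the more recent ones; the column names `serial` and `status` and the inline sample data are hypothetical and only there to make the example runnable:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real CSV file.
csv_text = """serial,status
A1,old
B2,only
A1,new
"""

df = pd.read_csv(io.StringIO(csv_text))

# Match duplicates on the serial column only, and keep the last
# occurrence (assumed here to be the most recent listing).
deduped = df.drop_duplicates(subset="serial", keep="last")

print(deduped)
```

If the file is not already in chronological order, sorting on a date column before dropping duplicates would likely be needed for `keep="last"` to pick the most recent row.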


1 Comment

Derek B Over a year ago

Ahh I see now in the pandas documentation that option - thank you! I'll give it a try tomorrow.
