I have a CSV file that has one column which acts as a serial number. For various reasons that serial number can be repeated on rows, but I want anything other than the most recent listing of that serial number removed.

I imagine this is possible with Python/pandas, since I already use pandas to remove rows that are complete duplicates. That almost works for my needs, but it would be much better if I could match on only the serial number column.

Currently it looks like this:

import pandas as pd
df = pd.read_csv('c:/LOG/NEWlog.csv')
df.drop_duplicates(inplace=True)
df.to_csv('c:/PDWLOG/NONDUPES.csv', index=False)

1 Answer

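Pass the serial number column as subset, and set keep='last' so that the last row for each serial number is the one retained (this is the most recent one, assuming the rows are in chronological order). The default, keep='first', would keep the earliest row instead:

df.drop_duplicates(subset='serial column', keep='last', inplace=True)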
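
To see what matching on a single column does, here is a minimal sketch on a small in-memory table. The column names `serial` and `reading` and the sample values are made up for illustration; they stand in for whatever the real CSV contains. It assumes that later rows in the file are the more recent ones.

```python
import io

import pandas as pd

# Hypothetical sample data; "serial" and "reading" are assumed column names.
csv_text = """serial,reading
A1,10
B2,20
A1,30
C3,40
B2,50
"""

df = pd.read_csv(io.StringIO(csv_text))

# Compare only the "serial" column, and keep the last occurrence of each
# value, on the assumption that later rows are more recent.
latest = df.drop_duplicates(subset="serial", keep="last")

print(latest.to_string(index=False))
```

If the file is not already in chronological order, sorting by a timestamp column with `sort_values` before dropping duplicates would probably be needed; whether such a column exists in the real data is not known from the question.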

1 Comment

Ahh I see now in the pandas documentation that option - thank you! I'll give it a try tomorrow.
