I've been looking around and trying examples, but I can't get this to work the way I want.
I want to dedupe by 'Order ID' and extract the duplicates to a separate CSV. The main thing is that I need to be able to change the column I dedupe by; in this case it's 'Order ID'.
Example Data set:
ID  Fruit   Order ID  Quantity  Price
1   apple   1111      11        £2.00
2   banana  2222      22        £3.00
3   orange  3333      33        £5.00
4   mango   4444      44        £7.00
5   Kiwi    3333      55        £5.00
Output:
ID  Fruit   Order ID  Quantity  Price
5   Kiwi    3333      55        £5.00
I've tried this:
import pandas as pd
df = pd.read_csv('C:/Users/shane/PycharmProjects/PythonTut/deduping/duplicate example.csv')
new_df = df[['ID','Fruit','Order ID','Quantity','Price']].drop_duplicates()
new_df.to_csv('C:/Users/shane/PycharmProjects/PythonTut/deduping/duplicate test.csv', index=False)
The issue I have is that it doesn't remove any duplicates.
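A likely reason is that `drop_duplicates()` with no arguments compares entire rows, and no two rows in the sample are identical in every column. Below is a minimal sketch of the behaviour I'm after, assuming `DataFrame.duplicated` with its `subset` and `keep` parameters is the right tool. The helper name `split_duplicates` is made up for illustration, and the sample data is inlined so the snippet runs without my CSV files.

```python
import io
import pandas as pd

# Sample data from the question, inlined in place of the real CSV file.
csv_text = """ID,Fruit,Order ID,Quantity,Price
1,apple,1111,11,£2.00
2,banana,2222,22,£3.00
3,orange,3333,33,£5.00
4,mango,4444,44,£7.00
5,Kiwi,3333,55,£5.00
"""

def split_duplicates(df, column):
    """Hypothetical helper: split df into (unique rows, duplicate rows).

    The first occurrence of each value in `column` counts as unique;
    every later occurrence counts as a duplicate.
    """
    is_dup = df.duplicated(subset=[column], keep="first")
    return df[~is_dup], df[is_dup]

df = pd.read_csv(io.StringIO(csv_text))

# The column to dedupe by is just an argument, so it is easy to change.
deduped, duplicates = split_duplicates(df, "Order ID")

print(duplicates)
```

Each of the two frames could then be written out with `to_csv(..., index=False)`, the same way as in my attempt. Passing `keep=False` instead would flag every row that shares an 'Order ID' (both orange and Kiwi here), which is not what the expected output shows.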