Setting keep=False should remove all duplicates, but when I run my function it still returns a duplicate of the previous row.

def date_to_csv():
    import datetime
    import pandas as pd
    from random import randint
    df = pd.read_csv("test.csv")
    df = df.append({'Date': datetime.date.today(), 'Price': randint(1,100)}, ignore_index=True)
    result_df = df.drop_duplicates(keep=False)
    result_df.to_csv('test.csv', mode='a', index=False, header=None)

If my CSV file contains only the column headers 'Date' and 'Price' and I run my function 3 times, the file ends up like this:

Date,Price
2021-06-26,74
2021-06-26,74
2021-06-26,51
2021-06-26,51
2021-06-26,13

When I expect it to return something like this:

Date,Price
2021-06-26,74
2021-06-26,51
2021-06-26,13
  • Are there other fields in your test.csv? Commented Jun 26, 2021 at 10:40
  • only the two column headers 'Date' and 'Price' Commented Jun 26, 2021 at 10:45

1 Answer

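Because of mode='a', each run appends the entire DataFrame (the rows just read from the file plus the new one) to the end of test.csv, so every existing row is written a second time. drop_duplicates only works on the DataFrame in memory; it cannot remove rows that are already in the file. To get the expected behaviour, read the file, add the new row, and overwrite the file: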

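import pandas as pd
from datetime import datetime
from random import randint


def date_to_csv():
    # Read the whole file, add today's row, then overwrite the file (default mode='w').
    df = pd.read_csv('test.csv')
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
    new_row = pd.DataFrame([{'Date': str(datetime.now().date()), 'Price': randint(1, 100)}])
    df = pd.concat([df, new_row], ignore_index=True)
    df.to_csv('test.csv', index=False)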
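
If you would rather keep mode='a', an alternative is to write only the new row instead of the whole DataFrame. The following is a minimal sketch, not part of the original answer: the function name append_row and its path parameter are made up for illustration, and it assumes the existing file (if any) ends with a newline.

```python
import datetime
import os
from random import randint

import pandas as pd


def append_row(path="test.csv"):
    """Append one new row to the CSV without rewriting the existing rows."""
    # Build a one-row DataFrame; the date is stored as an ISO string.
    new_row = pd.DataFrame(
        [{"Date": datetime.date.today().isoformat(), "Price": randint(1, 100)}]
    )
    # Write the header only when the file is missing or completely empty.
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # mode="a" is safe here because only the new row is written.
    new_row.to_csv(path, mode="a", index=False, header=write_header)
```

Since the rows already in the file are never read or rewritten, repeated calls add exactly one line each.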

1 Comment

Also, the last Date value is a datetime.date object, while the previous ones (read back from the CSV) are strings, so drop_duplicates does not treat them as equal.
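
The type mismatch described in this comment can be reproduced with a small sketch. The sample values below are made up for illustration; the point is only that a string and a date object with the same printed form are different values to drop_duplicates until both are converted to one type.

```python
import datetime

import pandas as pd

today = datetime.date(2021, 6, 26)
# First row mimics a value read from the CSV (string), second a freshly added one (date).
df = pd.DataFrame({"Date": ["2021-06-26", today], "Price": [74, 74]})

# The string and the date object compare unequal, so no row is dropped.
mixed = df.drop_duplicates()

# After converting every Date to a string, the two rows are identical.
df["Date"] = df["Date"].astype(str)
normalized = df.drop_duplicates()
```

Here mixed keeps both rows, while normalized keeps only one.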
