
Let's suppose I have a CSV file which looks like this:

Date,High,Low,Open,Close,Volume,Adj Close
1980-12-12,0.515625,0.5133928656578064,0.5133928656578064,0.5133928656578064,117258400.0,0.02300705946981907
1980-12-15,0.4888392984867096,0.4866071343421936,0.4888392984867096,0.4866071343421936,43971200.0,0.02180669829249382
1980-12-16,0.453125,0.4508928656578064,0.453125,0.4508928656578064,26432000.0,0.02020619809627533

I also have a pandas DataFrame that contains exactly the same values plus some new entries. My goal is to append only the new rows to the CSV file.

I tried the following, but unfortunately it appends not only the new entries but the old ones as well:

df.to_csv('{}/{}'.format(FOLDER, 'AAPL.CSV'), mode='a', header=False)
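A small sketch of why this happens: `to_csv(mode='a')` writes every row of the DataFrame to the end of the file and never compares it with what the file already contains. The temporary file `demo.csv` and the Date-indexed data below are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical Date-indexed data standing in for the question's DataFrame
old = pd.DataFrame({"Close": [0.51, 0.49]},
                   index=pd.Index(["1980-12-12", "1980-12-15"], name="Date"))
new = pd.DataFrame({"Close": [0.51, 0.49, 0.45]},
                   index=pd.Index(["1980-12-12", "1980-12-15", "1980-12-16"], name="Date"))

path = os.path.join(tempfile.mkdtemp(), "demo.csv")  # illustrative throwaway file
old.to_csv(path)                          # initial file: header + 2 rows
new.to_csv(path, mode="a", header=False)  # appends all 3 rows, not just the new one

result = pd.read_csv(path, index_col="Date")
print(len(result))  # 5 rows: the two old dates now appear twice
```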
  • How do you know which ones are old/new? Commented Mar 27, 2019 at 22:27
  • By the index, which is the date (e.g. 1980-12-12). Commented Mar 27, 2019 at 22:29

1 Answer

You can re-read the CSV file you wrote earlier and drop any duplicated dates before appending the newly fetched data.

The following code worked for me:

import pandas as pd

# Creating original csv
columns = ['Date','High','Low','Open','Close','Volume','Adj Close']
original_rows = [
    ["1980-12-12", 0.515625, 0.5133928656578064, 0.5133928656578064, 0.5133928656578064, 117258400.0, 0.02300705946981907],
    ["1980-12-15", 0.4888392984867096, 0.4866071343421936, 0.4888392984867096, 0.4866071343421936, 43971200.0, 0.02180669829249382],
]
df_original = pd.DataFrame(columns=columns, data=original_rows)
df_original.to_csv('AAPL.CSV', mode='w', index=False)

# Fetching the new data
rows_updated = [
    ["1980-12-12", 0.515625, 0.5133928656578064, 0.5133928656578064, 0.5133928656578064, 117258400.0, 0.02300705946981907],
    ["1980-12-15", 0.4888392984867096, 0.4866071343421936, 0.4888392984867096, 0.4866071343421936, 43971200.0, 0.02180669829249382],
    ["1980-12-16", 0.453125, 0.4508928656578064, 0.453125, 0.4508928656578064, 26432000.0, 0.02020619809627533],
]
df_updated = pd.DataFrame(columns=columns, data=rows_updated)

# Read in current csv values
current_csv_data = pd.read_csv('AAPL.CSV')

# keep=False drops every row whose Date appears in both frames, so only the new dates remain; append them
new_entries = pd.concat([current_csv_data, df_updated]).drop_duplicates(subset='Date', keep=False)
new_entries.to_csv('AAPL.CSV', mode='a', header=False, index=False)
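A possible alternative, sketched under the assumption (taken from the question's comments) that the Date index identifies each row: keep only the rows of the updated DataFrame whose dates are not already in the file, using `Index.isin`. Unlike `drop_duplicates(keep=False)`, this never re-appends rows that exist only in the CSV. The file location and data are illustrative.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "AAPL.CSV")  # illustrative location

# Existing file with two dates; Date is written as the index, as in the question
existing = pd.DataFrame({"Close": [0.5134, 0.4866]},
                        index=pd.Index(["1980-12-12", "1980-12-15"], name="Date"))
existing.to_csv(path)

# Updated data: the same two dates plus one new one
df = pd.DataFrame({"Close": [0.5134, 0.4866, 0.4509]},
                  index=pd.Index(["1980-12-12", "1980-12-15", "1980-12-16"], name="Date"))

# Read only the Date column from disk and keep rows whose date is not there yet
on_disk = pd.read_csv(path, usecols=["Date"])["Date"]
new_rows = df[~df.index.isin(on_disk)]

# Append just those rows, without repeating the header
new_rows.to_csv(path, mode="a", header=False)

final = pd.read_csv(path, index_col="Date")
print(final.index.tolist())  # ['1980-12-12', '1980-12-15', '1980-12-16']
```

This compares strings with strings. If your index holds Timestamps instead, you would likely need to parse the column when reading it (e.g. `pd.read_csv(path, usecols=["Date"], parse_dates=["Date"])`) so that `isin` compares values of the same type.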

2 Comments

@FogarasiNorbert Did this help at all?
This doesn't work with mode='a'; use mode='w' instead, since in append mode the original entries are written again regardless of whether they are repeated.
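If rewriting the whole file is acceptable, as the comment above suggests, one sketch is to combine what is on disk with the updated DataFrame, drop repeated dates while keeping the latest version, and write everything back with `mode='w'`. The file location and data are illustrative, and this assumes the file fits comfortably in memory.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "AAPL.CSV")  # illustrative location

# Existing file with two dates, Date written as the index
pd.DataFrame({"Close": [0.5134, 0.4866]},
             index=pd.Index(["1980-12-12", "1980-12-15"], name="Date")).to_csv(path)

df_updated = pd.DataFrame({"Close": [0.5134, 0.4866, 0.4509]},
                          index=pd.Index(["1980-12-12", "1980-12-15", "1980-12-16"], name="Date"))

on_disk = pd.read_csv(path, index_col="Date")

# Stack old and new rows, keep the last occurrence of each date, sort by date
combined = pd.concat([on_disk, df_updated])
combined = combined[~combined.index.duplicated(keep="last")].sort_index()

# Overwrite the file instead of appending to it
combined.to_csv(path, mode="w")
```

Keeping the last occurrence means that if the freshly fetched data revises an existing date, the revised values win.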
