Append only new values to CSV from DataFrame in Python

Question

Let's suppose I have a CSV file which looks like this:

Date,High,Low,Open,Close,Volume,Adj Close
1980-12-12,0.515625,0.5133928656578064,0.5133928656578064,0.5133928656578064,117258400.0,0.02300705946981907
1980-12-15,0.4888392984867096,0.4866071343421936,0.4888392984867096,0.4866071343421936,43971200.0,0.02180669829249382
1980-12-16,0.453125,0.4508928656578064,0.453125,0.4508928656578064,26432000.0,0.02020619809627533

I also have a pandas DataFrame that contains exactly the same rows plus some new entries. My goal is to append only the new rows to the CSV file.

I tried the following, but unfortunately it appends not only the new entries but also the old ones:

df.to_csv('{}/{}'.format(FOLDER, 'AAPL.CSV'), mode='a', header=False)

How do you know which ones are old/new?

– Andy Hayden, Commented Mar 27, 2019 at 22:27
By the index, which is the date (e.g., 1980-12-12).

– Fogarasi Norbert, Commented Mar 27, 2019 at 22:29
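Because the rows are identified by their Date index, one option is to read back only what is already on disk and keep the rows of the DataFrame whose index is not in it. The sketch below assumes the DataFrame has a DatetimeIndex named `Date` and uses a temporary file path and a single `Close` column purely for illustration; `parse_dates=True` is what makes the index read from the file comparable with the DataFrame's index instead of being plain strings.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file location; substitute your own FOLDER / file name.
path = os.path.join(tempfile.mkdtemp(), "AAPL.CSV")

# Existing file: two rows, written with the Date index as the first column.
existing = pd.DataFrame(
    {"Close": [0.5133928656578064, 0.4866071343421936]},
    index=pd.to_datetime(["1980-12-12", "1980-12-15"]),
)
existing.index.name = "Date"
existing.to_csv(path)

# Freshly fetched frame: the same two dates plus one new date.
df = pd.DataFrame(
    {"Close": [0.5133928656578064, 0.4866071343421936, 0.4508928656578064]},
    index=pd.to_datetime(["1980-12-12", "1980-12-15", "1980-12-16"]),
)
df.index.name = "Date"

# Read the file back with the Date column as a parsed DatetimeIndex so it
# compares equal to df.index rather than being a column of strings.
on_disk = pd.read_csv(path, index_col="Date", parse_dates=True)

# Keep only rows whose date is not already in the file, then append them.
new_rows = df[~df.index.isin(on_disk.index)]
new_rows.to_csv(path, mode="a", header=False)

result = pd.read_csv(path, index_col="Date", parse_dates=True)
print(result)
```

Running the same filter a second time appends nothing, because every date in `df` is then already present in the file.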

cullzie · Accepted Answer · 2019-03-27 23:32:54Z


You can re-read your CSV file after writing it and drop any duplicates before appending the newly fetched data.

The following code worked for me:

import pandas as pd

# Creating original csv
columns = ['Date','High','Low','Open','Close','Volume','Adj Close']
original_rows = [
    ["1980-12-12", 0.515625, 0.5133928656578064, 0.5133928656578064, 0.5133928656578064, 117258400.0, 0.02300705946981907],
    ["1980-12-15", 0.4888392984867096, 0.4866071343421936, 0.4888392984867096, 0.4866071343421936, 43971200.0, 0.02180669829249382],
]
df_original = pd.DataFrame(columns=columns, data=original_rows)
df_original.to_csv('AAPL.CSV', mode='w', index=False)

# Fetching the new data
rows_updated = [
    ["1980-12-12", 0.515625, 0.5133928656578064, 0.5133928656578064, 0.5133928656578064, 117258400.0, 0.02300705946981907],
    ["1980-12-15", 0.4888392984867096, 0.4866071343421936, 0.4888392984867096, 0.4866071343421936, 43971200.0, 0.02180669829249382],
    ["1980-12-16", 0.453125, 0.4508928656578064, 0.453125, 0.4508928656578064, 26432000.0, 0.02020619809627533],
]
df_updated = pd.DataFrame(columns=columns, data=rows_updated)

# Read in current csv values
current_csv_data = pd.read_csv('AAPL.CSV')

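# Drop duplicates and append only new data.
# keep=False removes every Date that occurs in both frames, so only dates seen
# once survive (note: this also includes dates present only in the CSV).
new_entries = pd.concat([current_csv_data, df_updated]).drop_duplicates(subset='Date', keep=False)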
new_entries.to_csv('AAPL.CSV', mode='a', header=False, index=False)
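Because `keep=False` keeps any Date that appears exactly once across both frames, a row that exists in the file but not in the freshly fetched data would be appended a second time. A minimal sketch of an alternative, assuming both sides store Date as a string column, filters the update against the dates already on disk with `Series.isin`. Only the `Date` and `Close` columns are used here for brevity, and the temporary path is hypothetical; in this example the file holds 1980-12-12, which the update lacks.

```python
import os
import tempfile

import pandas as pd

# Hypothetical path; replace with your own file.
path = os.path.join(tempfile.mkdtemp(), "AAPL.CSV")

columns = ["Date", "Close"]
# The file holds 1980-12-12 and 1980-12-15.
pd.DataFrame(
    [["1980-12-12", 0.5133928656578064], ["1980-12-15", 0.4866071343421936]],
    columns=columns,
).to_csv(path, index=False)

# The update repeats 1980-12-15, adds 1980-12-16, and omits 1980-12-12.
df_updated = pd.DataFrame(
    [["1980-12-15", 0.4866071343421936], ["1980-12-16", 0.4508928656578064]],
    columns=columns,
)

# Read Date as str so it compares with the string dates in df_updated.
current = pd.read_csv(path, dtype={"Date": str})

# Keep only rows whose Date is not already in the file, then append them.
new_entries = df_updated[~df_updated["Date"].isin(current["Date"])]
new_entries.to_csv(path, mode="a", header=False, index=False)

result = pd.read_csv(path, dtype={"Date": str})
print(result["Date"].tolist())  # ['1980-12-12', '1980-12-15', '1980-12-16']
```

With the `concat(...).drop_duplicates(keep=False)` version, 1980-12-12 would also end up in `new_entries` in this scenario and be written twice.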

edited Mar 27, 2019 at 23:32

answered Mar 27, 2019 at 23:25

cullzie


2 Comments

cullzie Over a year ago

@FogarasiNorbert Did this help at all?

nimishxotwod Over a year ago

This doesn't work with mode='a'; use mode='w' instead. In append mode, the original entries are written again regardless of whether they are duplicates.
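One reading of this suggestion is to stop appending altogether: merge the file's contents with the new data, keep one row per Date, and overwrite the file with mode='w'. The sketch below assumes string dates in ISO format (so sorting them as strings is chronological), prefers the freshly fetched values via `keep="last"`, and uses a hypothetical temporary path with only two columns.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "AAPL.CSV")  # hypothetical location

columns = ["Date", "Close"]
pd.DataFrame(
    [["1980-12-12", 0.5133928656578064], ["1980-12-15", 0.4866071343421936]],
    columns=columns,
).to_csv(path, index=False)

df_updated = pd.DataFrame(
    [["1980-12-15", 0.4866071343421936], ["1980-12-16", 0.4508928656578064]],
    columns=columns,
)

current = pd.read_csv(path, dtype={"Date": str})

# Merge old and new, keep one row per Date (preferring the freshly fetched
# values), and rewrite the whole file instead of appending to it.
combined = (
    pd.concat([current, df_updated], ignore_index=True)
    .drop_duplicates(subset="Date", keep="last")
    .sort_values("Date")
)
combined.to_csv(path, mode="w", index=False)

print(pd.read_csv(path, dtype={"Date": str})["Date"].tolist())  # ['1980-12-12', '1980-12-15', '1980-12-16']
```

The trade-off is that the whole file is rewritten on every update, which costs more I/O than appending but cannot produce duplicate dates.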
