Insert or update if primary key exists into postgreSQL table with .to_sql()

Question

I have a pandas DataFrame that consists of multiple columns that I want to store into the postgreSQL database, using .to_sql():

my_table.to_sql('table', con=engine, schema='wrhouse', if_exists='append', index=False)

I have set a primary key (date), in order to avoid duplicate entries. So above-mentioned command works when my primary key does not exist in the database.

However, if that key exists I am getting the following error:

IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "table_pkey"
DETAIL:  Key (date)=(2022-07-01 00:00:00) already exists.

Now, what I would like to do is:

Update the row with the already existed Key(date)
Insert a new row in case the Key(date) does not exist

I checked the documentation on: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html but I could't find any option by using the DataFrame.to_sql() function.

Additionally, if I change the if_exists='append' parameter to if_exists='replace', it deletes the whole table and that is not what I want.

Is there any way to update/insert rows using the .to_sql() function?

Use an UPSERT (INSERT .. ON CONFLICT ... DO UPDATE). Not sure how to dumb that down for your ORM. (People should just use SQL instead of the 1000 different, mostly half-broken obfuscation layers.) — Erwin Brandstetter
– Erwin Brandstetter, Commented Nov 14, 2022 at 14:01
@ErwinBrandstetter This line of code works, when I am explicitly typing the VALUES: engine = conn.execute("INSERT INTO wrschema.table (date, first_hour, last_hour, quantity) VALUES ('2022-07-01 00:00:00', 15, 17, '250') ON CONFLICT (date) DO UPDATE SET first_hour = EXCLUDED.first_hour, last_hour = EXCLUDED.last_hour, quantity = EXCLUDED.quantity;") But what would be the syntax if I want to insert the values from the DataFrame? — Pythoneer
– Pythoneer, Commented Nov 14, 2022 at 17:15
Don't know Python / Pandas well enough. This related question should help: stackoverflow.com/q/61366664/939860 — Erwin Brandstetter
– Erwin Brandstetter, Commented Nov 14, 2022 at 18:18
I have the same question. It seems to me that the 'update' a row function does not exist for pandas to_sql (it drops the whole table or appends to it). So the only option is to do this directly on SQL? — edgarbc
– edgarbc, Commented Feb 17, 2023 at 15:37

ela · Accepted Answer · 2022-11-22 18:28:49Z

1

you could convert the my_table dataframe (which holds new values you'd like to send to the table in the database) to a numpy record array and add it to the query used in the execute function in your comment ^:

values = str(list(my_table.to_records(index=False)))[1:-1]

conn.execute(f"INSERT INTO wrschema.table (date, first_hour, last_hour, quantity) VALUES {values} ON CONFLICT (date) DO UPDATE SET first_hour = EXCLUDED.first_hour, last_hour = EXCLUDED.last_hour, quantity = EXCLUDED.quantity;")

(this is something that worked for me, hope it helps!)

answered Nov 22, 2022 at 18:28

ela

213 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Insert or update if primary key exists into postgreSQL table with .to_sql()

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related