2

I have a pandas DataFrame that consists of multiple columns that I want to store into the postgreSQL database, using .to_sql():

my_table.to_sql('table', con=engine, schema='wrhouse', if_exists='append', index=False)

I have set a primary key (date), in order to avoid duplicate entries. So above-mentioned command works when my primary key does not exist in the database.

However, if that key exists I am getting the following error:

IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "table_pkey"
DETAIL:  Key (date)=(2022-07-01 00:00:00) already exists.

Now, what I would like to do is:

  • Update the row with the already existed Key(date)
  • Insert a new row in case the Key(date) does not exist

I checked the documentation on: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html but I could't find any option by using the DataFrame.to_sql() function.

Additionally, if I change the if_exists='append' parameter to if_exists='replace', it deletes the whole table and that is not what I want.

Is there any way to update/insert rows using the .to_sql() function?

4
  • 1
    Use an UPSERT (INSERT .. ON CONFLICT ... DO UPDATE). Not sure how to dumb that down for your ORM. (People should just use SQL instead of the 1000 different, mostly half-broken obfuscation layers.) Commented Nov 14, 2022 at 14:01
  • @ErwinBrandstetter This line of code works, when I am explicitly typing the VALUES: engine = conn.execute("INSERT INTO wrschema.table (date, first_hour, last_hour, quantity) VALUES ('2022-07-01 00:00:00', 15, 17, '250') ON CONFLICT (date) DO UPDATE SET first_hour = EXCLUDED.first_hour, last_hour = EXCLUDED.last_hour, quantity = EXCLUDED.quantity;") But what would be the syntax if I want to insert the values from the DataFrame? Commented Nov 14, 2022 at 17:15
  • 1
    Don't know Python / Pandas well enough. This related question should help: stackoverflow.com/q/61366664/939860 Commented Nov 14, 2022 at 18:18
  • I have the same question. It seems to me that the 'update' a row function does not exist for pandas to_sql (it drops the whole table or appends to it). So the only option is to do this directly on SQL? Commented Feb 17, 2023 at 15:37

1 Answer 1

1

you could convert the my_table dataframe (which holds new values you'd like to send to the table in the database) to a numpy record array and add it to the query used in the execute function in your comment ^:

values = str(list(my_table.to_records(index=False)))[1:-1]

conn.execute(f"INSERT INTO wrschema.table (date, first_hour, last_hour, quantity) VALUES {values} ON CONFLICT (date) DO UPDATE SET first_hour = EXCLUDED.first_hour, last_hour = EXCLUDED.last_hour, quantity = EXCLUDED.quantity;")

(this is something that worked for me, hope it helps!)

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.