I would like to upsert my pandas DataFrame into a SQL Server table. This question has a workable solution for PostgreSQL, but T-SQL does not have an ON CONFLICT variant of INSERT. How can I accomplish the same thing for SQL Server?

1 Answer

Update, July 2022: You can save some typing by using this function to build the MERGE statement and perform the upsert for you.


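SQL Server offers the MERGE statement, which updates the rows that match and inserts the rest in a single statement. Upload the DataFrame to a temporary table, then MERGE that table into the target table: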

import pandas as pd
import sqlalchemy as sa

connection_string = (
    "Driver=ODBC Driver 17 for SQL Server;"
    "Server=192.168.0.199;"
    "UID=scott;PWD=tiger^5HHH;"
    "Database=test;"
    "UseFMTONLY=Yes;"
)
connection_url = sa.engine.URL.create(
    "mssql+pyodbc",
    query={"odbc_connect": connection_string}
)

engine = sa.create_engine(connection_url, fast_executemany=True)

with engine.begin() as conn:
    # step 0.0 - create test environment
    conn.exec_driver_sql("DROP TABLE IF EXISTS main_table")
    conn.exec_driver_sql(
        "CREATE TABLE main_table (id int primary key, txt varchar(50))"
    )
    conn.exec_driver_sql(
        "INSERT INTO main_table (id, txt) VALUES (1, 'row 1 old text')"
    )
    # step 0.1 - create DataFrame to UPSERT
    df = pd.DataFrame(
        [(2, "new row 2 text"), (1, "row 1 new text")], columns=["id", "txt"]
    )

    # step 1 - upload DataFrame to temporary table
    df.to_sql("#temp_table", conn, index=False, if_exists="replace")

    # step 2 - merge temp_table into main_table
    conn.exec_driver_sql(
        """\
        MERGE main_table WITH (HOLDLOCK) AS main
        USING (SELECT id, txt FROM #temp_table) AS temp
        ON (main.id = temp.id)
        WHEN MATCHED THEN
            UPDATE SET txt = temp.txt
        WHEN NOT MATCHED THEN
            INSERT (id, txt) VALUES (temp.id, temp.txt);
        """
    )

    # step 3 - confirm results
    result = conn.exec_driver_sql(
        "SELECT * FROM main_table ORDER BY id"
    ).fetchall()
    print(result)  
    # [(1, 'row 1 new text'), (2, 'new row 2 text')]
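
If you have more columns, or a compound (multi-column) key, writing the MERGE text by hand gets tedious. Below is a minimal sketch of a string builder for the same statement shape used in step 2. It is not the function mentioned in the July 2022 update; `build_merge_sql` and `quote_name` are hypothetical names chosen for this illustration, and the aliases `main` and `temp` simply mirror the example above. Table names are interpolated as given, so they are assumed to come from trusted code.

```python
def quote_name(name):
    # Bracket-quote a T-SQL identifier, doubling any closing bracket.
    return "[" + name.replace("]", "]]") + "]"


def build_merge_sql(target, source, key_cols, data_cols):
    """Build a MERGE statement that upserts rows from `source` into `target`.

    Hypothetical helper for illustration only. Table names are inserted
    as given, so they must come from trusted code, not user input.
    """
    if not key_cols:
        raise ValueError("at least one key column is required")
    all_cols = [*key_cols, *data_cols]
    # Match rows on every key column (supports compound keys).
    on_clause = " AND ".join(
        f"main.{quote_name(c)} = temp.{quote_name(c)}" for c in key_cols
    )
    col_list = ", ".join(quote_name(c) for c in all_cols)
    val_list = ", ".join(f"temp.{quote_name(c)}" for c in all_cols)
    lines = [
        f"MERGE {target} WITH (HOLDLOCK) AS main",
        f"USING (SELECT {col_list} FROM {source}) AS temp",
        f"ON ({on_clause})",
    ]
    # Only emit an UPDATE branch when there are non-key columns to update.
    if data_cols:
        set_list = ", ".join(
            f"{quote_name(c)} = temp.{quote_name(c)}" for c in data_cols
        )
        lines += ["WHEN MATCHED THEN", f"    UPDATE SET {set_list}"]
    lines += [
        "WHEN NOT MATCHED THEN",
        f"    INSERT ({col_list}) VALUES ({val_list});",
    ]
    return "\n".join(lines)


sql = build_merge_sql("main_table", "#temp_table", ["id"], ["txt"])
print(sql)
```

The resulting string could then be passed to `conn.exec_driver_sql(sql)` in place of the hand-written statement in step 2. For a compound key, list every key column in `key_cols`, e.g. `build_merge_sql("main_table", "#temp_table", ["a", "b"], ["c"])`.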

6 Comments

For an example that can be used with a compound (multi-column) primary key see this answer.
I'm trying to replicate step 1 in my current use case. I'm creating the SQLAlchemy engine like so: sa.create_engine("ibm_db_sa+pyodbc://?driver=IBM i Access ODBC Driver&SYSTEM=XXX&;Port=21&UID=XXX&PWD=XXX&Database=") and then executing step 1: df1.to_sql("WWNEXPORT.TEMP", engine, index=False, if_exists="replace"). But I receive the following error: sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42S02', '[42S02] [IBM][System i Access ODBC Driver][DB2 for i5/OS]SQL0204 - TABLES of type *FILE in SYSCAT not found. (-204) (SQLPrepare)'). Do you know why?
@TheDude - Maybe try df1.to_sql("TEMP", engine, schema="WWNEXPORT", index=False, if_exists="replace")
Unfortunately, that isn't the solution; I get the same error. Here is some additional information that comes with the error, which I couldn't post in the comment above because of the comment length limit: [SQL: SELECT "SYSCAT"."TABLES"."TABNAME" FROM "SYSCAT"."TABLES" WHERE "SYSCAT"."TABLES"."TABSCHEMA" = ? AND "SYSCAT"."TABLES"."TABNAME" = ?] [parameters: ('WWNEXPORT', 'TEMP')] (Background on this error at: https://sqlalche.me/e/14/f405) I don't understand how and why this SQL statement is generated.
@TheDude - pandas to_sql() is calling SQLAlchemy has_table() to see if the table already exists, so SQLAlchemy is querying the SYSCAT (metadata) tables to see if your table shows up there. I have no experience with ibm_db_sa, unfortunately.