10

I am using pandas with SQLAlchemy to write to a MySQL database via DataFrame.to_sql. I pass if_exists='append', as in df.to_sql(con=con, name='tablename', if_exists='append'). The program makes several small writes to the tables during the day, so I don't want the entire table overwritten with 'replace'. Periodically, I get a duplicate entry error:

sqla: valuesToCalc has error: (IntegrityError) (1062, "Duplicate entry '0-0000-00-00-00:00:00' for key 'PRIMARY'") 'INSERT INTO valuesToCalc () VALUES ()' ()

Is there any way to add "ON DUPLICATE KEY UPDATE" to a to_sql call? Or do I have to stop using to_sql and work with SQLAlchemy directly? I was hoping not to.

3 Answers

13

Not sure if you found an answer, but here's a workaround that worked for me:

Call .to_sql() on a temporary table, then run a query that updates the main table from the temporary table, and finally drop the temporary table. For example:

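from sqlalchemy import text

# Stage the new rows in a temporary table (con is a SQLAlchemy engine)
df.to_sql(con=con, name='tablename_temp', if_exists='replace')
# con.begin() commits on success and rolls back on error
with con.begin() as connection:
    connection.execute(text("INSERT INTO tablename SELECT * FROM tablename_temp ON DUPLICATE KEY UPDATE tablename.field_to_update=tablename_temp.field_to_update"))
    connection.execute(text('DROP TABLE tablename_temp'))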
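One caveat with SELECT * is that it relies on both tables having the same columns in the same order (and to_sql also writes the DataFrame index as a column unless index=False is passed). As a rough sketch, the merge statement could instead be built with explicit column lists. The helper name build_merge_sql and the id column are hypothetical, and the identifiers are interpolated directly, so only trusted names should be passed in:

```python
def build_merge_sql(target, staging, columns, update_columns):
    """Return a MySQL statement that upserts rows from staging into target.

    Identifiers are interpolated into the SQL text, so pass only trusted names.
    """
    def q(name):
        # Quote an identifier with backticks, doubling any embedded backtick
        return "`" + name.replace("`", "``") + "`"

    col_list = ", ".join(q(c) for c in columns)
    updates = ", ".join(
        f"{q(target)}.{q(c)} = {q(staging)}.{q(c)}" for c in update_columns
    )
    return (
        f"INSERT INTO {q(target)} ({col_list}) "
        f"SELECT {col_list} FROM {q(staging)} "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )


# Hypothetical column names, for illustration only
sql = build_merge_sql(
    "tablename",
    "tablename_temp",
    columns=["id", "field_to_update"],
    update_columns=["field_to_update"],
)
print(sql)
```

The resulting string can be passed to text() and executed in place of the hand-written INSERT statement.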

7

Here is what I ended up doing:

    from sqlalchemy.exc import IntegrityError

    # df is a DataFrame; Engine_Name is a SQLAlchemy engine
    num_rows = len(df)
    # Insert one row at a time so a duplicate only rejects that row
    for i in range(num_rows):
        try:
            # Try inserting the row
            df.iloc[i:i+1].to_sql(name="Table_Name", con=Engine_Name, if_exists='append', index=False)
        except IntegrityError:
            # Ignore duplicates
            pass

1 Comment

This is quite inefficient, by the way, since it issues one INSERT per row; I would not recommend it.
0

I know this post is 10 years old, but it is the first result in all my searches, and I finally found what I think is a better solution:

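from sqlalchemy.dialects.mysql import insert

def insert_ignore(table, conn, keys, data_iter):
    # Convert each row to a dict keyed by column name
    rows = [dict(zip(keys, row)) for row in data_iter]
    insert_stmt = insert(table.table).values(rows)
    # INSERT IGNORE makes MySQL skip rows that hit a duplicate key
    ignore_stmt = insert_stmt.prefix_with('IGNORE')
    result = conn.execute(ignore_stmt)
    return result.rowcount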

Then pass it as the method= argument to df.to_sql:

df.to_sql("sometable", connection, if_exists="append", index=False, method=insert_ignore)
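Since the question asks for ON DUPLICATE KEY UPDATE rather than INSERT IGNORE, the same method= hook can probably be used for that too. Below is a minimal sketch, assuming SQLAlchemy's MySQL insert() construct and its on_duplicate_key_update() method. The names build_upsert and make_upsert_method, and the id and val columns, are made up for illustration. The demo at the bottom only compiles the statement, so it needs no database connection:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.dialects import mysql
from sqlalchemy.dialects.mysql import insert


def build_upsert(sql_table, keys, rows, update_columns):
    """Build INSERT ... ON DUPLICATE KEY UPDATE for the given rows."""
    data = [dict(zip(keys, row)) for row in rows]
    stmt = insert(sql_table).values(data)
    # stmt.inserted refers to the values of the row that would have been inserted
    updates = {name: stmt.inserted[name] for name in update_columns}
    return stmt.on_duplicate_key_update(updates)


def make_upsert_method(update_columns):
    """Return a callable suitable for DataFrame.to_sql(method=...)."""
    def upsert(table, conn, keys, data_iter):
        stmt = build_upsert(table.table, keys, data_iter, update_columns)
        return conn.execute(stmt).rowcount
    return upsert


# Demo with a hypothetical table: compile to MySQL SQL without connecting
metadata = MetaData()
demo_table = Table(
    "sometable",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("val", String(20)),
)
demo_stmt = build_upsert(demo_table, ["id", "val"], [(1, "a"), (2, "b")], ["val"])
demo_sql = str(demo_stmt.compile(dialect=mysql.dialect()))
print(demo_sql)
```

It would then be called as df.to_sql("sometable", connection, if_exists="append", index=False, method=make_upsert_method(["val"])), listing whichever columns should be overwritten when a key already exists.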
