Pandas to_sql with sqlAlchemy duplicate entries error in mysqldb

Question

I am using pandas with SQLAlchemy to write to a MySQL database using DataFrame.to_sql. I want to use the 'append' option: df.to_sql(con=con, name='tablename', if_exists='append'). Since the program does several small writes to the tables during the day, I don't want the entire table overwritten with 'replace'. Periodically, I get a duplicate entry error:

sqla: valuesToCalc has error:  (IntegrityError) (1062, "Duplicate entry '0-0000-00-00-00:00:00' for key 'PRIMARY'") 'INSERT INTO valuesToCalc () VALUES ()' ()

Is there any way to add "ON DUPLICATE KEY UPDATE" to a to_sql call? Or do I have to stop using to_sql and go directly through SQLAlchemy? I was hoping not to.

Nidal · Accepted Answer · 2015-04-13 20:12:25Z

13

Not sure if you found an answer, but here's a workaround that worked for me:

Call .to_sql() on a temporary table, then use a query to update the main table from the temp table. Then you can drop the temp table. For example:

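from sqlalchemy import text

# Stage the new rows in a scratch table (recreated on every run).
df.to_sql(con=con, name='tablename_temp', if_exists='replace')
# con is the SQLAlchemy engine; begin() commits on success and rolls back on error.
with con.begin() as connection:
    connection.execute(text("INSERT INTO tablename SELECT * FROM tablename_temp ON DUPLICATE KEY UPDATE tablename.field_to_update=tablename_temp.field_to_update"))
    connection.execute(text('DROP TABLE tablename_temp'))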
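
One thing to watch with the SELECT * above: it relies on the temp table having the same columns in the same order as the main table, and to_sql also writes the DataFrame index as a column unless index=False is passed. Here is a rough sketch of a helper that spells the columns out instead. The helper name, its arguments, and the `id` column are hypothetical, and it only builds the SQL string (pass the result to text() as above):

```python
def build_merge_sql(target, staging, columns, update_columns):
    """Return a MySQL INSERT ... SELECT ... ON DUPLICATE KEY UPDATE statement.

    Hypothetical helper: the names are interpolated directly into the SQL,
    so they must come from trusted code, never from user input.
    """
    # Explicit column list, used for both the INSERT target and the SELECT.
    col_list = ", ".join(f"`{c}`" for c in columns)
    # On a key collision, copy these columns over from the staging table.
    updates = ", ".join(
        f"`{target}`.`{c}` = `{staging}`.`{c}`" for c in update_columns
    )
    return (
        f"INSERT INTO `{target}` ({col_list}) "
        f"SELECT {col_list} FROM `{staging}` "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )


sql = build_merge_sql(
    target="tablename",
    staging="tablename_temp",
    columns=["id", "field_to_update"],  # "id" is an assumed key column
    update_columns=["field_to_update"],
)
print(sql)
```

Listing the columns keeps the merge working even if the temp table ends up with an extra index column or a different column order.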

NFern · Accepted Answer · 2016-03-09 02:15:25Z

7

Here is what I ended up doing:

    from sqlalchemy.exc import IntegrityError

    # df is a DataFrame
    num_rows = len(df)
    # Iterate one row at a time
    for i in range(num_rows):
        try:
            # Try inserting the row
            df.iloc[i:i+1].to_sql(name="Table_Name", con=Engine_Name, if_exists='append', index=False)
        except IntegrityError:
            # Ignore duplicates
            pass

1 Comment

flgn Over a year ago

This is quite inefficient btw, would not recommend.

Daniel Moya · Accepted Answer · 2024-12-16 20:52:25Z

0

I know this post is 10 years old, but it is the first result in all my searches, and I finally found what I think is a better solution:

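from sqlalchemy.dialects.mysql import insert

def insert_ignore(table, conn, keys, data_iter):
    # pandas passes column names in keys and row tuples in data_iter
    rows = [dict(zip(keys, row)) for row in data_iter]
    insert_stmt = insert(table.table).values(rows)
    # Renders as INSERT IGNORE INTO ..., so MySQL skips duplicate-key rows
    ignore_stmt = insert_stmt.prefix_with('IGNORE')
    result = conn.execute(ignore_stmt)
    return result.rowcount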

Then pass it as method= in df.to_sql:

df.to_sql("sometable", connection, if_exists="append", index=False, method=insert_ignore)
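
The same method= hook can also do what the question originally asked for, ON DUPLICATE KEY UPDATE, instead of ignoring duplicates. A minimal sketch, assuming every column in the DataFrame should be overwritten with the incoming value on a key collision; the function names build_upsert and insert_on_duplicate_update are made up for this example:

```python
from sqlalchemy.dialects.mysql import insert


def build_upsert(sa_table, keys, rows):
    """Build a MySQL INSERT ... ON DUPLICATE KEY UPDATE for the given rows.

    sa_table is a SQLAlchemy Table, keys the column names, rows the row tuples.
    """
    data = [dict(zip(keys, row)) for row in rows]
    stmt = insert(sa_table).values(data)
    # Assumption: on a collision, overwrite every column with the new value.
    # stmt.inserted refers to the row that would have been inserted.
    return stmt.on_duplicate_key_update({k: stmt.inserted[k] for k in keys})


def insert_on_duplicate_update(table, conn, keys, data_iter):
    """Callable for DataFrame.to_sql(method=...)."""
    result = conn.execute(build_upsert(table.table, keys, data_iter))
    return result.rowcount
```

It is called the same way as insert_ignore above: df.to_sql("sometable", connection, if_exists="append", index=False, method=insert_on_duplicate_update). To update only some columns, narrow the dict passed to on_duplicate_key_update.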
