
Python Version - 2.7.6

Pandas Version - 0.17.1

MySQLdb Version - 1.2.5

In my database (PRODUCT), I have a table (XML_FEED). The XML_FEED table is huge (millions of records). I also have a pandas.DataFrame() (PROCESSED_DF) with thousands of rows.

Now I need to run this

REPLACE INTO PRODUCT.XML_FEED
(COL1, COL2, COL3, COL4, COL5)
VALUES (PROCESSED_DF.values)

Question:-

Is there a way to run REPLACE INTO in pandas? I already checked pandas.DataFrame.to_sql(), but that is not what I need. I would prefer not to read the XML_FEED table into pandas, because it is very large.

4 Answers


With the release of pandas 0.24.0, there is now an official way to achieve this by passing a custom insert method to the to_sql function.

I was able to achieve the behavior of REPLACE INTO by passing this callable to to_sql:

def mysql_replace_into(table, conn, keys, data_iter):
    from sqlalchemy.ext.compiler import compiles
    from sqlalchemy.sql.expression import Insert

    @compiles(Insert)
    def replace_string(insert, compiler, **kw):
        # Compile the INSERT normally, then swap the leading keyword
        s = compiler.visit_insert(insert, **kw)
        s = s.replace("INSERT INTO", "REPLACE INTO")
        return s

    data = [dict(zip(keys, row)) for row in data_iter]

    conn.execute(table.table.insert(), data)

You would pass it like so:

df.to_sql('XML_FEED', con=engine, if_exists='append', method=mysql_replace_into)

Alternatively, if you want the behavior of INSERT ... ON DUPLICATE KEY UPDATE ... instead, you can use this:

def mysql_replace_into(table, conn, keys, data_iter):
    from sqlalchemy.dialects.mysql import insert

    data = [dict(zip(keys, row)) for row in data_iter]

    stmt = insert(table.table).values(data)
    # On a duplicate key, update every column with the value it would have been inserted with
    update_stmt = stmt.on_duplicate_key_update(
        **dict(zip(stmt.inserted.keys(), stmt.inserted.values()))
    )

    conn.execute(update_stmt)
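The practical difference between the two callables: REPLACE INTO deletes the conflicting row and inserts a fresh one (columns you do not supply are reset), while ON DUPLICATE KEY UPDATE modifies the existing row in place. SQLite's ON CONFLICT ... DO UPDATE behaves like the latter, so the upsert semantics can be sketched with just the standard library (a minimal sketch with a hypothetical feed table; no MySQL server needed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feed (feed_id INTEGER PRIMARY KEY, price REAL, note TEXT)")
conn.execute("INSERT INTO feed VALUES (1, 9.99, 'original')")

# Upsert: the existing row is updated in place; unspecified columns survive
conn.execute(
    "INSERT INTO feed (feed_id, price) VALUES (1, 19.99) "
    "ON CONFLICT(feed_id) DO UPDATE SET price = excluded.price"
)
print(conn.execute("SELECT feed_id, price, note FROM feed").fetchone())
# -> (1, 19.99, 'original')
```

Note how the 'note' column keeps its old value; with REPLACE INTO it would have been reset to NULL.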

Credits to https://stackoverflow.com/a/11762400/1919794 for the compile method.


2 Comments

Thank you, this works exactly how I want. I just wish pandas could add an option for this as well...
Thanks devull. This solution used to work for me just fine, except when I updated my system and installed the latest Python, Pandas, SQLAlchemy, etc. Now, I get the following error: TypeError: TableClause.insert() got an unexpected keyword argument 'replace_string'

As of this version (0.17.1), I am unable to find any direct way to do this in pandas. I reported a feature request for it. In my project I did this by executing some queries using MySQLdb and then using DataFrame.to_sql(if_exists='append').

Suppose

1) product_id is my primary key in table PRODUCT

2) feed_id is my primary key in table XML_FEED.

SIMPLE VERSION

import MySQLdb
import sqlalchemy
import pandas

con = MySQLdb.connect('localhost', 'root', 'my_password', 'database_name')
con_str = 'mysql+mysqldb://root:my_password@localhost/database_name'
engine = sqlalchemy.create_engine(con_str)  # because I am using mysql
df = pandas.read_sql('SELECT * from PRODUCT', con=engine)
df_product_id = df['product_id']
product_id_str = (str(list(df_product_id.values))).strip('[]')
delete_str = 'DELETE FROM XML_FEED WHERE feed_id IN ({0})'.format(product_id_str)
cur = con.cursor()
cur.execute(delete_str)
con.commit()
df.to_sql('XML_FEED', if_exists='append', con=engine)
# flavor='mysql' would avoid creating a sqlalchemy engine, but it is deprecated
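The string juggling above builds the IN (...) list for the DELETE statement; a quick illustration of what product_id_str and delete_str end up looking like (hypothetical id values, and note this only works cleanly for numeric ids, since string ids would need quoting or a parameterized query):

```python
product_id_values = [101, 102, 103]  # stands in for df['product_id'].values
product_id_str = str(list(product_id_values)).strip('[]')
delete_str = 'DELETE FROM XML_FEED WHERE feed_id IN ({0})'.format(product_id_str)
print(delete_str)
# -> DELETE FROM XML_FEED WHERE feed_id IN (101, 102, 103)
```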

Please note: the REPLACE [INTO] syntax allows us to INSERT a row into a table, except that if a UNIQUE KEY (including PRIMARY KEY) violation occurs, the old row is deleted prior to the new INSERT, hence no violation.
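SQLite also supports REPLACE INTO with the same delete-then-insert semantics, so the behavior described above can be checked without a MySQL server (a minimal sketch with a hypothetical feed table, using only the standard library):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feed (feed_id INTEGER PRIMARY KEY, price REAL, note TEXT)")
conn.execute("INSERT INTO feed VALUES (1, 9.99, 'original')")

# REPLACE INTO deletes the conflicting row first, then inserts the new one,
# so columns not supplied are reset to their defaults (here, NULL)
conn.execute("REPLACE INTO feed (feed_id, price) VALUES (1, 19.99)")
print(conn.execute("SELECT feed_id, price, note FROM feed").fetchone())
# -> (1, 19.99, None)
```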



I needed a generic solution to this problem, so I built on shiva's answer; maybe it will be helpful to others. This is useful in situations where you grab a table from a MySQL database (whole or filtered), update or add some rows, and want to perform a REPLACE INTO statement with df.to_sql().

It finds the table's primary keys, performs a delete statement on the MySQL table with all keys from the pandas dataframe, and then inserts the dataframe into the MySQL table.

def to_sql_update(df, engine, schema, table):
    df.reset_index(inplace=True)  # promote index columns so they are written too
    sql = ''' SELECT column_name from information_schema.columns
              WHERE table_schema = '{schema}' AND table_name = '{table}' AND
                    COLUMN_KEY = 'PRI';
          '''.format(schema=schema, table=table)
    id_cols = [x[0] for x in engine.execute(sql).fetchall()]
    id_vals = [df[col_name].tolist() for col_name in id_cols]
    # Delete every row whose primary key appears in the dataframe...
    sql = ''' DELETE FROM {schema}.{table} WHERE 0 '''.format(schema=schema, table=table)
    for row in zip(*id_vals):
        sql_row = ' AND '.join([''' {}='{}' '''.format(n, v) for n, v in zip(id_cols, row)])
        sql += ' OR ({}) '.format(sql_row)
    engine.execute(sql)

    # ...then append the dataframe, which together act like REPLACE INTO
    df.to_sql(table, engine, schema=schema, if_exists='append', index=False)
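To see what the generated DELETE looks like, the loop body can be exercised on its own (hypothetical key columns and values; no database needed). Note that values are interpolated directly into the SQL string, so this is only safe for trusted, simple values; a parameterized query would be more robust:

```python
id_cols = ['feed_id', 'region']   # hypothetical composite primary key
id_vals = [[1, 2], ['us', 'eu']]  # one list of values per key column

# WHERE 0 is a false base condition, so each key tuple is ORed onto it
sql = ''' DELETE FROM {schema}.{table} WHERE 0 '''.format(schema='mydb', table='XML_FEED')
for row in zip(*id_vals):
    sql_row = ' AND '.join([''' {}='{}' '''.format(n, v) for n, v in zip(id_cols, row)])
    sql += ' OR ({}) '.format(sql_row)
print(sql)
```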

4 Comments

This works great, thank you. However I removed line 2 because I don't think it's required, and with it you are left with an extra column 'index' which will of course cause an error - unless you meant to add df.drop(['index'], axis=1, inplace=True).
That's a good point; the second line is only needed if the df has an index set on one or more columns.
I am not able to understand the name variable; can you help me? df.to_sql(name, engine, schema=schema, if_exists='append', index=False)
That was a typo, it should be df.to_sql(table, engine ...). I fixed it in the answer.

If you use to_sql you should be able to define it so that you replace values if they exist, so for a table named 'mydb' and a dataframe named 'df', you'd use:

df.to_sql('mydb', con=engine, if_exists='replace')

That should replace values if they already exist, but I am not 100% sure if that's what you're looking for.

2 Comments

if_exists works for the table, not for rows in the table. if_exists : {'fail', 'replace', 'append'}, default 'fail'. fail: if the table exists, do nothing. replace: if the table exists, drop it, recreate it, and insert the data. append: if the table exists, insert the data; create it if it does not exist.
His answer is still valid as long as it is mentioned that it replaces the table with the whole dataframe. If his command is preceded by a filter on the df, then it is an appropriate one.
