I am transforming a single dataframe using multiple CPU cores and want to insert the results into MySQL.
With the code below I observed only one active CPU core and no updates to MySQL; no error messages were produced.
The original dataframe pandas_df is never altered; every transformation of pandas_df is stored in result_df.
The code has been verified to work correctly when run serially. A rough sketch of the serial version is shown first, followed by the parallel attempt.
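For context, the serial version looks roughly like this (a sketch only; the connection string, dataframe, tuples, and slicing logic are placeholders for what my script actually builds):

import pandas as pd
from sqlalchemy import create_engine

MYSQL_STRING = "mysql+pymysql://user:password@host/db"  # placeholder; my real connection string differs
engine = create_engine(MYSQL_STRING)

pandas_df = pd.DataFrame()  # the real dataframe is built elsewhere
tuples = []                 # the list of tuples that drive each slice is built elsewhere

def function(pandas_df, tup, engine):
    # placeholder body: the real code slices and dices pandas_df according to tup
    result_df = pandas_df
    result_df.to_sql("TABLE_NAME", engine, if_exists='append')

for tup in tuples:
    function(pandas_df, tup, engine)

The parallel attempt that shows the problem: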
import multiprocessing as mp
from sqlalchemy import create_engine

engine = create_engine(MYSQL_STRING)  # MYSQL_STRING is my connection string, defined elsewhere

def function(pandas_df, tup, engine):
    # placeholder: the real code slices and dices pandas_df according to tup
    result_df = pandas_df
    result_df.to_sql("TABLE_NAME", engine, if_exists='append')

pool = mp.Pool(processes=4)
for tup in tuples:
    pool.apply_async(function, args=(pandas_df, tup, engine))
Most of the tutorials and guides I came across only pass strings inside args=().
There are, however, articles that demonstrate passing numpy arrays, for example: http://sebastianraschka.com/Articles/2014_multiprocessing_intro.html
I have also tried the above code with the map_async() method, both with and without a return statement inside the function, and the behaviour did not change. A sketch of that attempt is below.
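Roughly, the map_async() variant looked like this (a sketch only; binding the fixed arguments with functools.partial is one way to do it, and the exact call in my script may have differed slightly):

from functools import partial

# bind the dataframe and engine so map_async() only has to supply each tuple
worker = partial(function, pandas_df, engine=engine)
pool.map_async(worker, tuples)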
I am open to trying different Python modules. I need a solution that transforms a single dataframe in parallel and inserts the results into a database.