
I am transforming a single dataframe using multiple CPU cores and want to insert the result into MySQL.

With the code below I observed only one active CPU core and no updates to MySQL. No error messages were produced.

The original dataframe pandas_df is never altered. All transformations to pandas_df are stored in result_df.

The code has been verified to work correctly in serial.

import multiprocessing as mp
from sqlalchemy import create_engine
engine = create_engine(MYSQL_STRING)

def function(pandas_df, tuple, engine):
    # slice and dice pandas_df according to tuple, producing result_df
    result_df.to_sql("TABLE_NAME", engine, if_exists='append')


pool = mp.Pool(processes=4)
for tuple in tuples:
    pool.apply_async(function, args=(pandas_df, tuple, engine))

Most of the tutorials and guides I came across only passed strings inside args=(). There are, however, articles that demonstrate passing NumPy arrays: http://sebastianraschka.com/Articles/2014_multiprocessing_intro.html

I have also tried the above code with the map_async() method, and with a return statement inside the function, and there was no difference in behaviour.

I am open to trying different Python modules. I need a solution that transforms a single dataframe in parallel and inserts the result into a database.

1 Answer


You need to make sure that the function has access to every variable it uses, otherwise it fails silently. In your code, result_df is never defined inside function, and an exception raised inside an apply_async worker is not printed anywhere; it is only re-raised when you call .get() on the returned AsyncResult.
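
Here is a minimal sketch of how that silent failure can be surfaced and all four cores put to work. The stand-in dataframe, parameter tuples and transformation are placeholders, and MYSQL_STRING / TABLE_NAME are taken from your question; the important parts are creating the engine inside the worker (a SQLAlchemy engine holds open connections and should not be shared across processes), keeping the AsyncResult objects that apply_async returns, and calling pool.close(), pool.join() and .get() so that any exception raised in a worker is re-raised in the parent instead of vanishing.

import multiprocessing as mp

import pandas as pd
from sqlalchemy import create_engine

MYSQL_STRING = "mysql+pymysql://user:password@host/dbname"  # placeholder


def transform_and_insert(pandas_df, params):
    # Each worker builds its own engine; engines hold open sockets
    # and are not safe to share across process boundaries.
    engine = create_engine(MYSQL_STRING)

    # Placeholder for the real "slice and dice pandas_df according
    # to tuple" step from the question.
    result_df = pandas_df

    result_df.to_sql("TABLE_NAME", engine, if_exists="append")
    engine.dispose()
    return len(result_df)


if __name__ == "__main__":
    pandas_df = pd.DataFrame({"a": [1, 2, 3]})   # stand-in dataframe
    tuples = [("x",), ("y",), ("z",)]            # stand-in parameter tuples

    pool = mp.Pool(processes=4)
    results = [pool.apply_async(transform_and_insert, args=(pandas_df, params))
               for params in tuples]
    pool.close()
    pool.join()

    # .get() re-raises any exception that occurred in a worker,
    # which is what turns the silent failure into a visible one.
    for res in results:
        print(res.get())

In your original loop the tasks were submitted and the script simply ran off the end without waiting for the workers or ever asking for a result, which would explain seeing one busy core, no rows in MySQL and no error messages.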
