According to this answer, starmap should be used when multiprocessing with multiple arguments. The problem I am having is that one of my arguments is a constant dataframe. When I create a list of arguments for my function and pass it to starmap, the dataframe gets stored over and over. I thought I could get around this problem using a Namespace, but I can't seem to figure it out. My code below hasn't thrown an error, but after 30 minutes no files have been written. The code runs in under 10 minutes without multiprocessing, just calling write_file directly.

import pandas as pd
import numpy as np
import multiprocessing as mp

def write_file(df, colIndex, splitter, outpath):
    with open(outpath + splitter + ".txt", 'a') as oFile:
        data = df[df.iloc[:,colIndex] == splitter]
        data.to_csv(oFile, sep = '|', index = False, header = False)

mgr = mp.Manager()
ns = mgr.Namespace()
df = pd.read_table(file_, delimiter = '|', header = None)
ns.df = df.iloc[:,1] = df.iloc[:,1].astype(str)
fileList = list(df.iloc[:, 1].astype('str').unique())
for item in fileList:
    with mp.Pool(processes=3) as pool:
        pool.starmap(write_file, np.array((ns, 1, item, outpath)).tolist())
  • Did you find a solution? Commented Feb 14, 2018 at 11:03
  • see answer below Commented Feb 23, 2018 at 16:07

2 Answers

1

To anyone else struggling with this issue, my solution was to split the dataframe into chunks and pair each chunk with the remaining arguments, building an iterable of tuples with itertools.product:

iterable = product(np.array_split(data, 15), [args])

Then, pass this iterable to the starmap:

pool.starmap(func, iterable)
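A minimal sketch of how the pieces above might fit together. It assumes [args] stands for a tuple of the remaining, non-dataframe arguments; count_matches, build_tasks, and run_parallel are hypothetical names used only for illustration. It also splits row positions and slices with iloc rather than passing the dataframe straight to np.array_split, which is a substitution on my part so that each chunk is guaranteed to stay a DataFrame.

```python
from itertools import product
import multiprocessing as mp

import numpy as np
import pandas as pd


def count_matches(chunk, col_index, value):
    """Hypothetical worker: count rows of one chunk whose column equals value."""
    return int((chunk.iloc[:, col_index] == value).sum())


def build_tasks(df, n_chunks, extra_args):
    """Pair every chunk of df with the same tuple of extra arguments."""
    # Split the row positions, then slice, so every chunk is a DataFrame.
    positions = np.array_split(np.arange(len(df)), n_chunks)
    chunks = [df.iloc[idx] for idx in positions]
    # product(chunks, [extra_args]) yields (chunk, extra_args) pairs;
    # flatten each pair so starmap can unpack it into positional arguments.
    return [(chunk, *extra) for chunk, extra in product(chunks, [extra_args])]


def run_parallel(df, n_chunks, extra_args, processes=3):
    """Run count_matches once per chunk in a pool and collect the results."""
    tasks = build_tasks(df, n_chunks, extra_args)
    with mp.Pool(processes=processes) as pool:
        return pool.starmap(count_matches, tasks)
```

Each row of the dataframe is sent to a worker only once, as part of exactly one chunk, instead of the whole dataframe being repeated in every argument tuple. When run as a script, the call to run_parallel should sit under an `if __name__ == "__main__":` guard.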

1 Comment

What is [args] in this context?
0

I had the same issue: I needed to pass two existing dataframes to the function using starmap. It turns out there is no need to declare a dataframe as a function argument at all. The function can simply refer to the dataframe as a global variable, as described in the accepted answer here: Pandas: local vs global dataframe in functions
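One way this global-variable idea could look with a pool, as a sketch rather than the exact approach from the linked answer: it uses the initializer and initargs parameters of multiprocessing.Pool to set the global once in each worker process, so it does not rely on workers inheriting the parent's variables. The names _df, init_worker, count_matches, and run_parallel are hypothetical.

```python
import multiprocessing as mp

import pandas as pd

_df = None  # set once per worker process by init_worker


def init_worker(df):
    """Store the dataframe in a module-level variable inside each worker."""
    global _df
    _df = df


def count_matches(col_index, value):
    """Hypothetical worker: reads the global dataframe instead of taking it as an argument."""
    return int((_df.iloc[:, col_index] == value).sum())


def run_parallel(df, col_index, values, processes=3):
    """Send df to each worker once, then pass only the small arguments per task."""
    tasks = [(col_index, v) for v in values]
    with mp.Pool(processes=processes, initializer=init_worker, initargs=(df,)) as pool:
        return pool.starmap(count_matches, tasks)
```

With this layout the dataframe is transferred once per worker process rather than once per task, and the tuples passed to starmap hold only the small arguments. When run as a script, the call to run_parallel should sit under an `if __name__ == "__main__":` guard.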
