
I am using multiprocessing and generating a pandas DataFrame in each process. I would like to merge them together and output the data. The following strategy almost seems to work, but when reading the data back with pd.read_csv() only the first name is used as a column header.

from multiprocessing import Process, Lock
import pandas as pd

def foo(name, lock):
    d = {f'{name}': [1, 2]}
    df = pd.DataFrame(data=d)

    lock.acquire()
    try:
        df.to_csv('output.txt', mode='a')
    finally:
        lock.release()

if __name__ == '__main__':
    lock = Lock()

    processes = []
    for name in ['bob', 'steve']:
        p = Process(target=foo, args=(name, lock))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
  • Were you expecting the columns to be concatenated horizontally? CSV files don't do that. You might consider using a multiprocessing.Queue to pass your end result back to the originating process, and leave the master process in charge of combining things. Commented Oct 22, 2021 at 20:20
  • @TimRoberts that is a great solution; then I can just combine the dataframes and write them out at the same time. Makes sense. Commented Oct 22, 2021 at 20:22

1 Answer


You can use multiprocessing.Pool:

import multiprocessing
import pandas as pd

def foo(name):
    d = {f'{name}': [1, 2]}
    df = pd.DataFrame(data=d)
    return df

if __name__ == '__main__':
    data = ['bob', 'steve']
    with multiprocessing.Pool(2) as pool:
        # pool.map returns the DataFrames in the same order as the inputs
        data = pool.map(foo, data)
    # axis=1 concatenates the one-column frames side by side
    pd.concat(data, axis=1).to_csv('output.csv')

Output:

>>> pd.concat(data, axis=1)
   bob  steve
0    1      1
1    2      2
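Written this way, the merged frame also round-trips cleanly through pd.read_csv, which was the problem in the question. A minimal sketch of the round trip (no multiprocessing needed to demonstrate it; passing index=False is an assumption to keep the file free of an unnamed index column):

```python
import pandas as pd

# Build the same merged frame the Pool version produces
df = pd.concat([pd.DataFrame({'bob': [1, 2]}),
                pd.DataFrame({'steve': [1, 2]})], axis=1)
df.to_csv('output.csv', index=False)  # index=False skips the index column

back = pd.read_csv('output.csv')
print(back)
#    bob  steve
# 0    1      1
# 1    2      2
```

Because the file is written once with a single header row, both column names survive the read, unlike the append-per-process approach from the question.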