Python multiprocessing error when passing a list

Question

I am trying to use the multiprocessing library to speed up reading CSV files. I've already done this using Pool, and now I'm trying to do it with Process(). However, when I run the code, it gives me the following error:

AttributeError: 'tuple' object has no attribute 'join'

Can someone tell me what's wrong? I don't understand the error.

import glob
import pandas as pd
from multiprocessing import Process
import matplotlib.pyplot as plt
import os

location = "/home/data/csv/"

uber_data = []

def read_csv(filename):

    return uber_data.append(pd.read_csv(filename))

def data_wrangling(uber_data):
    uber_data['Date/Time'] = pd.to_datetime(uber_data['Date/Time'], format="%m/%d/%Y %H:%M:%S")
    uber_data['Dia Setmana'] = uber_data['Date/Time'].dt.weekday_name
    uber_data['Num dia'] = uber_data['Date/Time'].dt.dayofweek

    return uber_data

def plotting(uber_data):

    weekdays = uber_data.pivot_table(index=['Num dia','Dia Setmana'], values='Base', aggfunc='count')
    weekdays.plot(kind='bar', figsize=(8,6))
    plt.ylabel('Total Journeys')
    plt.title('Journey on Week Day')

def main():

    processes = []
    files = list(glob.glob(os.path.join(location,'*.csv*')))

    for i in files:
        p = Process(target=read_csv, args=[i])
        processes.append(p)
        p.start()

    for process in enumerate(processes):
        process.join()


    #combined_df = pd.concat(df_list, ignore_index=True)
    #dades_mod = data_wrangling(combined_df)
    #plotting(dades_mod)

main()

Thank you.

Neil · Accepted Answer · 2020-05-23 13:50:46Z

I'm not 100% sure how Process works in this context, but what you have written here:

for process in enumerate(processes):
    process.join()

will raise the AttributeError you are seeing. enumerate yields (index, item) tuples, with a counter as the first element, so on each iteration process is bound to a whole tuple rather than to a Process object, and a tuple has no join method.

Try this for a start:

for i, process in enumerate(processes): # assign the counter to the variable i, and grab the process which is the second element of the tuple
    process.join()

Or this:

for process in processes:
    process.join()

For more on enumerate see the builtin documentation here: https://docs.python.org/3/library/functions.html#enumerate
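
To make the difference concrete, here is a minimal sketch that reproduces the error without starting any real processes. FakeProcess and join_without_unpacking are hypothetical names used only for illustration; FakeProcess is a stand-in for multiprocessing.Process that models nothing but join().

```python
class FakeProcess:
    """Hypothetical stand-in for multiprocessing.Process; only join() is modelled."""

    def __init__(self, name):
        self.name = name
        self.joined = False

    def join(self):
        self.joined = True


processes = [FakeProcess("a"), FakeProcess("b")]

# enumerate yields (index, item) tuples, not the items themselves.
pairs = list(enumerate(processes))


def join_without_unpacking(procs):
    """Repeat the loop from the question and return the error message, if any."""
    try:
        for process in enumerate(procs):
            process.join()  # process is a tuple here, so this raises
    except AttributeError as exc:
        return str(exc)
    return None


error_message = join_without_unpacking(processes)

# Unpacking the tuple gives access to the actual object.
for index, process in enumerate(processes):
    process.join()
```

If the counter is never used, iterating over processes directly is the simpler choice.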

2 Comments

js352 Over a year ago

Yep, thank you, that worked. However, now it tells me that there are no objects to concatenate. Since I'm passing args=[i], shouldn't the read_csv function receive it?

Neil Over a year ago

Sounds to me like you have a new problem that deserves a new question. Please accept this answer if it answered the question that you asked.
