I am trying to use the multiprocessing library to speed up reading CSV files. I have already done this with Pool, and now I'm trying to do it with Process(). However, when I run the code, I get the following error:

AttributeError: 'tuple' object has no attribute 'join'

Can someone tell me what's wrong? I don't understand the error.

import glob
import pandas as pd
from multiprocessing import Process
import matplotlib.pyplot as plt
import os

location = "/home/data/csv/"

uber_data = []

def read_csv(filename):

    return uber_data.append(pd.read_csv(filename))

def data_wrangling(uber_data):
    uber_data['Date/Time'] = pd.to_datetime(uber_data['Date/Time'], format="%m/%d/%Y %H:%M:%S")
    uber_data['Dia Setmana'] = uber_data['Date/Time'].dt.weekday_name
    uber_data['Num dia'] = uber_data['Date/Time'].dt.dayofweek

    return uber_data

def plotting(uber_data):

    weekdays = uber_data.pivot_table(index=['Num dia','Dia Setmana'], values='Base', aggfunc='count')
    weekdays.plot(kind='bar', figsize=(8,6))
    plt.ylabel('Total Journeys')
    plt.title('Journey on Week Day')

def main():

    processes = []
    files = list(glob.glob(os.path.join(location,'*.csv*')))

    for i in files:
        p = Process(target=read_csv, args=[i])
        processes.append(p)
        p.start()

    for process in enumerate(processes):
        process.join()


    #combined_df = pd.concat(df_list, ignore_index=True)
    #dades_mod = data_wrangling(combined_df)
    #plotting(dades_mod)

main()

Thank you.

1 Answer

I'm not 100% sure how Process works in this context, but what you have written here:

for process in enumerate(processes):
    process.join()

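will raise an error, and you can see why from how the builtin enumerate works. Calling enumerate on any iterable yields tuples of the form (counter, item), so on each iteration process is bound to a tuple rather than to a Process object. A tuple has no join method, which is exactly what the AttributeError is telling you.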

Try this for a start:

for i, process in enumerate(processes): # assign the counter to the variable i, and grab the process which is the second element of the tuple
    process.join()

Or this:

for process in processes:
    process.join()

For more on enumerate see the builtin documentation here: https://docs.python.org/3/library/functions.html#enumerate
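
To see the difference without running any real work, here is a minimal sketch. The noop function is a hypothetical stand-in for read_csv, and the processes are only created, never started, so this only inspects the objects that enumerate hands back.

```python
from multiprocessing import Process


def noop():
    """Hypothetical placeholder target; stands in for read_csv."""
    pass


# Create (but do not start) two Process objects.
processes = [Process(target=noop) for _ in range(2)]

# enumerate yields (counter, item) tuples.
first = list(enumerate(processes))[0]
print(type(first).__name__)    # tuple
print(hasattr(first, "join"))  # False

# Unpacking the tuple gives back the counter and the Process object.
index, process = first
print(index, hasattr(process, "join"))  # 0 True
```

If you don't need the counter, iterating over processes directly is the simpler option.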

2 Comments

Yep, thank you, that worked. However, now it tells me that there are no objects to concatenate. Since I'm passing args=[i], shouldn't the read_csv function receive it?
Sounds to me like you have a new problem that deserves a new question. Please accept this answer if it answered the question that you asked.
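
On the follow-up: the filename most likely does reach read_csv, but each Process runs in its own memory space, so appending to a module-level list in the child does not change the list in the parent. A rough sketch of that behaviour, and of one possible way around it using multiprocessing.Queue, is below. The names append_in_child, send_to_parent and run_demo are made up for illustration, and plain strings stand in for the DataFrames.

```python
from multiprocessing import Process, Queue

results = []  # module-level list, playing the role of uber_data


def append_in_child(value):
    # Runs in the child process and appends to the child's own copy of results.
    results.append(value)


def send_to_parent(value, queue):
    # Runs in the child process and sends the value back through the queue.
    queue.put(value)


def run_demo():
    # 1) Appending to a global in the child leaves the parent's list untouched.
    p = Process(target=append_in_child, args=("a.csv",))
    p.start()
    p.join()
    seen_via_global = list(results)

    # 2) A Queue carries data from the children back to the parent.
    queue = Queue()
    workers = [
        Process(target=send_to_parent, args=(name, queue))
        for name in ("a.csv", "b.csv")
    ]
    for w in workers:
        w.start()
    received = [queue.get() for _ in workers]  # read results before joining
    for w in workers:
        w.join()
    return seen_via_global, sorted(received)


if __name__ == "__main__":
    print(run_demo())  # ([], ['a.csv', 'b.csv'])
```

Since the Pool version already worked for you, Pool.map is probably the shorter route, because it returns each worker's result to the parent for you.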