
I have some code that makes an API call, formats the data, and appends it to a CSV file. Due to concerns about thread safety, I store all rows in a list before writing them to the CSV.

from concurrent import futures

results = []  # list of lists; each inner list is one row for the CSV
with futures.ThreadPoolExecutor(max_workers=64) as executor:
    for data in executor.map(get_data, data_units):
        extract_data(data, results)
# write results to csv

def get_data(data_unit):
    # makes an API call to get data for data_unit
    return data

def extract_data(data, results):
    # turns the data returned from the API call into a row and appends it to results
    row = formatted_data
    results.append(row)

Is there a more canonical/faster way to do this? I have looked at the answer here: Multiple threads writing to the same CSV in Python. I don't want to put a lock in extract_data, because it would slow down the API calls by creating a bottleneck where the threads wait to write. For example, is there another data structure I could use instead of the results list (something like a thread-safe stack) that I could pop rows off to write to the CSV while rows keep getting added to it?


1 Answer


No matter which structure you use to replace your list, it will necessarily use locks internally. You can use a queue.Queue, for example, which is thread-safe, but it synchronizes with a lock internally.
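To make the trade-off concrete, here is a minimal sketch of the producer/consumer pattern the question describes: worker threads fetch data, rows go onto a queue.Queue, and a single writer thread pops rows and writes them to the CSV while the workers keep producing. The get_data and format_row functions below are hypothetical stand-ins for the asker's API call and extract_data formatting step.

```python
import csv
import queue
import threading
from concurrent import futures

SENTINEL = object()  # marker telling the writer no more rows are coming

# Hypothetical stand-in for the real API call.
def get_data(data_unit):
    return {"id": data_unit, "value": data_unit * 2}

# Hypothetical stand-in for the formatting done in extract_data.
def format_row(data):
    return [data["id"], data["value"]]

def writer(path, rows):
    # Single consumer thread: pops rows and writes them while producers
    # keep adding; the Queue's internal lock is the only synchronization.
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        while True:
            row = rows.get()
            if row is SENTINEL:
                break
            w.writerow(row)

rows = queue.Queue()
t = threading.Thread(target=writer, args=("out.csv", rows))
t.start()

with futures.ThreadPoolExecutor(max_workers=8) as executor:
    # executor.map yields results in submission order, so the CSV
    # rows come out in the same order as the inputs.
    for data in executor.map(get_data, range(100)):
        rows.put(format_row(data))

rows.put(SENTINEL)  # signal the writer to finish
t.join()
```

Note that this does not remove locking; it only moves it into queue.Queue and keeps the file I/O off the worker threads, which is usually the practical win.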
