
I have some code that makes an API call, formats the data, and appends it to a CSV file. Due to concerns about thread safety, I store all rows in a list before writing them to the CSV.

from concurrent import futures

results = []  # list of lists; each inner list is one row for the CSV
with futures.ThreadPoolExecutor(max_workers=64) as executor:
    for data in executor.map(get_data, data_units):
        extract_data(data, results)
# write results to csv

def get_data(data_unit):
    # makes an API call to get data for data_unit
    return data

def extract_data(data, results):
    # turns the data returned from the API call into a row and appends it to results
    row = formatted_data
    results.append(row)

Is there a more canonical/faster way to do this? I have looked at the answer here: Multiple threads writing to the same CSV in Python. I don't want to put a lock in extract_data, because it would slow down the API calls by creating a bottleneck where the threads wait to write. For example, is there another data structure I could use instead of the results list (something like a thread-safe stack) that I could pop rows off to write to the CSV while rows keep getting added to it?


1 Answer


No matter which structure you use to replace your list, it will necessarily use locks internally. You can use a queue.Queue, for example, which is thread-safe, but it synchronizes with a lock internally.
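To make the trade-off concrete, here is a minimal sketch of the producer/consumer pattern the question describes: worker threads fetch data, rows go onto a queue.Queue, and a single writer thread pops rows and writes them to the CSV while the workers keep producing. The get_data and format_row functions below are hypothetical stand-ins for the asker's API call and extract_data formatting step.

```python
import csv
import queue
import threading
from concurrent import futures

SENTINEL = object()  # marker telling the writer no more rows are coming

# Hypothetical stand-in for the real API call.
def get_data(data_unit):
    return {"id": data_unit, "value": data_unit * 2}

# Hypothetical stand-in for the formatting done in extract_data.
def format_row(data):
    return [data["id"], data["value"]]

def writer(path, rows):
    # Single consumer thread: pops rows and writes them while producers
    # keep adding; the Queue's internal lock is the only synchronization.
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        while True:
            row = rows.get()
            if row is SENTINEL:
                break
            w.writerow(row)

rows = queue.Queue()
t = threading.Thread(target=writer, args=("out.csv", rows))
t.start()

with futures.ThreadPoolExecutor(max_workers=8) as executor:
    # executor.map yields results in submission order, so the CSV
    # rows come out in the same order as the inputs.
    for data in executor.map(get_data, range(100)):
        rows.put(format_row(data))

rows.put(SENTINEL)  # signal the writer to finish
t.join()
```

Note that this does not remove locking; it only moves it into queue.Queue and keeps the file I/O off the worker threads, which is usually the practical win.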
