Parallelize this nested for loop in python

Question

I'm struggling again to improve the execution time of this piece of code. Since the calculations are really time-consuming, I think that the best solution would be to parallelize the code.

I was first working with maps, as explained in this question, but then I tried a simpler approach, thinking that I could find a better solution. However, I haven't come up with anything yet, and since it's a different problem I decided to post it as a new question.

I am working on a Windows platform, using Python 3.4.

Here's the code:

similarity_matrix = [[0 for x in range(word_count)] for x in range(word_count)]
for i in range(0, word_count):
    for j in range(0, word_count):
        if i > j:
            similarity = calculate_similarity(t_matrix[i], t_matrix[j])
            similarity_matrix[i][j] = similarity
            similarity_matrix[j][i] = similarity

This is the calculate_similarity function:

def calculate_similarity(array_word1, array_word2):
    denominator = sum([array_word1[i] + array_word2[i] for i in range(word_count)])
    if denominator == 0:
        return 0
    numerator = sum([2 * min(array_word1[i], array_word2[i]) for i in range(word_count)])
    return numerator / denominator
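Written as a formula, with a and b standing for the two rows passed in (array_word1 and array_word2) and n = word_count, the function computes the following; the symbols a, b and k are introduced here only for illustration:

```latex
\operatorname{sim}(a, b) =
\begin{cases}
  \dfrac{2 \sum_{k=1}^{n} \min(a_k, b_k)}{\sum_{k=1}^{n} (a_k + b_k)}, & \text{if } \sum_{k=1}^{n} (a_k + b_k) \neq 0,\\
  0, & \text{otherwise.}
\end{cases}
```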

And the explanation for the code:

- word_count is the total number of unique words, stored in a list
- t_matrix is a matrix containing a value for each pair of words
- the output should be similarity_matrix, a word_count x word_count matrix that also contains a similarity value for each pair of words
- it's OK to keep both matrices in memory
- after these computations I can easily find the most similar word for each word (or the top three similar words, as the task may require)
- calculate_similarity takes two lists of floats, one per word (each is a row of t_matrix)

I work with a list of 13k words, and if I calculated correctly, the execution time on my system would be a few days. So anything that does the job in one day would be wonderful!
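A rough count of the work shows why the pure-Python loop takes days. This sketch assumes the 13k figure from the question and that each calculate_similarity call walks all word_count entries of both rows, as the function above does:

```python
word_count = 13000  # vocabulary size quoted in the question

# The loop evaluates each unordered pair (i, j) with i > j exactly once.
pairs = word_count * (word_count - 1) // 2

# Each call to calculate_similarity iterates over word_count entries.
element_ops = pairs * word_count

print("{:,} pairs, about {:.2e} element-wise steps".format(pairs, element_ops))
```

That is roughly 10^12 interpreted Python steps, which is why both parallelism and vectorization come up in the answers.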

Maybe parallelizing only the calculation of the numerator and denominator in calculate_similarity would make a significant improvement.

As a matter of style, you could iterate over the 'triangle' instead of the 'square' by bounding the range of the second loop by the i of the first loop. You won't get much of a performance boost this way, but you will remove one level of nesting. :)
– wim, Commented Mar 24, 2015 at 0:51
@wim You mean for j in range(i, word_count):? I already tried that, but it changed almost nothing. @Coder Hacker If I really have no other option left, I would translate the code I've written so far. Do you think it would be easy to do in C++?
– van, Commented Mar 24, 2015 at 0:57
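For reference, the pairs selected by `if i > j` are exactly those with j in range(i); range(i, word_count) instead gives j >= i, which is the upper triangle plus the diagonal. A minimal sketch of the triangle loop wim describes; the helper name fill_similarity_triangle is hypothetical, and any two-argument similarity function can be passed in:

```python
def fill_similarity_triangle(t_matrix, similarity):
    """Fill a symmetric matrix, computing each pair (i, j) with j < i once."""
    n = len(t_matrix)
    result = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):  # same pairs as `if i > j` over the full square
            value = similarity(t_matrix[i], t_matrix[j])
            result[i][j] = value
            result[j][i] = value
    return result
```

With the question's function it would be called as fill_similarity_triangle(t_matrix, calculate_similarity). It performs the same number of similarity calls as the original, so, as wim says, the gain is readability rather than speed.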
@jmunsch: Your code is faster because it doesn't actually allocate as many lists. If you add a value to one of the inner lists, you'll see it in all of the copies too, which is probably not desirable.
– Blckknght, Commented Mar 24, 2015 at 5:13
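jmunsch's code is not shown here, but the effect Blckknght describes is the usual list-multiplication pitfall: multiplying a list of lists copies references to the same inner list. A small illustration:

```python
n = 3
aliased = [[0] * n] * n                    # n references to ONE inner list
independent = [[0] * n for _ in range(n)]  # n separate inner lists

aliased[0][1] = 5
independent[0][1] = 5

print(aliased)      # [[0, 5, 0], [0, 5, 0], [0, 5, 0]]
print(independent)  # [[0, 5, 0], [0, 0, 0], [0, 0, 0]]
```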
Take a look at @Blckknght's solution; it is a big improvement over mine. Also, you should remove the square brackets from the sums in calculate_similarity, for example denominator = sum(array_word1[i] + array_word2[i] for i in range(word_count)). Using a generator here instead of a list comprehension saves you from constructing a list and storing a lot of values just to sum them up and throw the list away.
– Matt, Commented Mar 24, 2015 at 6:04
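Applied to the question's function, Matt's suggestion might look like the sketch below. It also swaps indexing through range(word_count) for zip, which assumes both rows have the same length and removes the dependence on the global word_count. Expect a modest constant-factor gain at best, not a change in the overall cost:

```python
def calculate_similarity(array_word1, array_word2):
    # Generator expressions feed sum() one value at a time, so no
    # temporary lists are built; zip pairs up the entries of both rows.
    denominator = sum(a + b for a, b in zip(array_word1, array_word2))
    if denominator == 0:
        return 0
    numerator = sum(2 * min(a, b) for a, b in zip(array_word1, array_word2))
    return numerator / denominator
```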

Community · Accepted Answer · 2017-05-23 12:09:28Z

6

Here's an alternative implementation of the same general algorithm as in Matt's answer, just using multiprocessing.Pool instead of concurrent.futures.ProcessPoolExecutor. It may be more efficient than his code, since the input (t_matrix) is passed to each worker process only once, through the initializer function, instead of being serialized again with every task.

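import multiprocessing
import itertools

def calculate_similarity(array_word1, array_word2):
    # The question's function, using zip instead of the global word_count,
    # which worker processes on Windows only see if it is set at module level.
    denominator = sum(a + b for a, b in zip(array_word1, array_word2))
    if denominator == 0:
        return 0
    numerator = sum(2 * min(a, b) for a, b in zip(array_word1, array_word2))
    return numerator / denominator

def worker_init(matrix):
    # Runs once in each worker process; keeps the matrix in a global
    # so it isn't sent along with every task.
    global worker_matrix
    worker_matrix = matrix

def worker(i, j):
    similarity = calculate_similarity(worker_matrix[i], worker_matrix[j])
    return i, j, similarity

def main(matrix):
    size = len(matrix)
    result = [[0]*size for _ in range(size)]
    with multiprocessing.Pool(initializer=worker_init, initargs=(matrix,)) as pool:
        # starmap unpacks each (i, j) pair from combinations into worker(i, j)
        for i, j, val in pool.starmap(worker, itertools.combinations(range(size), 2)):
            result[i][j] = result[j][i] = val
    return result

if __name__ == "__main__":
    # get t_matrix from somewhere
    similarity_matrix = main(t_matrix)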

edited May 23, 2017 at 12:09 by Community · answered Mar 24, 2015 at 5:33 by Blckknght


3 Comments

Matt Over a year ago

result = [[0]*size for _ in size] should be result = [[0]*size for _ in range(size)]

Blckknght Over a year ago

@Matt: Thanks, you're right. I factored out the len(matrix) computation late in the coding process and ended up cutting too much in that spot.

van Over a year ago

This solution causes a memory error when computing the result. I'm not sure what causes the difference in memory usage.
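One likely source of the memory error is that Pool.starmap first turns the combinations iterator into a list (about 84.5 million (i, j) tuples for 13k words) and then returns every (i, j, value) result in one list before the loop even starts. A sketch that keeps the answer's initializer but switches to Pool.imap_unordered, which hands back results as they arrive, follows. The chunksize of 10000 is a guess to tune, not a measured optimum, and the list-of-lists result is itself large at this size, so this may reduce the memory pressure rather than remove it:

```python
import itertools
import multiprocessing

def calculate_similarity(array_word1, array_word2):
    # The question's formula, written with zip so no global word_count is needed.
    denominator = sum(a + b for a, b in zip(array_word1, array_word2))
    if denominator == 0:
        return 0
    numerator = sum(2 * min(a, b) for a, b in zip(array_word1, array_word2))
    return numerator / denominator

def worker_init(matrix):
    # Runs once per worker process, as in the answer above.
    global worker_matrix
    worker_matrix = matrix

def worker(pair):
    # imap_unordered passes a single argument, so unpack the pair here.
    i, j = pair
    return i, j, calculate_similarity(worker_matrix[i], worker_matrix[j])

def main(matrix, chunksize=10000):
    size = len(matrix)
    result = [[0] * size for _ in range(size)]
    pairs = itertools.combinations(range(size), 2)  # stays a lazy iterator
    with multiprocessing.Pool(initializer=worker_init, initargs=(matrix,)) as pool:
        # Results are consumed one at a time instead of being collected
        # into a single list first, as starmap does.
        for i, j, val in pool.imap_unordered(worker, pairs, chunksize):
            result[i][j] = result[j][i] = val
    return result

if __name__ == "__main__":
    demo = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical toy input
    print(main(demo, chunksize=2))
```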

Matt · Accepted Answer · 2015-03-24 06:10:08Z

3

from concurrent.futures import ProcessPoolExecutor, wait
from itertools import combinations
from functools import partial

def callback(i, j, future):
    # Runs in the main process (in a helper thread) when the task is done.
    similarity = future.result()
    similarity_matrix[i][j] = similarity
    similarity_matrix[j][i] = similarity

if __name__ == "__main__":
    # The guard is required on Windows, where worker processes re-import this module.
    similarity_matrix = [[0]*word_count for _ in range(word_count)]

    with ProcessPoolExecutor(max_workers=4) as executor:
        fs = []
        for i, j in combinations(range(word_count), 2):
            future = executor.submit(
                        calculate_similarity,
                        t_matrix[i],
                        t_matrix[j])

            future.add_done_callback(partial(callback, i, j))
            fs.append(future)

        wait(fs)

edited Mar 24, 2015 at 6:10 · answered Mar 24, 2015 at 2:13 by Matt

3 Comments

van Over a year ago

The code works, but once again I tested it with 350 words and the execution time increased from 18 to 40 seconds. I also tried using ThreadPoolExecutor, but judging from Task Manager it doesn't seem to run in parallel, and the execution time is about 25 seconds.

Blckknght Over a year ago

I'd guess that the slowdown is due to overhead serializing the t_matrix[i] and t_matrix[j] items over and over. I know that with multiprocessing.Pool you can write a function to do one-time setup for the worker processes, but I'm not sure if you can do the same for a ProcessPoolExecutor.

dano Over a year ago

@Blckknght concurrent.futures.ProcessPoolExecutor doesn't support the initializer keyword argument right now. There is a bug filed against this, with a working patch to add support for it, but it's still waiting to be reviewed.
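Since Python 3.7, ProcessPoolExecutor does accept initializer and initargs, so the one-time setup Blckknght mentions can be combined with Matt's executor approach. A sketch under that assumption (Python 3.7 or later, not the 3.4 used in the question); the names row_worker and similarity_matrix_executor are hypothetical, and each task computes one row of the lower triangle rather than one pair, to cut per-task overhead:

```python
from concurrent.futures import ProcessPoolExecutor

def calculate_similarity(array_word1, array_word2):
    # The question's formula, written with zip so no global word_count is needed.
    denominator = sum(a + b for a, b in zip(array_word1, array_word2))
    if denominator == 0:
        return 0
    numerator = sum(2 * min(a, b) for a, b in zip(array_word1, array_word2))
    return numerator / denominator

def worker_init(matrix):
    # Runs once per worker process, so the matrix is handed over once
    # rather than with every task.
    global worker_matrix
    worker_matrix = matrix

def row_worker(i):
    # One task per row: similarities of row i with every earlier row j < i.
    row = worker_matrix[i]
    return [calculate_similarity(row, worker_matrix[j]) for j in range(i)]

def similarity_matrix_executor(matrix, max_workers=None):
    size = len(matrix)
    result = [[0] * size for _ in range(size)]
    with ProcessPoolExecutor(max_workers=max_workers,
                             initializer=worker_init,
                             initargs=(matrix,)) as executor:
        # executor.map yields results in input order, so the i-th result is row i.
        for i, values in enumerate(executor.map(row_worker, range(size))):
            for j, value in enumerate(values):
                result[i][j] = result[j][i] = value
    return result

if __name__ == "__main__":
    demo = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0], [1.0, 1.0, 1.0]]  # hypothetical toy input
    print(similarity_matrix_executor(demo, max_workers=2))
```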

plonser · Accepted Answer · 2015-03-24 09:03:15Z

2

You are using too many list comprehensions for this amount of data. I would strongly recommend the numpy module. If that is an option, you can do:

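import numpy as np

t = np.array(t_matrix, dtype=float)
word_count = t.shape[0]

# denom[i, j] = sum(t[i]) + sum(t[j]), built by broadcasting the row sums
s = np.sum(t, axis=1)
denom = s[:, None] + s[None, :]

# num[i, j] = sum(min(t[i], t[j])); the matrix is symmetric, so fill both halves
num = np.zeros((word_count, word_count))
for i in range(word_count):
    for j in range(i + 1):
        num[i, j] = num[j, i] = np.minimum(t[i], t[j]).sum()

# Divide only where denom != 0 (no divide-by-zero warnings); other entries stay 0
similarity_matrix = np.zeros_like(denom)
np.divide(2. * num, denom, out=similarity_matrix, where=denom != 0)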

answered Mar 24, 2015 at 9:03 by plonser

1 Comment

plonser Over a year ago

There are still ways to improve this, but first I'd like to know whether it gives the same result as your code and how fast it is.
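One such improvement, sketched here under the assumption that t_matrix fits in memory as a float array, is to replace the inner j loop with a single broadcast np.minimum call per row and to build each similarity row directly, so only one word_count x word_count array is allocated. The function name similarity_matrix_numpy is hypothetical. Unlike the answer above, it leaves the diagonal at 0, as the question's loop does:

```python
import numpy as np

def similarity_matrix_numpy(t_matrix):
    """Row-vectorized version of the question's similarity computation."""
    t = np.asarray(t_matrix, dtype=float)
    n = t.shape[0]
    s = t.sum(axis=1)  # row sums, reused for every denominator
    sim = np.zeros((n, n))
    for i in range(1, n):
        # Broadcasting t[i] (shape (m,)) against t[:i] (shape (i, m))
        # compares row i with all earlier rows in one call.
        num = 2.0 * np.minimum(t[i], t[:i]).sum(axis=1)
        den = s[i] + s[:i]
        row = np.zeros(i)
        np.divide(num, den, out=row, where=den != 0)  # pairs with den == 0 stay 0
        sim[i, :i] = row
        sim[:i, i] = row
    return sim
```

Only one Python-level loop of word_count iterations remains. The temporary created by np.minimum for row i holds i x word_count values, so at 13k words it can approach the size of t itself; comparing against blocks of earlier rows would bound that.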
