I have read this blog which shows how an algorithm had a 250x speed-up by using numpy. I have tried to improve the following code by using numpy but I couldn't make it work:
for i in nodes[1:]:
for lb in range(2, diameter+1):
not_valid_colors = set()
valid_colors = set()
for j in nodes:
if j == i:
break
if distances[i-1, j-1] >= lb:
not_valid_colors.add(c[j, lb])
else:
valid_colors.add(c[j, lb])
c[i, lb] = choose_color(not_valid_colors, valid_colors)
return c
Explanation
The code above is part of an algorithm used to calculate the self similar dimension of a graph. It works basically by constructing dual graphs G' where a node is connected to each other node if the distance between them is greater or equals to a given value (Lb) and then compute the graph coloring on those dual networks.
The algorithm description is the following:
- Assign a unique id from 1 to N to all network nodes, without assigning any colors yet.
- For all Lb values, assign a color value 0 to the node with id=1, i.e. C_1l = 0.
- Set the id value i = 2. Repeat the following until i = N.
- a) Calculate the distance l_ij from i to all the nodes in the network with id j less than i.
- b) Set Lb = 1
- c) Select one of the unused colors C[ j][l_ij] from all nodes j < i for which l_ij ≥ Lb . This is the color C[i][Lb] of node i for the given Lb value.
- d) Increase Lb by one and repeat (c) until Lb = Lb_max.
- e) Increase i by 1.
I wrote it in python but it takes more than a minute when try to use it with small networks which have 100 nodes and p=0.9.
As I'm still new to python and numpy I did not find the way to improve its efficiency.
Is it possible to remove the loops by using the numpy.where to find where the paths are longer than the given Lb? I tried to implement it but didn't work...
numpyknowledgeable posters on SO than on CR. CR also tends to be very picky about question format. 'vectorizing' is a common topic on SOnumpy.numpycan speed things up a lot IF your problem is essential parallel in nature - do the same thing for a elements, regardless of order. But if the problem is serial - the action of the i'th element depends on prior action on the 'i-1', thennumpyusually does not help.nodesanddistancesand your attempt to vectorize with numpy.for j in nodes[:i]is more pythonic than an explicitif ... then break.