-1

I'm building a similarity matrix of a list of items. The naive approach is to iterate the list twice, but this needlessly will compare A:B and B:A when they're the same.

for A in items:
   for B in items:
      if A==B: continue
      sim[A][B] = calc_sim(A, B)

is there a simple way to only calculate half of the values? I could put a skip in there like

if sim[B][A]: continue # already calculated in other direction

But still the iteration is happening. Effectively I just want to iterate through the top or bottom half of the grid:

sim matrix

There are some similar Qs, but nothing with a canonical answer. This seems like a basic CS algo question!

4
  • Is this about Python? Commented Jan 2, 2021 at 22:29
  • the above is pseudo-code. i am writing python in this case tho yes but it's more about an algorithm language independent. I know i can use libraries like sklearn for NearestNeighbot but more interested in the raw algo for myself. added a python tag anyway Commented Jan 2, 2021 at 22:33
  • 1
    Use break instead of continue? Commented Jan 3, 2021 at 0:14
  • @superbrain I think you could be right! simplest is best haha. Commented Jan 3, 2021 at 6:08

3 Answers 3

3

You could use itertools.combinations.

import itertools

for a, b in itertools.combinations(items, 2):
    sim[a][b] = sim[b][a] = calc_sim(a, b)
Sign up to request clarification or add additional context in comments.

1 Comment

oh thanks for pointing this out. it's a bit python specific, but itertools is a great toolbox.
0

If you need just a general algorithm to reduce number iterations, you can limit the range of the inner loop

for i, A in enumerate(items):
   for B in items[:i]:
      sim[A][B] = calc_sim(A, B)

But if you are looking for Python-specific optimization, it would be much better to use numpy vectorization. For example, if calc_sim(a, b) computes squared difference between a and b, then it can be vectorized the following way:

import numpy as np

list = [1, 2, 3]
array = np.array(list)
sim = np.square(array[:,np.newaxis] - array)
[[0 1 4]
 [1 0 1]
 [4 1 0]]

2 Comments

it's actually an array of word vectors I'm using to do the comparison. Can np be made to take any comparison function and apply it to a grid in that way, or just np built-ins like np.square ?
@dcsan It can be done if the vectors have the same length. I think that this question may help: stackoverflow.com/questions/35215161/… But if the vectors have different lengths, than numpy is unlikely to be able to handle them, as long as they are not preprocessed in some way
0

Assuming calc_sim(A, B) == calc_sim(B, A), you could try this:

for A in range(0, len(items)):
   for B in range(A, len(items)): # Replace with A+1 if you don't want the case A == B
      # Remember A and B are indexes, so change code accordingly
      result = calc_sim(items[A], items[B])
      sim[A][B] = result # Copy result to both A,B and B,A as they are equal
      sim[B][A] = result

However actually both algorithms are O(n) n2

4 Comments

Now calc_sum will get indices, not the actual values
Yeah, I made a note about it, but then @abc has a better answer.... stackoverflow.com/a/65544844/2180316
for B in range(A, ... nice trick, this will effectively cut off one half of the mirror image.
Its not very Pythonic, you should use the itertools method.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.