I wrote a cosine similarity function that gives the correct results when called with individual vectors, but when I pass a list of vectors I get different results. Isn't numpy supposed to apply the formula to every element in the list? Is my understanding wrong?

Cosine similarity:

import numpy as np

def cosine_similarity(vec1, vec2):
  return np.inner(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

Example:

a = [1, 2, 3]
b = [4, 5, 6]
print(cosine_similarity(a, a), cosine_similarity(a, b), cosine_similarity(a, [a, b]))

With the result:

1.0 0.9746318461970762 [0.39223227 0.8965309 ]

The first two values are correct. The values in the array should be the same as those two, but they aren't. Is this just not possible, or do I have to change something?

  • My first guess is that np.linalg.norm(vec2) needs to be called with the axis argument. When [a, b] is passed to norm without axis=-1, it computes the norm of the 2x3 matrix instead of the norm of each vector. Commented Dec 9, 2021 at 13:55
  • Just confirmed that using np.linalg.norm(vec2, axis=-1) works as you expected. Commented Dec 9, 2021 at 13:57
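
The difference described in these comments can be checked directly. This is a minimal sketch using a and b from the question; the names stacked, whole, and per_row are illustrative only and do not appear in the original post.

```python
import numpy as np

a = [1, 2, 3]
b = [4, 5, 6]

# Stack the two vectors as rows of a 2x3 array (illustrative name).
stacked = np.array([a, b])

# Without axis: a single scalar, the norm of the whole 2x3 matrix
# (the Frobenius norm, i.e. sqrt of the sum of all squared entries).
whole = np.linalg.norm(stacked)

# With axis=-1: one Euclidean norm per row, i.e. one per vector.
per_row = np.linalg.norm(stacked, axis=-1)

print(whole)    # a scalar
print(per_row)  # an array with two entries, one per row
```

Dividing by the single scalar instead of the per-row norms is what scales both entries of the question's array result by the wrong amount.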

1 Answer

Your understanding is actually correct. Many numpy functions accept an axis keyword argument. np.linalg.norm, for example, computes the norm along the specified axis. In your case, because axis is not specified, norm calculates a single norm of the whole 2x3 matrix [a, b] (the Frobenius norm) instead of calculating the norm per row. To fix the code, pass axis=-1 for vec2:

def cosine_similarity(vec1, vec2):
  return np.inner(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2, axis=-1))
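
As a quick sanity check, the fixed function can be run against the example from the question. This is a sketch that assumes vec1 is always a single 1-D vector and vec2 is either a 1-D vector or a 2-D array of row vectors; the variable names single and batched are illustrative.

```python
import numpy as np

def cosine_similarity(vec1, vec2):
    # Assumption: vec1 is one 1-D vector; vec2 is a 1-D vector
    # or a 2-D array whose rows are vectors.
    return np.inner(vec1, vec2) / (
        np.linalg.norm(vec1) * np.linalg.norm(vec2, axis=-1)
    )

a = [1, 2, 3]
b = [4, 5, 6]

# One call per pair, as in the question's first two results.
single = [cosine_similarity(a, a), cosine_similarity(a, b)]

# One call with a list of vectors.
batched = cosine_similarity(a, [a, b])

# The batched result should now agree with the pairwise results.
print(np.allclose(batched, single))
```

With axis=-1 the norm is taken over the last axis, so the same function keeps working when vec2 is a single vector.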

1 Comment

The alternative is to transpose vec2