I'm a bit confused about hierarchical clustering with SciPy in Python. Here is my source code:

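import numpy
import scipy.cluster.hierarchy  # import the submodule explicitly so linkage is available
import scipy.spatial.distance as dist

# matrix holds the input data, one observation per row
dataMatrix = numpy.array(matrix)
distMatrix = dist.pdist(dataMatrix, 'euclidean')   # condensed (1-D) distance matrix
distSquareMatrix = dist.squareform(distMatrix)     # square (2-D) distance matrix

Y = scipy.cluster.hierarchy.linkage(distSquareMatrix, method='complete')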

Should I pass the condensed 'distMatrix' or the square-form 'distSquareMatrix' to linkage? I have seen both in other posts, but they produce different output, and I'm not sure which one to choose.

1 Answer

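You need to pass the distance matrix in condensed form, i.e. the output of pdist, without transforming it with squareform. The squareform function is useful if you want to manipulate the distances yourself more easily as a 2D array. The scipy.cluster.hierarchy functions expect the condensed form, which stores each pairwise distance only once and so saves roughly a factor of two in memory. If you pass a 2D array instead, linkage treats it as a matrix of observation vectors, one per row, and computes new distances between those rows, which is why your output differs.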
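To make the difference concrete, here is a minimal sketch. The three sample points are made up for illustration and the variable names are my own, not from the question. It compares linkage on the condensed distances, on the raw observations, and on the square matrix, and silences the warning some SciPy versions emit for the last case.

```python
import warnings

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

# Hypothetical sample data: 3 observations with 2 features each.
data = np.array([[0.0, 0.0], [3.0, 4.0], [10.0, 0.0]])

condensed = pdist(data, metric="euclidean")  # 1-D, length n*(n-1)/2
square = squareform(condensed)               # 2-D, shape (n, n)

# Intended usage: pass the condensed distances ...
Z_condensed = linkage(condensed, method="complete")
# ... or, equivalently, the raw observations and let linkage call pdist itself.
Z_from_data = linkage(data, method="complete", metric="euclidean")

# Passing the square matrix: each row is treated as an observation vector,
# so linkage clusters on distances between rows of the distance matrix.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # some SciPy versions warn about this input
    Z_square = linkage(square, method="complete")

print(np.allclose(Z_condensed, Z_from_data))  # same clustering
print(np.allclose(Z_condensed, Z_square))     # merge heights differ

# If you only have the square form, squareform converts it back.
condensed_again = squareform(square)
```

If all you have is a square distance matrix, calling squareform on it returns the condensed vector that linkage expects.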

I hope this helps.

1 Comment

Thanks for your answer! That means I have to use 'distMatrix' in the linkage function...
