What is a good clustering algorithm that simply puts two data items into the same cluster if their separation is less than some user specified cutoff?

That is, the result of X_clustering(data, distance, epsilon) is a set of cluster assignments such that any pair i, j is in the same cluster if distance(data[i], data[j]) < epsilon. If distance(data[i], data[j]) >= epsilon, they may end up in different clusters, provided no other data points link them.

Another way of stating it is: i,j are in the same cluster if there exists a path [i, x, y, z..., j] through the data such that each step is of distance<epsilon, and they are in different clusters if no such path exists.
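For concreteness, here is a minimal sketch of the behaviour described above, written as connected components over the "closer than epsilon" relation using a union-find structure. The name `epsilon_components` is hypothetical (it stands in for X_clustering), and the brute-force loop over all pairs is an assumption made for simplicity; it costs O(n^2) distance evaluations.

```python
from itertools import combinations


def epsilon_components(data, distance, epsilon):
    """Return one integer label per item.

    Items i and j get the same label exactly when a chain of items links
    them with every step shorter than epsilon. Hypothetical helper, not
    taken from any library.
    """
    parent = list(range(len(data)))

    def find(i):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the groups of every pair that is closer than epsilon.
    for i, j in combinations(range(len(data)), 2):
        if distance(data[i], data[j]) < epsilon:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(data))]


points = [0.0, 0.5, 1.0, 5.0, 5.4]
print(epsilon_components(points, lambda a, b: abs(a - b), 0.6))
```

In this example 0.0 and 1.0 are farther apart than epsilon but still share a cluster, because 0.5 links them.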

1 Answer

This will not work if every pair in a cluster must satisfy the condition. If the distance between every pair (data[i], data[j]) in a cluster is less than a given epsilon, then all members of that cluster lie within a ball of radius epsilon around any one member. Hence, this clustering method does not generalize to clusters of arbitrary shape or extent.

By the way, DBSCAN is a good clustering algorithm for finding clusters based on their density with a given epsilon. You can modify this algorithm by adding a stronger constraint:

a data point can be added to a cluster only if its distance to every member of the cluster is less than the given epsilon.
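One possible reading of that constraint, as a rough sketch rather than a modified DBSCAN: scan the points in order and put each one into the first existing cluster whose members are all closer than epsilon, otherwise start a new cluster. The function name `strict_epsilon_clusters` and the greedy first-fit rule are assumptions for illustration. The result depends on the order of the data.

```python
def strict_epsilon_clusters(data, distance, epsilon):
    """Greedy first-fit grouping; returns a list of clusters of indices.

    A point joins a cluster only if it is closer than epsilon to every
    current member. Hypothetical helper; the outcome is order dependent.
    """
    clusters = []  # each cluster is a list of indices into data
    for i, point in enumerate(data):
        for members in clusters:
            if all(distance(point, data[m]) < epsilon for m in members):
                members.append(i)
                break
        else:
            # No existing cluster accepts the point, so it starts its own.
            clusters.append([i])
    return clusters


points = [0.0, 0.5, 1.0, 5.0]
print(strict_epsilon_clusters(points, lambda a, b: abs(a - b), 0.6))
```

Here 1.0 is within epsilon of 0.5 but not of 0.0, so it is kept out of the first cluster. This shows how the stronger constraint prevents chaining through intermediate points.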
