
I need help (preferably a full algorithm, but any hint or reference will be appreciated) with the following algorithmic problem:

We have a set of N elements. I can define a distance between any two elements that satisfies the metric conditions. I need to group these elements into disjoint subsets (each element belonging to exactly one subset) according to the following rules:

  1. The maximum distance between any two elements in each subset does not exceed a specified threshold.

  2. The number of subsets is as small as possible.

  3. If more than one grouping satisfies conditions (1) and (2), the maximum distance between any two elements in the same subset, taken over all subsets, should be as small as possible.

Example:

Assume we have the following points on a number axis: 1, 11, 12, 13, 23. The distance is simply the absolute difference between the points. Our distance threshold is 10. The two possible groupings satisfying conditions (1) and (2) are: (1, 11), (12), (13, 23) or (1), (11, 12, 13), (23). However, condition (3) says that the latter grouping is the correct one: its maximum intra-subset distance is 2, versus 10 for the former.
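
For concreteness, a quick Python check of the two groupings (the helper function is my own, just for illustration):

    def max_intra_distance(groups):
        # largest pairwise distance found inside any single subset
        return max(max(b - a for a in g for b in g) for g in groups)

    print(max_intra_distance([(1, 11), (12,), (13, 23)]))   # -> 10
    print(max_intra_distance([(1,), (11, 12, 13), (23,)]))  # -> 2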

  • What problems are you running into when trying to do this in the straightforward way? Commented Nov 16, 2017 at 21:46
  • @sascha N can be between 20 and 100. Commented Nov 16, 2017 at 22:35
  • @500 - Internal Server Error, what do you mean by the straightforward way? Commented Nov 16, 2017 at 22:37
  • Just saw it's 1D-only. So while not answering your question, here is a start, at least in regard to what to look for. Commented Nov 16, 2017 at 23:10
  • One approach (as nearly always) would be mixed-integer programming, which should be able to produce good approximations given little time and exact results given more time. But given that no language is specified and an algorithm is asked for, I won't do a Python-based implementation. Commented Nov 16, 2017 at 23:34

1 Answer


For 1-dimensional data, sort the points, divide them into the desired number of bins, and then move the bin boundaries to optimize.
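
To illustrate, here is a minimal Python sketch of that idea, assuming the inputs are numbers on a line: a greedy sweep finds the fewest bins that fit under the threshold (condition 2), then a binary search over the achievable spans tightens the boundaries (condition 3). The function names and the O(n²) candidate-set construction are mine, not part of this answer; the latter is fine for the N between 20 and 100 mentioned in the comments.

    import bisect

    def min_groups(pts, span):
        # greedy sweep over sorted points: open a new group whenever the
        # next point would stretch the current group beyond `span`
        count, i = 0, 0
        while i < len(pts):
            count += 1
            i = bisect.bisect_right(pts, pts[i] + span, i)
        return count

    def partition(points, threshold):
        pts = sorted(points)                      # assumes at least one point
        k = min_groups(pts, threshold)            # condition (2): fewest groups
        # condition (3): smallest span that still allows k groups; the only
        # spans worth testing are the pairwise distances <= threshold
        candidates = sorted({b - a for i, a in enumerate(pts)
                             for b in pts[i:] if b - a <= threshold})
        lo, hi = 0, len(candidates) - 1
        while lo < hi:                            # binary search; min_groups
            mid = (lo + hi) // 2                  # is non-increasing in span
            if min_groups(pts, candidates[mid]) == k:
                hi = mid
            else:
                lo = mid + 1
        best = candidates[lo]
        groups, i = [], 0                         # rebuild with tightened span
        while i < len(pts):
            j = bisect.bisect_right(pts, pts[i] + best, i)
            groups.append(pts[i:j])
            i = j
        return groups

    print(partition([1, 11, 12, 13, 23], 10))     # [[1], [11, 12, 13], [23]]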

It gets more interesting in higher dimensions, where the problem becomes NP-hard, so finding the optimum will be expensive. You can indeed use clustering here: use complete-linkage clustering. For an O(n²) time and O(n) memory approach, try CLINK. But in my experience, you will need to run this algorithm several times, on shuffled data, to get a good solution.
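
As a rough sketch of that recipe, using SciPy's stock complete-linkage routine rather than CLINK specifically (the function name, the number of runs, and the tie-breaking are my own additions): cut the dendrogram at the threshold so every cluster's diameter stays within it, repeat on shuffled copies of the data, and keep the run with the fewest clusters, breaking ties by the smallest diameter per condition (3).

    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from scipy.cluster.hierarchy import linkage, fcluster

    def best_complete_linkage(points, threshold, runs=20, seed=0):
        # repeated complete-linkage clustering on shuffled data; returns
        # (num_clusters, max_diameter, labels) of the best run found
        points = np.asarray(points)
        rng = np.random.default_rng(seed)
        dmat = squareform(pdist(points))          # full distance matrix
        best = None
        for _ in range(runs):
            order = rng.permutation(len(points))  # shuffle to vary tie-breaking
            Z = linkage(pdist(points[order]), method='complete')
            # cutting at `threshold` caps every cluster's diameter, because
            # complete linkage merges at the maximum inter-cluster distance
            shuffled = fcluster(Z, t=threshold, criterion='distance')
            labels = np.empty(len(points), dtype=int)
            labels[order] = shuffled              # map back to original order
            k = labels.max()
            diam = max(dmat[np.ix_(labels == c, labels == c)].max()
                       for c in range(1, k + 1))  # tie-break: condition (3)
            if best is None or (k, diam) < (best[0], best[1]):
                best = (k, diam, labels)
        return best

Note that this remains a heuristic: the threshold cut guarantees condition (1), but neither the minimal number of clusters nor the smallest diameter is guaranteed, which is why the shuffled restarts help.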
