
I have two sets of intervals that correspond to the same 1-dimensional (linear) space. Here is a rough visual (in reality, there are many more intervals and they are much more spread out, but this gives the basic idea):

[figure: red and blue intervals laid out along a line]

Each of these intervals contains information, and I am writing a program to compare the information in one set of intervals (the red) to the information contained in the other set (the blue).

Here is my problem. I would like to partition the space into n chunks such that there is roughly an equal amount of comparison work to be done in each chunk (the amount of work depends on the number of intervals in that portion of the space). Also, the partition should not split any red or blue interval across two chunks.

So the input is two sets of intervals, and the desired output is a partition of the space such that

  • the intervals are (roughly) equally distributed across each element of the partition
  • no interval overlaps with multiple partition elements

Can anyone suggest an approach or an algorithm for doing this?

  • @Daniel: Is it expensive for you to scan the entire space once before beginning comparison, in order to build a comparison list? Also, are you guaranteed to have an equal number of red and blue intervals? Is there any way to identify the sequence of an interval by examining it (e.g. sequence number encoded in a header or something)? Commented Apr 1, 2011 at 3:19
  • It is likely that there will not be an equal number of intervals. A quick initial scan of the linear space will probably be required and will not be expensive. I can store pointers to "interval" objects (objects with start and end coordinates) in multiple data structures if necessary, although it would be too memory intensive to store much more than that. An interval tree and a dynamic array came to mind first, but I'm trying to figure out how I can use them... Commented Apr 1, 2011 at 3:34
  • @Daniel - Am I missing something, or can't you just create a list of pointer pairs with a quick scan, then partition that list for processing? Commented Apr 1, 2011 at 3:44
  • 2
    There is not necessarily any solution, since you're only allowed to "snip" the space into segments at points that are not covered by either a red or a blue interval. E.g. if the intervals are arranged like the bricks in 2 rows of brickwork, there will be no such points. (Also, you don't say whether or not the intervals are guaranteed not to overlap within their respective sets, but my counterexample holds even if they don't.) Commented Apr 1, 2011 at 5:13
  • It is not clear what the workload of a chunk depends on. You say the number of comparisons, which depends on the intervals. But which intervals: the ones from set 1, the ones from set 2, or both? If it depended on just one of them, the distribution of those intervals would be interesting. Commented Apr 1, 2011 at 7:50

1 Answer


Define a "word" to be a maximal interval in which every point belongs either to a red interval or a blue interval. No chunk can end in the middle of a word, and every union of consecutive words is a potential chunk. Now apply a minimum raggedness word-wrap algorithm to the words, where the length of a word is defined to be the number of intervals it contains (line = chunk).
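A minimal sketch of this idea in Python (the function names and the interval representation as `(start, end)` tuples are my own; the answer itself does not specify an implementation). The first step merges the union of all red and blue intervals into maximal connected "words", recording how many intervals each word contains; the second step is a straightforward O(k²·n) dynamic program that splits the word sequence into exactly n consecutive chunks, minimizing the sum of squared deviations from the ideal chunk size, which is one common form of the minimum-raggedness cost:

```python
def find_words(red, blue):
    """Merge all intervals (both colors) into maximal connected 'words'.
    Returns the 'length' of each word, defined as the number of
    intervals it contains, in left-to-right order."""
    intervals = sorted(red + blue)          # (start, end) tuples
    counts = []
    cur_end, n_in_word = intervals[0][1], 1
    for s, e in intervals[1:]:
        if s <= cur_end:                    # overlaps/touches current word
            cur_end = max(cur_end, e)
            n_in_word += 1
        else:                               # gap: current word ends here
            counts.append(n_in_word)
            cur_end, n_in_word = e, 1
    counts.append(n_in_word)
    return counts

def partition(word_lengths, n):
    """Split the word sequence into n consecutive chunks, minimizing
    the sum of squared deviations from the average chunk size.
    Returns (start, end) word-index pairs, one per chunk."""
    k = len(word_lengths)
    ideal = sum(word_lengths) / n
    prefix = [0]
    for w in word_lengths:
        prefix.append(prefix[-1] + w)
    INF = float('inf')
    # cost[i][j] = min cost of splitting the first i words into j chunks
    cost = [[INF] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for i in range(1, k + 1):
        for j in range(1, min(i, n) + 1):
            for p in range(j - 1, i):       # last chunk = words p..i-1
                c = cost[p][j - 1] + (prefix[i] - prefix[p] - ideal) ** 2
                if c < cost[i][j]:
                    cost[i][j] = c
                    back[i][j] = p
    cuts, i, j = [], k, n
    while j > 0:                            # walk back-pointers
        p = back[i][j]
        cuts.append((p, i))
        i, j = p, j - 1
    return list(reversed(cuts))
```

For example, `find_words([(0, 2), (3, 5)], [(1, 4), (6, 7)])` yields `[3, 1]`: the first three intervals chain together into one word, and `(6, 7)` stands alone. Note that, per the comment below the question, there is no guarantee that enough words exist to form n chunks at all.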


5 Comments

The tricky part of using this is guessing a line width that results in the n lines needed.
The rest of the question makes that sound like a soft constraint. More pressing is the question of whether non-contiguous partitions are allowed.
Yes, a sub-optimal but near-optimal solution is acceptable, but "words" must be maintained in sorted order.
I think this is the best thing to do, using a binary search to find a good line width. I tried changing the DP recursion so that we have a fixed, known number of lines and the unknown line width depends on them, but this causes a state-space blowup (i.e. inefficiency): the best cost of a solution starting at the jth word (assuming an exponent of 2 in the cost function) now depends not just on the sum of the lengths of the lines we already have (something we can get immediately from j) but also on the number of those lines and the sum of squares of their lengths.
I had worked with the idea of a "word" previously (I called it something else), but for some reason I didn't think of it for this problem. If there are k words, it turns out that splitting the space so that there are k/n words in each chunk is an acceptable heuristic. Not exactly what you suggested, but reminding me of the "word" concept is just what I needed!
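The k/n heuristic mentioned in the last comment can be sketched in a few lines (this is my own illustration, not code from the thread): given k words, simply cut the word sequence at roughly equal word counts, ignoring how many intervals each word contains.

```python
def heuristic_chunks(k, n):
    """Heuristic from the comment above: split k words into n chunks
    of roughly k/n consecutive words each. Returns (start, end)
    word-index pairs, one per chunk."""
    bounds = [round(i * k / n) for i in range(n + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(n)]
```

This ignores word lengths entirely, so it only balances the comparison work well when interval counts per word are roughly uniform; the minimum-raggedness approach from the answer handles skewed word lengths better.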
