14

Say I have a list of lists that has indexes [[start, end], [start1, end1], [start2, end2]].

Like for example :

[[0, 133], [78, 100], [25, 30]].

How would I get check for overlap between among the lists and remove the list with the longer length each time? So:

>>> list = [[0, 133], [78, 100], [25, 30]]
>>> foo(list)
[[78, 100], [25, 30]]

This is what I tried to do so far:

def cleanup_list(list):
    i = 0
    c = 0
    x = list[:]
    end = len(x)
    while i < end-1:
        for n in range(x[i][0], x[i][1]):
            if n in range(x[i+1][0], x[i+1][1]):
                list.remove(max(x[i], x[i+1]))
        i +=1
    return list

But in addition to being kind of convoluted it's not working properly:

 >>>cleanup_list([[0,100],[9,10],[12,90]])
 [[0, 100], [12, 90]]

Any help would be appreciated!

EDIT:

Other examples would be:

>>>a = [[0, 100], [4, 20], [30, 35], [30, 78]]
>>>foo(a)
[[4, 20], [30, 35]]

>>>b = [[30, 70], [25, 40]]
>>>foo(b)
[[25, 40]]

I'm basically trying to remove all of the longest lists that overlap with another list. In this case I don't have to worry about the lists being of equal length.

Thanks!!

10
  • ([[0,100],[9,10],[12,90]]) should go to [[0,100]] correct? Commented May 1, 2013 at 4:49
  • 5
    I fear this will be an ill-defined problem in case of three or more overlaps Commented May 1, 2013 at 4:50
  • I'm trying to actually remove [0, 100] and get [[9, 10], [12, 90]] Commented May 1, 2013 at 4:50
  • 1
    Sorry, I'm not entirely sure what you mean? Commented May 1, 2013 at 4:54
  • 1
    @user2338068: Can you show us some sample input / output combinations Commented May 1, 2013 at 4:55

5 Answers 5

11

To remove a minimal number of intervals from the list such that the intervals that are left do not overlap, O(n*log n) algorithm exists:

def maximize_nonoverlapping_count(intervals):
    # sort by the end-point
    L = sorted(intervals, key=lambda (start, end): (end, (end - start)),
               reverse=True) # O(n*logn)
    iv = build_interval_tree(intervals) # O(n*log n)
    result = []
    while L: # until there are intervals left to consider
        # pop the interval with the smallest end-point, keep it in the result
        result.append(L.pop()) # O(1)
        # remove intervals that overlap with the popped interval
        overlapping_intervals = iv.pop(result[-1]) # O(log n + m)
        remove(overlapping_intervals, from_=L) 
    return result

It should produce the following results:

f = maximize_nonoverlapping_count
assert f([[0, 133], [78, 100], [25, 30]]) == [[25, 30], [78, 100]]
assert f([[0,100],[9,10],[12,90]]) == [[9,10], [12, 90]]
assert f([[0, 100], [4, 20], [30, 35], [30, 78]]) == [[4, 20], [30, 35]]
assert f([[30, 70], [25, 40]]) == [[25, 40]]

It requires the data structure that can find in O(log n + m) time all intervals that overlap with the given interval e.g., IntervalTree. There are implementations that can be used from Python e.g., quicksect.py, see Fast interval intersection methodologies for the example usage.


Here's a quicksect-based O(n**2) implementation of the above algorithm:

from quicksect import IntervalNode

class Interval(object):
    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.removed = False

def maximize_nonoverlapping_count(intervals):
    intervals = [Interval(start, end) for start, end in intervals]
    # sort by the end-point
    intervals.sort(key=lambda x: (x.end, (x.end - x.start)))   # O(n*log n)
    tree = build_interval_tree(intervals) # O(n*log n)
    result = []
    for smallest in intervals: # O(n) (without the loop body)
        # pop the interval with the smallest end-point, keep it in the result
        if smallest.removed:
            continue # skip removed nodes
        smallest.removed = True
        result.append([smallest.start, smallest.end]) # O(1)

        # remove (mark) intervals that overlap with the popped interval
        tree.intersect(smallest.start, smallest.end, # O(log n + m)
                       lambda x: setattr(x.other, 'removed', True))
    return result

def build_interval_tree(intervals):
    root = IntervalNode(intervals[0].start, intervals[0].end,
                        other=intervals[0])
    return reduce(lambda tree, x: tree.insert(x.start, x.end, other=x),
                  intervals[1:], root)

Note: the time complexity in the worst case is O(n**2) for this implementation because the intervals are only marked as removed e.g., imagine such input intervals that len(result) == len(intervals) / 3 and there were len(intervals) / 2 intervals that span the whole range then tree.intersect() would be called n/3 times and each call would execute x.other.removed = True at least n/2 times i.e., n*n/6 operations in total:

n = 6
intervals = [[0, 100], [0, 100], [0, 100], [0, 10], [10, 20], [15, 40]])
result = [[0, 10], [10, 20]]

Here's a banyan-based O(n log n) implementation:

from banyan import SortedSet, OverlappingIntervalsUpdator # pip install banyan

def maximize_nonoverlapping_count(intervals):
    # sort by the end-point O(n log n)
    sorted_intervals = SortedSet(intervals,
                                 key=lambda (start, end): (end, (end - start)))
    # build "interval" tree O(n log n)
    tree = SortedSet(intervals, updator=OverlappingIntervalsUpdator)
    result = []
    while sorted_intervals: # until there are intervals left to consider
        # pop the interval with the smallest end-point, keep it in the result
        result.append(sorted_intervals.pop()) # O(log n)

        # remove intervals that overlap with the popped interval
        overlapping_intervals = tree.overlap(result[-1]) # O(m log n)
        tree -= overlapping_intervals # O(m log n)
        sorted_intervals -= overlapping_intervals # O(m log n)
    return result

Note: this implementation considers [0, 10] and [10, 20] intervals to be overlapping:

f = maximize_nonoverlapping_count
assert f([[0, 100], [0, 10], [11, 20], [15, 40]]) == [[0, 10] ,[11, 20]]
assert f([[0, 100], [0, 10], [10, 20], [15, 40]]) == [[0, 10] ,[15, 40]]

sorted_intervals and tree can be merged:

from banyan import SortedSet, OverlappingIntervalsUpdator # pip install banyan

def maximize_nonoverlapping_count(intervals):
    # build "interval" tree sorted by the end-point O(n log n)
    tree = SortedSet(intervals, key=lambda (start, end): (end, (end - start)),
                     updator=OverlappingIntervalsUpdator)
    result = []
    while tree: # until there are intervals left to consider
        # pop the interval with the smallest end-point, keep it in the result
        result.append(tree.pop()) # O(log n)

        # remove intervals that overlap with the popped interval
        overlapping_intervals = tree.overlap(result[-1]) # O(m log n)
        tree -= overlapping_intervals # O(m log n)
    return result
Sign up to request clarification or add additional context in comments.

4 Comments

Thanks so much! I'm new to coding so some of this stuff is a tad confusing - but I'll read through your links. But to clarify, is build_interval_tree a function that I'll have to create that's similar to the code in quicksect.py?
@user2338068: I've added a quicksect-based implementation, to show how it could look like (you can run it if you download quicksect.py and put it in the same directory as the script).
@user2338068: I've added banyan-based O(n log n) implementation (banyan is not pure Python (C++ is used to write C extension for Python) so it might make it harder to install).
A lot of this is over my head right now, but I'll go through it as best as I can. Thanks man, I really appreciate it.
3

This may not be the fastest solution, but really verbose and clear - I think.

a = [[2,100], [4,10], [77,99], [38,39], [44,80], [69,70], [88, 90]]

# build ranges first
def expand(list):
    newList = []
    for r in list:
        newList.append(range(r[0], r[1] + 1))
    return newList


def compare(list):
    toBeDeleted = []
    for index1 in range(len(list)):
        for index2 in range(len(list)):
            if index1 == index2:
                # we dont want to compare ourselfs
                continue
            matches = [x for x in list[index1] if x in list[index2]]
            if len(matches) != 0: # do we have overlap?
                ## compare lengths and get rid of the longer one
                if   len(list[index1]) > len(list[index2]):
                    toBeDeleted.append(index1)
                    break
                elif len(list[index1]) < len(list[index2]):
                    toBeDeleted.append(index2)
    # distinct
    toBeDeleted = [ toBeDeleted[i] for i,x in enumerate(toBeDeleted) if x not in toBeDeleted[i+1:]] 
    print len(list)
    # remove items
    for i in toBeDeleted[::-1]:
        del list[i] 
    return list


print(compare(expand(a)))

3 Comments

This solution might delete too much i.e., it might return less non-overlapping intervals than possible to preserve. It is O(n**4) unnecessarily. It is not clear whether the distinct computation is correct, do you mean toBeDeleted = set(toBeDeleted) or something else? list is a builtin name, avoid using it as a variable.
You are right: its not speedy (especially compared to yours) but it's more verbose and easier to maintain IMHO. toBeDeleted = set(toBeDeleted) does just what my for comprehension does, its just more obvious if you're not aware of set(). Maybe I got the problem wrong, but I haven't noticed any non-overlapping removals. Thanks for your opinion!
To be clear 1. This solution doesn't work: [[0, 10], [9, 12], [11, 20]] -> [[9, 10, 11, 12]] instead of [[0, 10], [11, 20]] 2. If it were working; it would be unnecessary slow e.g., even for a small input such as len(a) == 100, this solution requires ~100**4 (~100000000) operations. It is easy to make it both O(n**2) (~10000 operations) and more readable (though I find it debatable that my O(n log n) (~500 operations) version (especially banyan-based solution that uses single SortedSet) is less readable than the above O(n**4) solution).
2

I think one problem in your code is that it does not handle with the situation where one list contains another. For example, [0,100] contains [9,10]. When you loop n in [0,100] and n enters [9,10], the condition statement if n in range(x[i+1][0], x[i+1][1]) is triggered. Then the builtin function max will compare [0, 100] and [9, 10], and unfortuantely max will return [9,10] because it compare the first number in the list. Thus you remove the wrong element.

I'm trying another way to achieve the effect you want. Rather than manipulating the list itself, I create a new list. Conditionally appending new element to it if it meets our requirements.

def cleanup_list(lists):
    ranges = []
    for l in lists:
        to_insert = True
        for i in ranges:
            r = range(i[0],i[1])
            # if l overlaps with i, but l does not contain i
            if l[0] in r or l[1] in r:
                if (l[1]-l[0]) < len(r):
                    ranges.remove(i)
                else:
                    to_insert = False
            # l contains i
            if l[0]<i[0] and l[1]>i[1]:
                to_insert = False
        if to_insert:
            ranges.append(l)
    return ranges

2 Comments

O(n^2) time complexity. slow.
Maybe this problem can be addressed by heap. Since all lists do not overlap, they can be represented by a binary heap, with start and end as value. This should optimize the time complexity to log(n)
1
  1. ascending sort all items by length.

  2. add them to a segment tree and ignore overlapped items.

1 Comment

So would this be an optimized way to go about it? I'll look into it - thanks!
1

In general, two intervals are overlapping if:

min([upperBoundOfA, upperBoundOfB]) >= max([lowerBoundOfA, lowerBoundOfB])

If this is the case, the union of those intervals is:

(min([lowerBoundOfA, lowerBoundOfB]), max([upperBoundOfA, upperBoundOfB])

Similarly, the intersection of those intervals will be:

(min([upperBoundOfA, upperBoundOfB]), max([lowerBoundOfA, lowerBoundOfB]))

1 Comment

I don't think finding the intersection or union helps in this particular case, but this is some very useful general advice I'll keep in mind. Thanks!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.