-1

I'm trying to write a code in python that can remove overlapping lists. My lists are in the format [lower_bound,upper_bound,data,data]. These lists are stored in a larger list. I want to iterate over the larger list, removing any of these smaller lists where the bounds overlap. This is what I have.

def cut_overlapping_ranges(ranges):

  keep_ranges = []
  temp_ranges = ranges

  for range_1 in temp_ranges:
    keep_ranges.append(range_1)
    for range_2 in temp_ranges:
      if range_1 != range_2:
        if ((range_1[0] < range_2[0] < range_1[1]) or (range_1[0] < range_2[1] < range_1[1])):
          temp_ranges.remove(range_2)
  return
7
  • 2
    Never modify an iterator in python (like temp_ranges} while iterating over it . Additionally, you're not checking for the case where range_2 completely contains range_1. Commented Mar 5 at 20:26
  • 2
    note, your code has a bug because you are modifying temp_ranges while you iterate over it, Also note, temp_ranges = ranges doesn't create a copy, so you are mutating the input list Commented Mar 5 at 20:26
  • 4
    Some example data would help us to help you faster. Commented Mar 5 at 20:35
  • 4
    Show us some cases of actual input data, and the results you would expect. Commented Mar 5 at 20:38
  • 1
    What is your question about what you have? Commented Mar 7 at 8:07

3 Answers 3

2

One efficient approach is to build an interval tree where all nodes to the left of a node have intervals lower than or equal to the lower bound of the node, and all nodes to the right of the node have intervals greater than or equal to the upper bound of the node so that adding an interval to the tree has an average time complexity of O(log n), and construction of the tree overall takes O(n log n) on average.

Since the goal is to retain non-overlapping intervals, we can make an insertion raise an exception to reject an overlapping interval:

class Interval:
    def __init__(self, lower_bound, upper_bound):
        self.lower_bound = lower_bound
        self.upper_bound = upper_bound

class NonOverlappingIntervalTree:
    def __init__(self):
        self.interval = None
        self.left = None
        self.right = None

    def add_interval(self, lower_bound, upper_bound):
        if not self.interval:
            self.interval = Interval(lower_bound, upper_bound)
            return
        if upper_bound <= self.interval.lower_bound:
            if not (node := self.left):
                node = self.left = NonOverlappingIntervalTree()
        elif lower_bound >= self.interval.upper_bound:
            if not (node := self.right):
                node = self.right = NonOverlappingIntervalTree()
        else:
            raise ValueError
        node.add_interval(lower_bound, upper_bound)

so your filter function can just skip an interval when it is found to overlap an existing node in the interval tree when attempting to add it to the tree:

def cut_overlapping_ranges(ranges):
    output = []
    tree = NonOverlappingIntervalTree()
    for lower_bound, upper_bound, *data in ranges:
        try:
            tree.add_interval(lower_bound, upper_bound)
        except ValueError:
            continue
        output.append([lower_bound, upper_bound, *data])
    return output

so that:

cut_overlapping_ranges([[0, 10], [5, 15], [11, 18], [10, 25], [20, 25], [30, 35]])

outputs:

[[0, 10], [11, 18], [20, 25], [30, 35]]

Demo: https://ideone.com/vv48Oh

Sign up to request clarification or add additional context in comments.

Comments

0

This is the solution I found, thanks to help on loop mechanics from juanpa.arrivillaga. I imagine it's not too efficient, for that, look to other answers; however, this works without excess libraries.

def cut_overlapping_ranges(ranges):

  keep_ranges = []
  temp_ranges = ranges.copy()

  while True:
    range_position = 0
    if len(temp_ranges) == 0:
      break
    range_1 = temp_ranges[0]
    lb = range_1[0]
    ub = range_1[1]
    keep_ranges.append(range_1)
    temp_ranges.remove(range_1)
    while True:
      if len(temp_ranges) == 0:
        break
      try:
        range_2 = temp_ranges[range_position]
        if ((lb < range_2[0] < ub) or (lb < range_2[1] < ub)):
            temp_ranges.remove(range_2)
        else:
          range_position += 1
      except IndexError:
        break

  print(range)

  return

input is

[[0.0023, 0.0033],[0.0025, 0.0035],[0.0029, 0.0039],[0.0022, 0.0032],[0.0017, 0.0027],[0.0031, 0.0041],[0.0032, 0.0042],[-0.0005, 0.0005],[0.0014, 0.0024],[-0.0002, 0.0008],[-0.0006,0.0004],[0.0012, 0.0022],[0.0013, 0.0023],[0.0008, 0.0018],[0.0011, 0.0021]]

output is

[[0.0023, 0.0033],[-0.0005, 0.0005],[0.0012, 0.0022]]

5 Comments

you really shouldn't have bare except clauses!
the issue I had was that at the end of the list, it increments an extra time which makes the code under the try clause search for an index outside of the list. So, I had to use a try clause, which itself needs the except clause to not give an error. I imagine there's a better way to do this, maybe another clause that pairs with try?
No, my point is you should pretty much never have a bare except clause, you should try to catch the most specific exception possible and your try-except should wrap the minimum amount of code possible. Here you could use except IndexError: instead of just except:
"however, this works without excess libraries." If you are referring to the typing and dataclasses libraries that my answer imports, they are Python's standard libraries, and can be trivially replaced by writing __init__ methods that initialize those instance variables yourself.
thank you both for your responses, I've fixed the except clause. I'm quite new to coding so I'm not aware of what libraries are includes, and many other things like the except clause. I appreciate your help and the clarity of your answers.
0

Here was my guess about the problem. I don't know if two ranges overlap, you want to remove them both or just the second one. My solution removes the second one.

L = [[0, 10], [5, 15], [11, 18], [10, 25], [20, 25], [30, 35]]
L.sort(key= lambda x: x[0])

sets = []
delete_idx = set()

for sub_list in L:
    sets.append(set(range(sub_list[0], sub_list[1]+1)))

for i, set_1 in enumerate(sets[:-1]):
    if i in delete_idx:
        continue
    for j, set_2 in enumerate(sets[i+1:], start = i+1):
        if j in delete_idx:
            continue
        if len(set_1 & set_2) > 0:
            delete_idx.add(j)

# remove from the greatest idx to smallest idx
for i in sorted(delete_idx, reverse=True):
    L.remove(L[i])

print(L)

It printed:

[[0, 10], [11, 18], [20, 25], [30, 35]]

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.