gpt4 book ai didi

Python - 删除重叠列表

转载 作者:太空狗 更新时间:2023-10-29 17:29:43 24 4
gpt4 key购买 nike

假设我有一个包含索引 [[start, end], [start1, end1], [start2, end2]] 的列表列表。

例如:

[[0, 133], [78, 100], [25, 30]]

我如何检查列表之间的重叠并每次删除长度较长的列表? 所以:

>>> list = [[0, 133], [78, 100], [25, 30]]
>>> foo(list)
[[78, 100], [25, 30]]

这是我到目前为止尝试做的:

def cleanup_list(list):
i = 0
c = 0
x = list[:]
end = len(x)
while i < end-1:
for n in range(x[i][0], x[i][1]):
if n in range(x[i+1][0], x[i+1][1]):
list.remove(max(x[i], x[i+1]))
i +=1
return list

但除了有点复杂之外,它还不能正常工作:

 >>>cleanup_list([[0,100],[9,10],[12,90]])
[[0, 100], [12, 90]]

如有任何帮助,我们将不胜感激!

编辑:

其他例子是:

>>>a = [[0, 100], [4, 20], [30, 35], [30, 78]]
>>>foo(a)
[[4, 20], [30, 35]]

>>>b = [[30, 70], [25, 40]]
>>>foo(b)
[[25, 40]]

我基本上是在尝试删除所有与另一个列表重叠的最长列表。在这种情况下,我不必担心列表的长度相等。

谢谢!!

最佳答案

要从列表中删除最小数量的间隔,使得剩下的间隔不重叠,O(n*log n) 算法存在:

def maximize_nonoverlapping_count(intervals):
# sort by the end-point
L = sorted(intervals, key=lambda (start, end): (end, (end - start)),
reverse=True) # O(n*logn)
iv = build_interval_tree(intervals) # O(n*log n)
result = []
while L: # until there are intervals left to consider
# pop the interval with the smallest end-point, keep it in the result
result.append(L.pop()) # O(1)
# remove intervals that overlap with the popped interval
overlapping_intervals = iv.pop(result[-1]) # O(log n + m)
remove(overlapping_intervals, from_=L)
return result

它应该产生以下结果:

f = maximize_nonoverlapping_count
assert f([[0, 133], [78, 100], [25, 30]]) == [[25, 30], [78, 100]]
assert f([[0,100],[9,10],[12,90]]) == [[9,10], [12, 90]]
assert f([[0, 100], [4, 20], [30, 35], [30, 78]]) == [[4, 20], [30, 35]]
assert f([[30, 70], [25, 40]]) == [[25, 40]]

它要求数据结构能够在O(log n + m) 时间内找到与给定区间重叠的所有区间,例如IntervalTree .有一些实现可以从 Python 使用,例如 quicksect.py , 请参阅 Fast interval intersection methodologies示例用法。


下面是上述算法的基于 quicksectO(n**2) 实现:

from quicksect import IntervalNode

class Interval(object):
def __init__(self, start, end):
self.start = start
self.end = end
self.removed = False

def maximize_nonoverlapping_count(intervals):
intervals = [Interval(start, end) for start, end in intervals]
# sort by the end-point
intervals.sort(key=lambda x: (x.end, (x.end - x.start))) # O(n*log n)
tree = build_interval_tree(intervals) # O(n*log n)
result = []
for smallest in intervals: # O(n) (without the loop body)
# pop the interval with the smallest end-point, keep it in the result
if smallest.removed:
continue # skip removed nodes
smallest.removed = True
result.append([smallest.start, smallest.end]) # O(1)

# remove (mark) intervals that overlap with the popped interval
tree.intersect(smallest.start, smallest.end, # O(log n + m)
lambda x: setattr(x.other, 'removed', True))
return result

def build_interval_tree(intervals):
root = IntervalNode(intervals[0].start, intervals[0].end,
other=intervals[0])
return reduce(lambda tree, x: tree.insert(x.start, x.end, other=x),
intervals[1:], root)

注意:对于此实现,最坏情况下的时间复杂度为 O(n**2),因为间隔仅标记为已删除 例如,想象这样的输入 intervals len(result) == len(intervals)/3 并且 len(intervals)/2 间隔跨越整个范围然后 tree.intersect () 将被调用 n/3 次并且每次调用将执行 x.other.removed = True 至少 n/2 次,即总共 n*n/6 次操作:

n = 6
intervals = [[0, 100], [0, 100], [0, 100], [0, 10], [10, 20], [15, 40]])
result = [[0, 10], [10, 20]]

这是一个 banyan基于O(n log n)的实现:

from banyan import SortedSet, OverlappingIntervalsUpdator # pip install banyan

def maximize_nonoverlapping_count(intervals):
# sort by the end-point O(n log n)
sorted_intervals = SortedSet(intervals,
key=lambda (start, end): (end, (end - start)))
# build "interval" tree O(n log n)
tree = SortedSet(intervals, updator=OverlappingIntervalsUpdator)
result = []
while sorted_intervals: # until there are intervals left to consider
# pop the interval with the smallest end-point, keep it in the result
result.append(sorted_intervals.pop()) # O(log n)

# remove intervals that overlap with the popped interval
overlapping_intervals = tree.overlap(result[-1]) # O(m log n)
tree -= overlapping_intervals # O(m log n)
sorted_intervals -= overlapping_intervals # O(m log n)
return result

注意:此实现将 [0, 10][10, 20] 间隔视为重叠:

f = maximize_nonoverlapping_count
assert f([[0, 100], [0, 10], [11, 20], [15, 40]]) == [[0, 10] ,[11, 20]]
assert f([[0, 100], [0, 10], [10, 20], [15, 40]]) == [[0, 10] ,[15, 40]]

sorted_intervalstree 可以合并:

from banyan import SortedSet, OverlappingIntervalsUpdator # pip install banyan

def maximize_nonoverlapping_count(intervals):
# build "interval" tree sorted by the end-point O(n log n)
tree = SortedSet(intervals, key=lambda (start, end): (end, (end - start)),
updator=OverlappingIntervalsUpdator)
result = []
while tree: # until there are intervals left to consider
# pop the interval with the smallest end-point, keep it in the result
result.append(tree.pop()) # O(log n)

# remove intervals that overlap with the popped interval
overlapping_intervals = tree.overlap(result[-1]) # O(m log n)
tree -= overlapping_intervals # O(m log n)
return result

关于Python - 删除重叠列表,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/16312871/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com