2011-04-15 109 views
29

我有一个元组列表,其中每个元组都是(start-time, end-time)。我正在尝试合并所有重叠的时间范围并返回不同时间范围的列表。 例如合并具有重叠时间范围的时间范围元组列表

[(1, 5), (2, 4), (3, 6)] ---> [(1,6)] 
[(1, 3), (2, 4), (5, 8)] ---> [(1, 4), (5,8)] 

这是我如何实现它。

# Algorithm 
# initialranges: [(a,b), (c,d), (e,f), ...] 
# First we sort each tuple then whole list. 
# This will ensure that a<b, c<d, e<f ... and a < c < e ... 
# BUT the order of b, d, f ... is still random 
# Now we have only 3 possibilities 
#================================================ 
# b<c<d: a-------b   Ans: [(a,b),(c,d)] 
#     c---d 
# c<=b<d: a-------b   Ans: [(a,d)] 
#    c---d 
# c<d<b: a-------b   Ans: [(a,b)] 
#   c---d 
#================================================ 
def mergeoverlapping(initialranges): 
    i = sorted(set([tuple(sorted(x)) for x in initialranges])) 

    # initialize final ranges to [(a,b)] 
    f = [i[0]] 
    for c, d in i[1:]: 
     a, b = f[-1] 
     if c<=b<d: 
      f[-1] = a, d 
     elif b<c<d: 
      f.append((c,d)) 
     else: 
      # else case included for clarity. Since 
      # we already sorted the tuples and the list 
      # only remaining possibility is c<d<b 
      # in which case we can silently pass 
      pass 
    return f 

我想如果

  1. 弄清楚的是一个在某些Python模块的内置功能,可以更有效地做到这一点?或
  2. 是否有一种更为pythonic的方式来实现相同的目标?

您的帮助表示感谢。谢谢!然后

回答

13

一些使其更有效的方法,Pythonic:

  1. 消除t因为该算法应该在主循环期间删除重复。
  2. 如果您只需要遍历结果,请使用yield来生成值。
  3. 减少中间对象的构建,例如:将tuple()调用移动到生成最终值的位置,使您不必构造并丢弃额外的元组,并重新使用列表saved来存储当前时间范围比较。

代码:

def merge(times): 
    saved = list(times[0]) 
    for st, en in sorted([sorted(t) for t in times]): 
     if st <= saved[1]: 
      saved[1] = max(saved[1], en) 
     else: 
      yield tuple(saved) 
      saved[0] = st 
      saved[1] = en 
    yield tuple(saved) 

data = [ 
    [(1, 5), (2, 4), (3, 6)], 
    [(1, 3), (2, 4), (5, 8)] 
    ] 

for times in data: 
    print list(merge(times)) 
+0

谢谢!同意我应该消除`set()`。循环会照顾它。就像根据需要产生元组而不是追加到列表一样。 – 2011-04-15 19:22:31

+2

不幸的是,如果`len(times)== 0`,这会失败。 – phihag 2012-05-28 21:19:38

2

排序元组列表中,如果t1.right> = t2.left =>合并 ,并与新的列表重新启动,...

- >

def f(l, sort = True): 
    if sort: 
     sl = sorted(tuple(sorted(i)) for i in l) 
    else: 
     sl = l 
    if len(sl) > 1: 
     if sl[0][1] >= sl[1][0]: 
      sl[0] = (sl[0][0], sl[1][1]) 
      del sl[1] 
      if len(sl) < len(l): 
       return f(sl, False) 
    return sl 
1

的分类部分:使用标准的排序,比较的元组已经以正确的方式。

sorted_tuples = sorted(initial_ranges) 

合并部分。它也消除了重复的范围,所以不需要set。假设你有current_tuplenext_tuple

c_start, c_end = current_tuple 
n_start, n_end = next_tuple 
if n_start <= c_end: 
    merged_tuple = min(c_start, n_start), max(c_end, n_end) 

我希望逻辑足够清楚。

要查看下一个元组,可以使用索引访问sorted tuples;无论如何,这是一个完全已知的序列。

1

对所有边界进行排序,然后取所有对,其中边界结束后是边界开始。

def mergeOverlapping(initialranges): 
    def allBoundaries(): 
     for r in initialranges: 
      yield r[0], True 
      yield r[1], False 

    def getBoundaries(boundaries): 
     yield boundaries[0][0] 
     for i in range(1, len(boundaries) - 1): 
      if not boundaries[i][1] and boundaries[i + 1][1]: 
       yield boundaries[i][0] 
       yield boundaries[i + 1][0] 
     yield boundaries[-1][0] 

    return getBoundaries(sorted(allBoundaries())) 

嗯,这不是很美,但很有趣,至少写!

编辑:多年后,upvote后,我意识到我的代码是错的!这是新版本只是为了好玩:

def mergeOverlapping(initialRanges): 
    def allBoundaries(): 
     for r in initialRanges: 
      yield r[0], -1 
      yield r[1], 1 

    def getBoundaries(boundaries): 
     openrange = 0 
     for value, boundary in boundaries: 
      if not openrange: 
       yield value 
      openrange += boundary 
      if not openrange: 
       yield value 

    def outputAsRanges(b): 
     while b: 
      yield (b.next(), b.next()) 

    return outputAsRanges(getBoundaries(sorted(allBoundaries()))) 

基本上我标志着边界为-1或1,然后在打开和关闭括号之间的平衡是由零值,只能输出它们进行排序。

0

已经很晚了,但可能会帮助有人找这个。我有类似的问题,但与字典。给定一个时间范围列表,我希望找到重叠并尽可能合并它们。稍加修改,以@samplebias的回答使我这个:

合并功能:

def merge_range(ranges: list, start_key: str, end_key: str): 
    ranges = sorted(ranges, key=lambda x: x[start_key]) 
    saved = dict(ranges[0]) 

    for range_set in sorted(ranges, key=lambda x: x[start_key]): 
     if range_set[start_key] <= saved[end_key]: 
      saved[end_key] = max(saved[end_key], range_set[end_key]) 
     else: 
      yield dict(saved) 
      saved[start_key] = range_set[start_key] 
      saved[end_key] = range_set[end_key] 
    yield dict(saved) 

数据:

data = [ 
    {'start_time': '09:00:00', 'end_time': '11:30:00'}, 
    {'start_time': '15:00:00', 'end_time': '15:30:00'}, 
    {'start_time': '11:00:00', 'end_time': '14:30:00'}, 
    {'start_time': '09:30:00', 'end_time': '14:00:00'} 
] 

执行:

print(list(merge_range(ranges=data, start_key='start_time', end_key='end_time'))) 

输出:

[ 
    {'start_time': '09:00:00', 'end_time': '14:30:00'}, 
    {'start_time': '15:00:00', 'end_time': '15:30:00'} 
]