2015-07-21 46 views
5

我需要合并两个迭代器。我写了这个功能:合并两个已排序迭代器而不替换

def merge_no_repeat(iter1, iter2, key=None): 
    """ 
    a = iter([(2, 'a'), (4, 'a'), (6, 'a')]) 
    b = iter([(1, 'b'), (2, 'b'), (3, 'b'), (4, 'b'), (5, 'b'), (6, 'b'), (7, 'b'), (8, 'b')]) 
    key = lambda item: item[0] 
    fusion_no_repeat(a, b, key) -> 
       iter([(1, 'b'), (2, 'a'), (3, 'b'), (4, 'a'), (5, 'b'), (6, 'a'), (7, 'b'), (8, 'b')]) 
    :param iter1: sorted iterator 
    :param iter2: sorted iterator 
    :param key: lambda get sorted key, default: lambda x: x 
    :return: merged iterator 
    """ 
    if key is None: 
     key = lambda x: x 
    element1 = next(iter1, None) 
    element2 = next(iter2, None) 
    while element1 is not None or element2 is not None: 
     if element1 is None: 
      yield element2 
      element2 = next(iter2, None) 
     elif element2 is None: 
      yield element1 
      element1 = next(iter1, None) 
     elif key(element1) > key(element2): 
      yield element2 
      element2 = next(iter2, None) 
     elif key(element1) == key(element2): 
      yield element1 
      element1 = next(iter1, None) 
      element2 = next(iter2, None) 
     elif key(element1) < key(element2): 
      yield element1 
      element1 = next(iter1, None) 

这个函数有效。但我认为这太复杂了。是否有可能使用Python标准库使这个函数最简单?

+0

['itertools.chain'](https://docs.python.org/2/库/ itertools.html#itertools.chain)? –

+0

这个问题可能更适合http://codereview.stackexchange.com – jonrsharpe

+0

@MathiasEttinger itertools.chain不会返回一个已排序的迭代器。 –

回答

0

您可以使用:

def merge_no_repeat(iter1, iter2, key=None): 
    if key is None: 
     key = lambda x: x 
    ref = next(iter1, None) 
    for elem in iter2: 
     key_elem = key(elem) # caching value so we won't compute it for each value in iter1 that is before this one 
     while ref is not None and key_elem > key(ref): 
      # Catch up with low values from iter1 
      yield ref 
      ref = next(iter1, None) 
     if ref is None or key_elem < key(ref): 
      # Catch up with low values from iter2, eliminate duplicates 
      yield elem 
    # Update: I forgot to consume iter1 in the first version of this code 
    for elem in iter1: 
     # Use remaining items of iter1 if needed 
     yield elem 

我假定迭代器不会返回None值除非完全消耗掉,因为你必须在你的原代码if element1 is None:elif element1 is None:测试。


实例:

>>> from operator import itemgetter 
>>> list(merge_no_repeat(
...  iter([(2, 'a'), (4, 'a'), (6, 'a')]), 
...  iter([(1, 'b')]), 
...  itemgetter(0))) 
[(1, 'b'), (2, 'a'), (4, 'a'), (6, 'a')] 
>>> list(merge_no_repeat(
...  iter([(2, 'a'), (4, 'a'), (6, 'a')]), 
...  iter([(1, 'b'),(7, 'b'), (8, 'b')]), 
...  itemgetter(0))) 
[(1, 'b'), (2, 'a'), (4, 'a'), (6, 'a'), (7, 'b'), (8, 'b')] 
>>> list(merge_no_repeat(
...  iter([(2, 'a'), (4, 'a'), (6, 'a')]), 
...  iter([(1, 'b'),(3, 'b'), (4,'b'),(5,'b'),(7, 'b'), (8, 'b')]), 
...  itemgetter(0))) 
[(1, 'b'), (2, 'a'), (3, 'b'), (4, 'a'), (5, 'b'), (6, 'a'), (7, 'b'), (8, 'b')] 
+0

首先,你需要在最后的'if'中将'None'改为'不是None',但是它仍然返回错误的结果!试试我的例子! – Kasramvd

+0

@Kasramvd在最后如果,'a是None'实际上是一个测试,以检查是否iter1被消耗。 –

+0

哎呀,'a'实际上是'ref'。编辑 –

1

其中一个,如果其中一个迭代器返回None,则失败,您应该捕获StopIteration异常。二,一旦一个迭代器没有更多的值,你可以返回另一个值的其余值。

我觉得这是比较容易让你使用小包装类各地的迭代器,使可见的下一个值:

class NextValueWrapper(object): 
    def __init__(self, iterator): 
     self.iterator = iterator 
     self.next_value = None 
     self.finished = False 
     self.get() 

    def get(self): 
     if self.finished: return # Shouldn't happen, maybe raise an exception 
     value = self.next_value 
     try: 
      self.next_value = next(self.iterator) 
     except StopIteration: 
      self.finished = True 
     return value 

然后代码变为:

def merge(iter1, iter2, key=None): 
    if key is None: 
     key = lambda x: x 

    wrap1 = NextValueWrapper(iter1) 
    wrap2 = NextValueWrapper(iter2) 

    while not (wrap1.finished and wrap2.finished): 
     if (wrap2.finished or 
       (not wrap1.finished and 
       key(wrap1.next_value) <= key(wrap2.next_value))): 
      yield wrap1.get() 
     else: 
      yield wrap2.get() 

这是未经测试。它重复。而且它是Python 2,出于习惯。使它不重复是留给读者的一个练习,我没有注意到这也是一个要求...

+0

'NexValueWrapper.get'无条件地返回'None'。因此,如果两个迭代器都没有完成,你的代码将产生“None”。 –

+0

@MathiasEttinger:是的(忘了返回语句)和另一个bug,for循环产生值忘记产生下一个值。幸运的是我只是洗了澡,这会带来清晰度,会编辑:-) – RemcoGerlich