2016-11-18 57 views
0

我想实现一个StreamingMedian对象通过get_median()连续调用,以维持平均实现MinHeap时坚持的状态。为此,我还通过heapq模块实施了MinHeapMaxHeap类。与heapq

虽然我遇到了一个非常奇怪的错误。出于某种原因,当我运行下面的命令:

print("Before streaming medians", MinHeap(), sep="\t") # is empty 

b = StreamingMedian() 
b.add_val(5) 
b.add_val(100) 

assert b.get_median() == 52.5 

print("After streaming medians, for MaxHeap", MaxHeap(), sep='\t') # is empty 
print("After streaming medians, for MinHeap", MinHeap(), sep='\t') # should be empty 
print("After streaming medians, for MinHeap with input", 
     MinHeap([]), sep="\t") # is empty 

我得到以下的输出:

Before streaming medians  [] 
After streaming medians, for MaxHeap [] 
After streaming medians, for MinHeap [100] 
After streaming medians, for MinHeap with input [] 

类的实现可以在下面找到。我在Python 3.5.2 :: Anaconda自定义(64位)上运行它。

import heapq 

class MinHeap(object): 

    def __init__(self, l=[]): 
     self.heap = l 
     heapq.heapify(l) 

    def peek(self): 
     return self.heap[0] 

    def pop(self): 
     return heapq.heappop(self.heap) 

    def push(self, x): 
     heapq.heappush(self.heap, x) 

    def pushpop(self, x): 
     return heapq.heappushpop(self.heap, x) 

    def replace(self, x): 
     return heapq.heapreplace(self.heap, x) 

    def __len__(self): 
     return len(self.heap) 

    def __str__(self): 
     return str(self.heap) 

class MaxHeap(MinHeap): 

    def _invert_sign(self, l): 
     return [-1 * a for a in l] 

    def __init__(self, l=[]): 
     super().__init__(self._invert_sign(l)) 

    def push(self, x): 
     super().push(-1 * x) 

    def pushpop(self, x): 
     return super().pushpop(-1 * x) 
    def replace(self, x): 
     return super().replace(-1 * x) 

    def pop(self): 
     return -1 * super().pop() 

    def peek(self): 
     return -1 * super().peek() 

    def __str__(self): 
     return str(self._invert_sign(self.heap)) 


class StreamingMedian(): 

    def __init__(self): 
     self.min_heap = MinHeap() 
     self.max_heap = MaxHeap() 

    def get_median(self): 
     min_heap_has_x_more = len(self.min_heap) - len(self.max_heap) 
     if min_heap_has_x_more > 0: 
      return self.min_heap.peek() 
     elif min_heap_has_x_more < 0: 
      return self.max_heap.peek() 
     else: 
      return (self.min_heap.peek() + self.max_heap.peek())/2 

    def add_val(self, x): 
     if len(self.min_heap) + len(self.max_heap) == 0: 
      self.max_heap.push(x) 
     else: 
      med = self.get_median() 
      if x > med: 
       self.min_heap.push(x) 
       self._ensure_balance() 
      elif x < med: 
       self.max_heap.push(x) 
       self._ensure_balance() 
      else: 
       self.max_heap.push(x) 
       self._ensure_balance() 

    def _ensure_balance(self): 
     size_diff = len(self.min_heap) - len(self.max_heap) 
     if abs(size_diff) > 1: 
      if size_diff > 0: # min_heap has 2 more elements 
       self.max_heap.push(self.min_heap.pop()) 
      else: # max_heap has 2 more elements 
       self.min_heap.push(self.max_heap.pop()) 
      assert abs(len(self.min_heap) - len(self.max_heap)) < 2 

print("Before streaming medians", MinHeap(), sep="\t") 

b = StreamingMedian() 
b.add_val(5) 
b.add_val(100) 

assert b.get_median() == 52.5 

print("After streaming medians, for MaxHeap", MaxHeap(), sep='\t') # is empty 
print("After streaming medians, for MinHeap", MinHeap(), sep='\t') # should be empty 
print("After streaming medians, for MinHeap with input", MinHeap([]), sep="\t") # is empty 

回答

0

的问题

因为默认参数是如何在Python计算的问题发生。它们是在函数第一次被调用时计算出来的,然后函数与原来的计算值一起使用。如果默认值是可变的(如列表所示),那么这会造成问题。

所以发生了什么事了MinHeap是:

  1. MinHeap最初创建和l参数默认参数被分配到一个内存地址。
  2. StreamingMedian修改MinHeapself.heap这与l指向的内容相同。
  3. 当再次调用MinHeap时,l已经有内存地址,并且该内存地址再次使用并与self.heap绑定。

这不会发生的,因为MaxHeap

  1. MaxHeap最初创建和l参数默认参数被分配到一个内存地址。
  2. _invert_sign创建一个新的列表分配给self.heap。默认参数l从不修改。
  3. MaxHeap重新初始化,l已经有一个内存地址,并再次使用,但它从未被修改,构建另一个副本,所以它永远不会被修改。

解决方案

相反的:

class MinHeap(object): 

def __init__(self, l=[]): 
    self.heap = l 
    heapq.heapify(l) 

我们应该使用:

class MinHeap(object): 

def __init__(self, l=None): 
    if l is None: 
     l = [] 
    self.heap = l 
    heapq.heapify(l) 

同样的变更应MaxHeap

进行