优先级队列：并行处理

我想解决有关Coursera上优先级队列的这个算法问题，但他们网站上的分级人员总是说我的程序因时间限制超出错误而失败。事实是，当我在我的PC上运行大量输入（5000个线程，100000个作业）时，它可以顺利运行并在不超过1秒的时间内打印出正确的结果。优先级队列：并行处理

这是问题描述：

enter image description here

这是链接到我的代码在Github上：https://gist.github.com/giantonia/3ddbacddc7bd58b220ab592f802d9602

任何帮助表示赞赏！

来源

2016-12-16 Giantonia

你问全局代码审查的东西，并没有回答一个给定的问题陈述，和代码示例你”重新给出一个完整的片段。这超出了stackoverflow的范围，因为它使答案难以写出，而且您的问题太广泛了。建议遵循给定的约束条件，并将ulimit设置为512MB。 – zmo

首先，我会建议在本地运行最大测试的解决方案（即n = 100000和m = 100000）（是的，5000和100000是大测试，但是你会在那里停止吗？为什么不呢？你使用最大可能的测试用例？）。

其次，存在至少两个缺陷在您的解决方案：

它由一个增加的，而不是跳到下一个事件的时间：

while len(jobs) > 0: 
    if threads[0][1] <= time: 
     record.append([threads[0][0], time]) 
     ... 
    else: 
     time += 1

它需要O(MAX_T)操作。如果最大时间是10^9，那太多了。

jobs.pop(0)可能在O(n)工作（这取决于Python实现，但如果它像C++向量，这是许多口译的情况下），这将产生在最坏的情况下O(n^2)操作。这也太多了。

解决方案中可能还有其他缓慢的部分（我立即看到了这两个部分，所以我只写了这些部分）。

我建议你重新设计算法，证明它的速度足够快（提示：它应该是类似O((n + m) log n)的东西），并且只有在执行它之后。

来源

2016-12-16 16:57:44 kraskevich

代码最弱的点以下时，

while len(jobs) > 0: 
    if threads[0][1] <= time: 
     ... 
    else: 
     time += 1

这个循环将随时间一起执行，而不是就业人数都要做。它需要O（MAX_T）成本！太慢了！

这是我对这个问题的解决方案。它需要O（N + MlgN））。

这个想法很简单。

构造priority_queue以及最早的完成时间。
从priority_queue中选择next_thread并更新其完成作业的时间。
将其插入优先级队列

下面是代码，

# python3 

def parent_key_cal(key): 
    if key % 2 == 0: 
     parent_key = key//2 
    else: 
     parent_key = (key - 1)//2 
    return parent_key 

def swap(alist, key1, key2): 
    temp = alist[key1] 
    alist[key1] = alist[key2] 
    alist[key2] = temp 

def return_min_key(alist, parent, left, right, criteria): 

    min_value = parent 
    if alist[parent][criteria] > alist[left][criteria]: 
     min_value = left 
     if right != -1 and alist[min_value][criteria] > alist[right][criteria]: 
      min_value = right 
    elif alist[parent][criteria] < alist[left][criteria]: 
     if right != -1 and alist[min_value][criteria] > alist[right][criteria]: 
      min_value = right 

    return min_value 

def shift_up(alist, key): 

    while key > 1: 

     parent = parent_key_cal(key) 
     if alist[parent][1] != alist[key][1]: 
      if alist[parent][1] > alist[key][1]: 
       swap(alist, parent, key) 
       key = parent 
      else: 
       break 
     else: 
      if alist[parent][0] > alist[key][0]: 
       swap(alist, parent, key) 
       key = parent 
      else: 
       break 

def shift_down(alist, key): 

    if 2*key >= len(alist): 
     return 

    parent = key 
    left = 2*key 
    right = 2*key + 1 

    if right >= len(alist): 

     if (alist[parent] == alist[left]) == True: 
      min_value = return_min_key(alist, parent, left, -1, 0) 
     else: 
      min_value = return_min_key(alist, parent, left, -1, 1) 

    else: 

     if (alist[parent] == alist[left] == alist[right]) == True: 
      min_value = return_min_key(alist, parent, left, right, 0) 
     else: 
      min_value = return_min_key(alist, parent, left, right, 1) 

    if min_value != parent: 
     swap(alist, parent, min_value) 
     shift_down(alist, min_value)  


def min_heap(alist): 
    # Index 0 element is dummy. minimum element's index is 1 
    min = alist[1] 
    alist.pop(1) 

    # Maintain heap structure 
    parent_last_element = parent_key_cal(len(alist)-1) 
    for key in reversed(range(1, parent_last_element + 1)): 
     shift_down(alist, key) 

    return min 

def heap_insert(alist, value): 
    alist.append(value) 
    shift_up(alist, len(alist)-1) 

line1 = input().split() 
n = int(line1[0]) 
m = int(line1[1]) 
jobs = list(map(int, input().split())) 
threads = [] 
for i in range(n): 
    threads.append([i, 0]) 

# Insert dummy element to make heap calculation easier 
threads.insert(0,[-1,-1]) 

record = [] 
# O(M) 
while len(jobs) > 0: 
    # Allocate a job to a thread and record it this moment 
    # "threads" is min_heap along with time to finish a allocated job. 0 -> thread order, 1 -> time to finish the job 
    next_thread = min_heap(threads) # O(lgN) 
    record.append([next_thread[0], next_thread[1]]) 

    # Updated poped thread as much as time to finish the next job 
    next_thread[1] += jobs.pop(0) 

    # Insert this into min_heap 
    heap_insert(threads, next_thread) 

for i in range(len(record)): 
    print(str(record[i][0]) + ' ' + str(record[i][1]))

来源

2016-12-16 18:18:45 Daniel

优先级队列：并行处理

回答

相关问题