2015-09-25 582 views
7

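Fast calculation of the Pareto front in Python

I have a set of points in 3D space, from which I need to find the Pareto frontier. Speed of execution is very important here, and the running time grows very quickly as I add points to test.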

The set of points looks like this:

[[0.3296170319979843, 0.0, 0.44472108843537406], [0.3296170319979843, 0.0, 0.44472108843537406], [0.32920760896951373, 0.0, 0.4440408163265306], [0.32920760896951373, 0.0, 0.4440408163265306], [0.33815192743764166, 0.0, 0.44356462585034007]]

Right now, I'm using this algorithm:

def dominates(row, candidateRow):
    return sum([row[x] >= candidateRow[x] for x in range(len(row))]) == len(row)

def simple_cull(inputPoints, dominates):
    paretoPoints = set()
    candidateRowNr = 0
    dominatedPoints = set()
    while True:
        candidateRow = inputPoints[candidateRowNr]
        inputPoints.remove(candidateRow)
        rowNr = 0
        nonDominated = True
        while len(inputPoints) != 0 and rowNr < len(inputPoints):
            row = inputPoints[rowNr]
            if dominates(candidateRow, row):
                # If it is worse on all features remove the row from the array
                inputPoints.remove(row)
                dominatedPoints.add(tuple(row))
            elif dominates(row, candidateRow):
                nonDominated = False
                dominatedPoints.add(tuple(candidateRow))
                rowNr += 1
            else:
                rowNr += 1

        if nonDominated:
            # add the non-dominated point to the Pareto frontier
            paretoPoints.add(tuple(candidateRow))

        if len(inputPoints) == 0:
            break
    return paretoPoints, dominatedPoints

Found here: http://code.activestate.com/recipes/578287-multidimensional-pareto-front/

What is the fastest way to find the set of non-dominated solutions in an ensemble of solutions? Or, in short, can Python do better than this algorithm?

Answers

6

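If you're worried about actual speed, you definitely want to use numpy (clever algorithmic tweaks probably have far less effect than the gains from using array operations). Here are two solutions. The "dumb" solution is slower in most situations, but it overtakes the smart one as the number of costs increases: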

import numpy as np


def is_pareto_efficient_dumb(costs):
    """
    :param costs: An (n_points, n_costs) array (lower cost is better)
    :return: A (n_points,) boolean array, indicating whether each point is Pareto efficient
    """
    is_efficient = np.ones(costs.shape[0], dtype=bool)
    for i, c in enumerate(costs):
        # c is dominated if some point is <= c on every cost and < c on at least one
        dominated_by = np.all(costs <= c, axis=1) & np.any(costs < c, axis=1)
        is_efficient[i] = not np.any(dominated_by)
    return is_efficient


def is_pareto_efficient(costs):
    """
    :param costs: An (n_points, n_costs) array (lower cost is better)
    :return: A (n_points,) boolean array, indicating whether each point is Pareto efficient
    """
    is_efficient = np.ones(costs.shape[0], dtype=bool)
    for i, c in enumerate(costs):
        if is_efficient[i]:
            remaining = costs[is_efficient]
            # Remove points dominated by c: keep those that beat c on some cost or equal c
            is_efficient[is_efficient] = np.any(remaining < c, axis=1) | np.all(remaining == c, axis=1)
    return is_efficient

Profiling tests:

With 10000 samples, 2 costs:

dumb: Elapsed time is 0.9168s 
smart: Elapsed time is 0.004274s 

With 5000 samples, 15 costs:

dumb: Elapsed time is 1.394s 
smart: Elapsed time is 1.982s 
+1

Wow, I missed that, thanks Peter! I'm not sure I understand the costs array; could you give a simple example? Thanks again, this looks awesome. – Rodolphe

+1

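The costs array is just a 2D array, where costs[i, j] is the j-th cost of the i-th point. I think it's the same as your inputPoints array. You can see the [tests here](https://github.com/QUVA-Lab/artemis/blob/master/artemis/general/test_pareto_efficiency.py), which demonstrate its use. – Peter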
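To make the comment above concrete, here is a minimal sketch of building a costs array from the five points in the question. It assumes lower cost is better, so the question's points (where higher appears to be better, judging by its dominates function) are negated first. The function `pareto_mask` is a hypothetical name written for this illustration, using strict dominance; it is not taken from the linked tests.

```python
import numpy as np

def pareto_mask(costs):
    """Boolean mask of Pareto-efficient rows, assuming lower cost is better."""
    costs = np.asarray(costs, dtype=float)
    is_efficient = np.ones(costs.shape[0], dtype=bool)
    for i, c in enumerate(costs):
        if is_efficient[i]:
            survivors = costs[is_efficient]
            # Keep rows that beat c on some cost, or that are identical to c.
            is_efficient[is_efficient] = (
                np.any(survivors < c, axis=1) | np.all(survivors == c, axis=1)
            )
    return is_efficient

# The five points from the question: one row per point, one column per objective.
input_points = np.array([
    [0.3296170319979843, 0.0, 0.44472108843537406],
    [0.3296170319979843, 0.0, 0.44472108843537406],
    [0.32920760896951373, 0.0, 0.4440408163265306],
    [0.32920760896951373, 0.0, 0.4440408163265306],
    [0.33815192743764166, 0.0, 0.44356462585034007],
])

# Higher is better in the question, so negate to turn objectives into costs.
mask = pareto_mask(-input_points)
pareto_points = input_points[mask]
```

Under this reading, the two copies of the first point and the last point survive, while the third and fourth points are dominated by the first.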

5

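I took a shot at rewriting the same algorithm with a couple of tweaks. I think most of your problems come from inputPoints.remove(row). This requires searching through the list of points; removing by index would be much more efficient. I also modified the dominates function to avoid some redundant comparisons. This could be handy in higher dimensions.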

def dominates(row, rowCandidate):
    # all() stops at the first failing comparison
    return all(r >= rc for r, rc in zip(row, rowCandidate))

def cull(pts, dominates):
    dominated = []
    cleared = []
    remaining = pts
    while remaining:
        candidate = remaining[0]
        new_remaining = []
        for other in remaining[1:]:
            # Points the candidate dominates are discarded; the rest stay in play
            if dominates(candidate, other):
                dominated.append(other)
            else:
                new_remaining.append(other)
        if not any(dominates(other, candidate) for other in new_remaining):
            cleared.append(candidate)
        else:
            dominated.append(candidate)
        remaining = new_remaining
    return cleared, dominated
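The remark about inputPoints.remove(row) can be illustrated with plain lists. list.remove(value) compares its argument against the elements from the front and deletes the first equal one, whereas del lst[i] goes straight to position i. The sketch below uses made-up data and illustrative variable names; it only shows the difference in behavior and does not measure speed.

```python
points = [[1, 2], [3, 4], [1, 2], [5, 6]]

# remove() scans from the front for the first element equal to its argument,
# so asking to drop the row stored at index 2 by value drops index 0 instead.
by_value = [list(p) for p in points]
by_value.remove(by_value[2])

# del with an index deletes exactly that position; no equality scan is needed
# (elements after it are still shifted left).
by_index = [list(p) for p in points]
del by_index[2]
```

Since the point set in the question contains duplicate rows, the value-based scan is worth keeping in mind in addition to its cost.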
+0

Thanks, I'm trying it out. Any idea how it would compare with the first answer here? http://stackoverflow.com/questions/21294829/fast-calculations-of-the-pareto-front-in-r – Rodolphe

+1

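I'm not sure. I tried something similar to that solution on my first attempt. For each dimension, I sorted the points by value and collected pairs of indices. Taking the intersection of all those pairs gives all of the domination relations. However, I couldn't get my Python code to run as fast. –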
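One way to read the idea in this comment is as a set of pairwise relations: for each dimension, record which points are at least as good as which others, then intersect those relations across dimensions. The following is a minimal NumPy sketch of that reading, not the commenter's code. It uses broadcasting instead of sorting, needs memory on the order of n² · d, and assumes higher values are better, as in the question. The name `domination_matrix` and the sample points are hypothetical.

```python
import numpy as np

def domination_matrix(points):
    """dom[i, j] is True when point i dominates point j (higher is better)."""
    pts = np.asarray(points, dtype=float)
    # ge[i, j, d]: point i is at least as good as point j in dimension d.
    ge = pts[:, None, :] >= pts[None, :, :]
    gt = pts[:, None, :] > pts[None, :, :]
    # Intersect the per-dimension relations, then require one strict improvement.
    return ge.all(axis=2) & gt.any(axis=2)

points = [[1, 5], [2, 4], [1, 4], [2, 4]]
dom = domination_matrix(points)
# A point is on the frontier when nothing in its column dominates it.
on_frontier = ~dom.any(axis=0)
```

Because every pair is compared at once, this is only practical for point sets small enough that an n × n × d boolean array fits in memory.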

1

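The definition of dominates is incorrect. A dominates B if and only if A is better than or equal to B on all dimensions and strictly better on at least one dimension.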
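That definition translates fairly directly into code. The sketch below assumes higher values are better, matching the question's dominates function; `dominates_strict` is a hypothetical name chosen here.

```python
def dominates_strict(a, b):
    """Return True if a dominates b: no worse everywhere, strictly better somewhere."""
    pairs = list(zip(a, b))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)

# Equal points do not dominate each other under this definition.
same = dominates_strict([1.0, 2.0], [1.0, 2.0])
# Better on one dimension and equal on the other: dominates.
better = dominates_strict([1.0, 3.0], [1.0, 2.0])
# Better on one dimension but worse on the other: a trade-off, no domination.
trade_off = dominates_strict([2.0, 1.0], [1.0, 2.0])
```

The practical difference from an all-`>=` test shows up only for identical points, which the weaker test treats as dominating each other.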