2016-12-24 59 views
2

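Given a list of (x, y) points, I am trying to find the nearby points for each point. How can I index the list of points to make the search for nearby points faster?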

from collections import defaultdict 
from math import sqrt 
from random import randint 

# Generate a list of random (x, y) points 
points = [(randint(0, 100), randint(0, 100)) for _ in range(1000)] 

def is_nearby(point_a, point_b, max_distance=5): 
    """Two points are nearby if their Euclidean distance is less than max_distance""" 
    distance = sqrt((point_b[0] - point_a[0])**2 + (point_b[1] - point_a[1])**2) 
    return distance < max_distance 

# For each point, find nearby points that are within a radius of 5 
nearby_points = defaultdict(list) 
for point in points:
    for neighbour in points:
        if point != neighbour:
            if is_nearby(point, neighbour):
                nearby_points[point].append(neighbour)

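Is there any way I can index points to make the above search faster? I feel there must be some way that is faster than O(len(points)**2).

Edit: in general the points could be floats, not just ints.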

+0

If your grid is only 100 * 100, you could arrange your points in a grid. That way you can greatly reduce the search space. –

+0

http://gis.stackexchange.com/questions/22082/how-can-i-use-r-tree-to-find-points-within-a-distance-in-spatialite –

Answers

1

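This is a version with a fixed grid where each grid point holds the number of samples located there.

The search can then be narrowed to the space around the point in question.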

from random import randint 
import math 

N = 100 
N_SAMPLES = 1000 

# create the grid 
grd = [[0 for _ in range(N)] for __ in range(N)] 

# set the number of points at a given gridpoint 
for _ in range(N_SAMPLES): 
    grd[randint(0, 99)][randint(0, 99)] += 1 

def find_neighbours(grid, point, distance):

    # this will be: (x, y): number of points there
    points = {}

    # +1 so that cells at exactly `distance` along an axis are included
    for x in range(point[0]-distance, point[0]+distance+1):
        if x < 0 or x > N-1:
            continue
        for y in range(point[1]-distance, point[1]+distance+1):
            if y < 0 or y > N-1:
                continue
            dst = math.hypot(point[0]-x, point[1]-y)
            if dst > distance:
                continue
            # use the grid that was passed in, not the global
            if grid[x][y] > 0:
                points[(x, y)] = grid[x][y]
    return points

print(find_neighbours(grid=grd, point=(45, 36), distance=5))
# -> {(44, 37): 1, (45, 33): 1, ...}
# meaning: there is one neighbour at (44, 37) etc...

Further optimization: the tests for x and y could be precomputed for a given grid size, so that math.hypot(point[0]-x, point[1]-y) would not have to be evaluated again for every point.
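The precomputation mentioned above could look roughly like the sketch below. The names circle_offsets and OFFSETS are hypothetical and not part of the original answer; the grid is assumed to be the same square list of counts used above.

```python
import math
from random import randint

N = 100          # grid size, as in the answer above
N_SAMPLES = 1000


def circle_offsets(distance):
    """Return all integer (dx, dy) offsets with hypot(dx, dy) <= distance."""
    return [
        (dx, dy)
        for dx in range(-distance, distance + 1)
        for dy in range(-distance, distance + 1)
        if math.hypot(dx, dy) <= distance
    ]


def find_neighbours(grid, point, offsets):
    """Visit only the precomputed offsets around `point` (no hypot per query)."""
    size = len(grid)  # assumes a square grid
    found = {}
    for dx, dy in offsets:
        x, y = point[0] + dx, point[1] + dy
        if 0 <= x < size and 0 <= y < size and grid[x][y] > 0:
            found[(x, y)] = grid[x][y]
    return found


# build the same kind of count grid as above
grd = [[0] * N for _ in range(N)]
for _ in range(N_SAMPLES):
    grd[randint(0, N - 1)][randint(0, N - 1)] += 1

OFFSETS = circle_offsets(5)   # computed once, reused for every query
print(find_neighbours(grd, (45, 36), OFFSETS))
```

The offsets depend only on the search radius, so they can be shared by all queries that use the same distance.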

It may also be a good idea to replace the grid with a numpy array.


UPDATE

If your points are floats, you can still create an int grid to reduce the search space:

from random import uniform 
from collections import defaultdict 
import math 

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @property
    def x_int(self):
        return int(self.x)

    @property
    def y_int(self):
        return int(self.y)

    def __str__(self):
        fmt = '''{0.__class__.__name__}(x={0.x:5.2f}, y={0.y:5.2f})'''
        return fmt.format(self)

N = 100 
MIN = 0 
MAX = N-1 

N_SAMPLES = 1000 


# create the grid 
grd = [[[] for _ in range(N)] for __ in range(N)] 

# store each random point in the grid cell given by its integer coordinates
for _ in range(N_SAMPLES): 
    p = Point(x=uniform(MIN, MAX), y=uniform(MIN, MAX)) 
    grd[p.x_int][p.y_int].append(p) 


def find_neighbours(grid, point, distance):

    # this will be: (x_int, y_int): list of points
    points = defaultdict(list)

    # need to cast a slightly bigger net on the upper end of the range;
    # int() rounds down
    for x in range(point[0]-distance, point[0]+distance+1):
        if x < 0 or x > N-1:
            continue
        for y in range(point[1]-distance, point[1]+distance+1):
            if y < 0 or y > N-1:
                continue
            dst = math.hypot(point[0]-x, point[1]-y)
            # a point can lie up to sqrt(2) away from its cell corner (x, y),
            # so +1 is not a safe margin in general
            if dst > distance + math.sqrt(2):
                continue
            for pt in grid[x][y]:
                # measure from the query point, not from the cell corner
                if math.hypot(pt.x-point[0], pt.y-point[1]) <= distance:
                    points[(x, y)].append(pt)
    return points

res = find_neighbours(grid=grd, point=(45, 36), distance=5)

for int_point, points in res.items():
    print(int_point)
    for point in points:
        print(' ', point)

The output looks something like this:

(44, 36) 
    Point(x=44.03, y=36.93) 
(41, 36) 
    Point(x=41.91, y=36.55) 
    Point(x=41.73, y=36.53) 
    Point(x=41.56, y=36.88) 
... 

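For convenience, Point is now a class, though that may not be necessary. Depending on how dense or sparse your points are, you could also represent the grid as a dictionary that maps each cell to a list of Points.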

Also, in this version the find_neighbours function only accepts a starting point made up of ints. That could be refined as well.
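The two remarks above (a dictionary instead of a nested list, and a float starting point) might be combined along the following lines. This is only a sketch: CELL, cell_of and build_index are hypothetical names, points are plain (x, y) tuples rather than Point objects, and the cell size of 1.0 is an assumption that would normally be tuned to the search radius.

```python
import math
from collections import defaultdict
from random import uniform

CELL = 1.0  # cell edge length; an assumption, tune it to the search radius


def cell_of(x, y, cell=CELL):
    """Map float coordinates to integer cell indices (floor also handles negatives)."""
    return (math.floor(x / cell), math.floor(y / cell))


def build_index(points, cell=CELL):
    """Group (x, y) tuples by the grid cell they fall into."""
    index = defaultdict(list)
    for p in points:
        index[cell_of(p[0], p[1], cell)].append(p)
    return index


def find_neighbours(index, query, distance, cell=CELL):
    """Return all indexed points within `distance` of the float point `query`."""
    cx_min, cy_min = cell_of(query[0] - distance, query[1] - distance, cell)
    cx_max, cy_max = cell_of(query[0] + distance, query[1] + distance, cell)
    found = []
    for cx in range(cx_min, cx_max + 1):
        for cy in range(cy_min, cy_max + 1):
            # .get avoids creating empty cells in the defaultdict
            for p in index.get((cx, cy), ()):
                if math.hypot(p[0] - query[0], p[1] - query[1]) <= distance:
                    found.append(p)
    return found


points = [(uniform(0, 100), uniform(0, 100)) for _ in range(1000)]
index = build_index(points)
print(find_neighbours(index, (45.3, 36.8), 5))
```

Only occupied cells take up memory, which suits sparse data. If the query point is itself in the index, it is returned as its own neighbour and may need to be filtered out.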

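There is still a lot of room for improvement: the range of the y axis can be restricted using trigonometry. And for points well inside the circle there is no need for an individual check; the detailed check only needs to be done close to the outer rim of the circle.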
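As one possible reading of the first suggestion, the y range for each x column can be derived from the circle equation dx**2 + dy**2 <= distance**2, so no distance test is needed inside the loop. The sketch below applies this to the integer count grid from the first version; it assumes a square grid, an integer point and distance, and Python 3.8+ for math.isqrt.

```python
import math


def find_neighbours(grid, point, distance):
    """Count-grid lookup that only visits cells inside the circle.

    grid[x][y] holds the number of samples at integer position (x, y);
    point and distance are integers.
    """
    size = len(grid)  # assumes a square grid
    px, py = point
    found = {}
    for x in range(max(0, px - distance), min(size - 1, px + distance) + 1):
        dx = x - px
        # largest integer dy with dx**2 + dy**2 <= distance**2
        dy_max = math.isqrt(distance * distance - dx * dx)
        for y in range(max(0, py - dy_max), min(size - 1, py + dy_max) + 1):
            if grid[x][y] > 0:
                found[(x, y)] = grid[x][y]
    return found


# small demo: every cell of a 20 x 20 grid holds one sample
demo_grid = [[1] * 20 for _ in range(20)]
print(len(find_neighbours(demo_grid, (10, 10), 5)))
```

For the float version the same bound gives the range of candidate cells, but points in cells that straddle the rim of the circle still need an individual distance check.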

+0

Thanks - what if the points are floats rather than ints? This approach only works if we convert the floats to ints – mchen

+0

I think it is feasible to adapt the approach above: search between bisect_left((point[0] +/- distance, point[1] +/- distance), points) rather than on a fixed grid, – mchen
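A minimal sketch of that bisect idea, assuming the points are (x, y) tuples kept in a sorted list. Note that the standard library signature is bisect_left(list, value), and that sorting by tuple only narrows the search along x; the y coordinate is still filtered linearly within the resulting slice.

```python
import math
from bisect import bisect_left, bisect_right
from random import uniform


def find_neighbours(sorted_points, query, distance):
    """Return points within `distance` of `query`; `sorted_points` must be sorted."""
    qx, qy = query
    # tuples compare lexicographically, so these bounds bracket the x range
    lo = bisect_left(sorted_points, (qx - distance, -math.inf))
    hi = bisect_right(sorted_points, (qx + distance, math.inf))
    return [
        p for p in sorted_points[lo:hi]
        if abs(p[1] - qy) <= distance                          # cheap y filter first
        and math.hypot(p[0] - qx, p[1] - qy) <= distance       # exact check
    ]


points = sorted((uniform(0, 100), uniform(0, 100)) for _ in range(1000))
print(find_neighbours(points, (45.3, 36.8), 5))
```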