Calculating the distance of one point in a matrix from all other points

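I am new to Python and I need to implement a clustering algorithm. For that, I need to calculate the distances between the given input data points.

Consider the following input data -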

[[1,2,8], 
    [7,4,2], 
    [9,1,7], 
    [0,1,5], 
    [6,4,3]]

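What I want to achieve here is to calculate the distance of [1,2,8] from all the other points and find the point where the distance is minimum.

I then have to repeat this for all the other points.

I was thinking of implementing this with a FOR loop, but I am sure SciPy/NumPy must have a function that can help me achieve this result efficiently.

I looked online, but the 'pdist' command could not get my work done.

Can someone guide me?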

TIA

Source

2017-10-12 Adhish Thite

Using np.linalg.norm combined with broadcasting (NumPy outer subtraction), you can do:

np.linalg.norm(a - a[:,None], axis=-1)

a[:,None] inserts a new axis into a, so a - a[:,None] subtracts every row from every other row thanks to broadcasting. np.linalg.norm then computes np.sqrt(np.sum(np.square(...))) over the last axis:

import numpy as np

a = np.array([[1,2,8], 
    [7,4,2], 
    [9,1,7], 
    [0,1,5], 
    [6,4,3]]) 

np.linalg.norm(a - a[:,None], axis=-1) 
#array([[ 0.  , 8.71779789, 8.1240384 , 3.31662479, 7.34846923], 
#  [ 8.71779789, 0.  , 6.164414 , 8.18535277, 1.41421356], 
#  [ 8.1240384 , 6.164414 , 0.  , 9.21954446, 5.83095189], 
#  [ 3.31662479, 8.18535277, 9.21954446, 0.  , 7.  ], 
#  [ 7.34846923, 1.41421356, 5.83095189, 7.  , 0.  ]])

Elements [0,1] and [0,2], for example, correspond respectively to:

np.sqrt(np.sum((a[0] - a[1]) ** 2)) 
# 8.717797887081348 

np.sqrt(np.sum((a[0] - a[2]) ** 2)) 
# 8.1240384046359608
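To make the broadcasting step easier to follow, here is a minimal sketch that prints the intermediate shapes. The names col, diff and dist are only illustrative and not part of the answer above.

```python
import numpy as np

a = np.array([[1, 2, 8],
              [7, 4, 2],
              [9, 1, 7],
              [0, 1, 5],
              [6, 4, 3]])

col = a[:, None]                      # shape (5, 1, 3): one extra axis
diff = a - col                        # broadcasts to (5, 5, 3); diff[i, j] == a[j] - a[i]
dist = np.linalg.norm(diff, axis=-1)  # collapse the coordinate axis -> (5, 5)

print(col.shape, diff.shape, dist.shape)
```

The result is symmetric with a zero diagonal, so row i holds the distances from point i to every point, itself included.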


Source

2017-10-12 02:17:17 Psidom

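Thank you for your answer! It works great. One more question. To find the minimum distance between points, I will have to eliminate the '0' in each row and find the minimum. However, if the same point appears more than once, I have to treat the copies as two different points. So I have to exclude a[i,i], since it will be zero, but I still have to make use of the other '0's. Any idea how I can achieve this? –

A quick fix is to replace all the diagonal entries with 'np.nan' and then use 'np.nanmin' or 'np.nanargmin': 'dist = np.linalg.norm(a - a[:,None], axis=-1); dist[np.arange(dist.shape[0]), np.arange(dist.shape[0])] = np.nan; np.nanargmin(dist, axis=0)' – Psidom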
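A small sketch of that NaN-diagonal idea with a duplicated point may help. The array pts is made-up data for illustration, and np.fill_diagonal is used here as a shorter alternative to the arange indexing in the comment above.

```python
import numpy as np

# Hypothetical data: the first and last rows are the same point
pts = np.array([[1, 2, 8],
                [7, 4, 2],
                [1, 2, 8]])

dist = np.linalg.norm(pts - pts[:, None], axis=-1)
np.fill_diagonal(dist, np.nan)            # mask only each point's distance to itself

nearest_idx = np.nanargmin(dist, axis=1)  # index of the nearest *other* point per row
nearest_dist = np.nanmin(dist, axis=1)    # the corresponding distance

print(nearest_idx, nearest_dist)
```

Only the diagonal is masked, so the zero distance between the two copies of [1,2,8] is still picked up as a legitimate minimum.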

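You can use the e_dist function from this thread and obtain the same results.

Addendum

Timing: on my memory-starved laptop, I could only run a comparison against @Psidom's norm_app function on a smaller sample.

a = np.random.randint(0, 9, (5000, 3))

%timeit norm_app(a)
1.91 s ± 13.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit e_dist(a, a)
631 ms ± 3.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)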

a 
array([[1, 2, 8], 
     [7, 4, 2], 
     [9, 1, 7], 
     [0, 1, 5], 
     [6, 4, 3]]) 

dm = e_dist(a, a) # get the def from the link 

dm 
Out[7]: 
array([[ 0. , 8.72, 8.12, 3.32, 7.35], 
     [ 8.72, 0. , 6.16, 8.19, 1.41], 
     [ 8.12, 6.16, 0. , 9.22, 5.83], 
     [ 3.32, 8.19, 9.22, 0. , 7. ], 
     [ 7.35, 1.41, 5.83, 7. , 0. ]]) 

idx = np.argsort(dm) 

closest = a[idx] 

closest 
Out[10]: 
array([[[1, 2, 8], 
     [0, 1, 5], 
     [6, 4, 3], 
     [9, 1, 7], 
     [7, 4, 2]], 

     [[7, 4, 2], 
     [6, 4, 3], 
     [9, 1, 7], 
     [0, 1, 5], 
     [1, 2, 8]], 

     [[9, 1, 7], 
     [6, 4, 3], 
     [7, 4, 2], 
     [1, 2, 8], 
     [0, 1, 5]], 

     [[0, 1, 5], 
     [1, 2, 8], 
     [6, 4, 3], 
     [7, 4, 2], 
     [9, 1, 7]], 

     [[6, 4, 3], 
     [7, 4, 2], 
     [9, 1, 7], 
     [0, 1, 5], 
     [1, 2, 8]]])
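The definition of e_dist is not reproduced in this answer. For readers who cannot follow the link, here is a minimal stand-in, assuming it returns pairwise Euclidean distances between the rows of two arrays. The name e_dist_sketch and its body are my own assumption, not the linked function.

```python
import numpy as np

def e_dist_sketch(a, b):
    """Pairwise Euclidean distances between the rows of a and the rows of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a[:, None, :] - b[None, :, :]           # shape (len(a), len(b), dims)
    return np.sqrt(np.einsum('ijk,ijk->ij', diff, diff))

a = np.array([[1, 2, 8], [7, 4, 2], [9, 1, 7], [0, 1, 5], [6, 4, 3]])

dm = e_dist_sketch(a, a)
idx = np.argsort(dm)        # per row: indices ordered from nearest to farthest
nearest = a[idx[:, 1]]      # column 0 is the point itself here (distance 0)

print(nearest)
```

Taking column 1 of the argsort assumes there are no duplicate points. With duplicates, column 0 is not guaranteed to be the point itself.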

Source

2017-10-12 02:35:56 NaN

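I am not getting the expected results from your 'closest', as I think the output should have the same shape as the input - one closest point per point. So I couldn't include yours in my timing results. Also, 'norm_app' is Psidom's. – Divakar

Corrected the name, thanks. I opted for the sorted points rather than just the result, in case the points themselves were needed. – NaN

Here's one approach using SciPy's cdist -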

from scipy.spatial.distance import cdist 
def closest_rows(a): 
    # Get squared Euclidean distances as a 2D array (squaring does not change the argmin) 
    dists = cdist(a, a, 'sqeuclidean') 

    # Fill diagonals with something greater than all elements as we intend 
    # to get argmin indices later on and then index into input array with those 
    # indices to get the closest rows 
    dists.ravel()[::dists.shape[1]+1] = dists.max()+1 
    return a[dists.argmin(1)]

Sample run -

In [72]: a 
Out[72]: 
array([[1, 2, 8], 
     [7, 4, 2], 
     [9, 1, 7], 
     [0, 1, 5], 
     [6, 4, 3]]) 

In [73]: closest_rows(a) 
Out[73]: 
array([[0, 1, 5], 
     [6, 4, 3], 
     [6, 4, 3], 
     [1, 2, 8], 
     [7, 4, 2]])
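The strided assignment dists.ravel()[::dists.shape[1]+1] may look cryptic: in a flattened n x n array, the diagonal entries sit n+1 positions apart. A small sketch on a toy array (the names m and m2 are illustrative only), compared against np.fill_diagonal:

```python
import numpy as np

n = 4
m = np.zeros((n, n))
m.ravel()[::n + 1] = 9      # ravel() is a view for a C-contiguous array, so m is modified
print(m)

m2 = np.zeros((n, n))
np.fill_diagonal(m2, 9)     # same effect, arguably more readable

print(np.array_equal(m, m2))
```

This relies on ravel() returning a view, which holds for a freshly created C-contiguous array such as the one in this sketch. np.fill_diagonal does not depend on that.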

Runtime test

Other working approach(es) -

def norm_app(a): # @Psidom's soln 
    dist = np.linalg.norm(a - a[:,None], axis=-1) 
    dist[np.arange(dist.shape[0]), np.arange(dist.shape[0])] = np.nan 
    return a[np.nanargmin(dist, axis=0)]

Timings with 10,000 points -

In [79]: a = np.random.randint(0,9,(10000,3)) 

In [80]: %timeit norm_app(a) # @Psidom's soln 
1 loop, best of 3: 3.83 s per loop 

In [81]: %timeit closest_rows(a) 
1 loop, best of 3: 392 ms per loop

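Further performance boost

There's the eucl_dist package (disclaimer: I am its author), which contains various methods for computing Euclidean distances that are more efficient than SciPy's cdist, especially for large arrays.

Making use of it, we would have a more performant version, like so -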

from eucl_dist.cpu_dist import dist 
def closest_rows_v2(a): 
    dists = dist(a,a, matmul="gemm", method="ext") 
    dists.ravel()[::dists.shape[1]+1] = dists.max()+1 
    return a[dists.argmin(1)]

Timings -

In [162]: a = np.random.randint(0,9,(10000,3)) 

In [163]: %timeit closest_rows(a) 
1 loop, best of 3: 394 ms per loop 

In [164]: %timeit closest_rows_v2(a) 
1 loop, best of 3: 229 ms per loop
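The matmul="gemm" argument suggests a formulation based on matrix multiplication, though the package internals are not shown here. As a rough, assumption-laden sketch of that general idea in plain NumPy (not the eucl_dist implementation), squared distances can be obtained from the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y. The function names below are hypothetical.

```python
import numpy as np

def sq_dists_matmul(a):
    """Squared Euclidean distances between all rows of a, via one matrix product."""
    a = np.asarray(a, dtype=float)
    sq_norms = np.einsum('ij,ij->i', a, a)            # ||x||^2 for each row
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (a @ a.T)
    np.maximum(d2, 0.0, out=d2)                       # clip tiny negatives from round-off
    return d2

def closest_rows_matmul(a):
    d2 = sq_dists_matmul(a)
    np.fill_diagonal(d2, np.inf)                      # never pick a point as its own neighbour
    return np.asarray(a)[d2.argmin(axis=1)]

a = np.array([[1, 2, 8], [7, 4, 2], [9, 1, 7], [0, 1, 5], [6, 4, 3]])
print(closest_rows_matmul(a))
```

For float inputs this identity can lose precision when points are very close together, so it trades some accuracy for speed.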

Source

2017-10-12 05:37:56 Divakar

I would suggest using pdist and squareform from scipy.spatial.distance

Consider the following array of points:

a = np.array([[1,2,8], [7,4,2], [9,1,7], [0,1,5], [6,4,3]])

If you want to display the distances between point [1,2,8] and the other points (row 0 of the full distance matrix):

squareform(pdist(a)) 

Out[1]: array([[ 0.  , 8.71779789, 8.1240384 , 3.31662479, 7.34846923], 
       [ 8.71779789, 0.  , 6.164414 , 8.18535277, 1.41421356], 
       [ 8.1240384 , 6.164414 , 0.  , 9.21954446, 5.83095189], 
       [ 3.31662479, 8.18535277, 9.21954446, 0.  , 7.  ], 
       [ 7.34846923, 1.41421356, 5.83095189, 7.  , 0.  ]])

If you want to display the shortest distance between point [1,2,8] and its closest point:

sorted(squareform(pdist(a))[0])[1] 

Out[2]: 3.3166247903553998

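[0] is the index of your first point ([1,2,8])

[1] is the index of the second-smallest value (skipping the zero distance of the point to itself)

If you want to display the index of the point closest to [1,2,8]: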

np.argsort(squareform(pdist(a))[0])[1] 

Out[3]: 3
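Since the question asks to repeat this for every point, here is a minimal sketch that extends the same pdist/squareform approach to all rows at once. Masking the diagonal with np.inf is my own choice for this sketch, and the variable names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

a = np.array([[1, 2, 8], [7, 4, 2], [9, 1, 7], [0, 1, 5], [6, 4, 3]])

d = squareform(pdist(a))          # full symmetric distance matrix
np.fill_diagonal(d, np.inf)       # a point must not be its own nearest neighbour

nearest_idx = d.argmin(axis=1)    # index of the closest other point, per row
nearest_dist = d.min(axis=1)      # distance to that point

print(nearest_idx)
print(nearest_dist)
```

Unlike taking element [1] of a sorted row, masking the diagonal also behaves sensibly when the same point appears more than once.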

Source

2018-01-05 11:14:38 solub
