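sklearn pairwise distance result unexpectedly not symmetric
2017-01-16

I am computing Euclidean pairwise distances between the elements of a vector, using the pairwise_distances function from the sklearn package. However, the resulting matrix is only approximately symmetric for some elements: in one example, two values that should be equal agree only to 15 decimal places.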

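I noticed this because my downstream analysis, which assumes the input matrix is symmetric, raised errors. I know I could round the values, but what is causing this?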

Here is the vector I am computing pairwise distances for (it is one column of a pandas DataFrame):

lag_measure_data[['bios_level']].values 

array([[ 0.76881030949999995538490793478558771312236785888671875 ], 
    [ 0.              ], 
    [ 0.67783090619999997183953155399649403989315032958984375 ], 
    [ 0.3228176074999999922710003374959342181682586669921875 ], 
    [ 0.75822395549999999087020796650904230773448944091796875 ], 
    [ 0.469808621599999975959605080788605846464633941650390625], 
    [ 0.989529862699999984698706612107343971729278564453125 ], 
    [ 0.              ], 
    [ 0.5575436799999999859522858969285152852535247802734375 ], 
    [ 0.9756440299999999954394525047973729670047760009765625 ], 
    [ 0.66511863289999995085821637985645793378353118896484375 ], 
    [ 0.978062709200000046649847718072123825550079345703125 ], 
    [ 0.473957179800000016900440868994337506592273712158203125], 
    [ 0.82409385540000001935112550199846737086772918701171875 ], 
    [ 0.56548685279999999497846374651999212801456451416015625 ], 
    [ 0.399505730399999980928527065771049819886684417724609375], 
    [ 0.474232963900000026313819034839980304241180419921875 ], 
    [ 0.34276307189999999369689476225175894796848297119140625 ], 
    [ 0.9985316859999999739017084721126593649387359619140625 ], 
    [ 0.9063241512999999915933813099400140345096588134765625 ], 
    [ 0.              ]]) 

Here is the command I use to get the distance matrix:

d_matrix_lag = pairwise_distances(lag_measure_data[['bios_level']].values) 

I won't print the distance matrix here because it is too messy, but as an example, the value in row 1, column 4 is

0.445992701999999907602756366031826473772525787353515625

while the value in row 4, column 1 is

0.4459927019999998520916051347739994525909423828125
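The two printed values are adjacent float64 numbers: they differ by exactly one unit in the last place (ULP). The quick NumPy check below is not part of the original thread; the two literals are copied from the question.

```python
import numpy as np

# Values copied from the question: D[0, 3] and D[3, 0] of the sklearn matrix.
d_03 = float("0.445992701999999907602756366031826473772525787353515625")
d_30 = float("0.4459927019999998520916051347739994525909423828125")

# np.spacing(x) is the gap between x and the next larger float64.
ulp = np.spacing(d_30)
print(ulp == 2.0 ** -54)                # True: floats in [0.25, 0.5) are 2**-54 apart
print(np.nextafter(d_30, 1.0) == d_03)  # True: d_03 is the very next float64 after d_30
print(d_03 - d_30 == ulp)               # True
```

In other words, the mismatch is a single rounding step in the last bit, not a bug in the distance formula itself.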


What vector gives such distances? – Dmitry


So if you compare the 1st and 4th elements of the printed array, you can reproduce these results:
pairwise_distances(0.3228176074999999922710003374959342181682586669921875, 0.76881030949999995538490793478558771312236785888671875)
Out[700]: array([[0.4459927019999998520916051347739994525909423828125]])
pairwise_distances(0.76881030949999995538490793478558771312236785888671875, 0.3228176074999999922710003374959342181682586669921875)
Out[701]: array([[0.445992701999999907602756366031826473772525787353515625]])
– user277194


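In my comment above I run the calculation twice on the same two floats, swapping the order of the arguments, and the results differ slightly. – user277194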
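Why would swapping the arguments change the last bit? The scikit-learn documentation for `euclidean_distances` says that, for efficiency, it computes dist(x, y) = sqrt(dot(x, x) - 2 * dot(x, y) + dot(y, y)) instead of subtracting the vectors, and warns that the returned matrix may not be exactly symmetric. The NumPy sketch below imitates that expansion; `expanded_distances` and `direct_distances` are hypothetical helpers, and the accumulation order (start from -2 x.y, add ||x_i||^2, then ||x_j||^2) is an assumption, not a copy of sklearn's code. Floating-point addition is not associative, so the (i, j) and (j, i) entries add the same three terms in different orders and can round differently. Taking differences directly is exactly symmetric, because x_i - x_j and x_j - x_i are exact negatives of each other.

```python
import numpy as np

# Rows 1 and 4 of bios_level from the question, as an (n_samples, 1) array.
x = np.array([[0.76881030949999995538490793478558771312236785888671875],
              [0.3228176074999999922710003374959342181682586669921875]])


def expanded_distances(X):
    """Euclidean distances via ||x||^2 - 2 x.y + ||y||^2.

    Hypothetical re-implementation for illustration; the order in which the
    three terms are accumulated is an assumption, not sklearn's exact code.
    """
    sq_norms = np.einsum("ij,ij->i", X, X)  # ||x_i||^2 for every row
    D = -2.0 * (X @ X.T)                     # -2 x_i . x_j
    D += sq_norms[:, np.newaxis]             # + ||x_i||^2, added along row i
    D += sq_norms[np.newaxis, :]             # + ||x_j||^2, added along column j
    np.maximum(D, 0.0, out=D)                # clip negatives from cancellation
    return np.sqrt(D)


def direct_distances(X):
    """Euclidean distances from explicit differences x_i - x_j."""
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


D_exp = expanded_distances(x)
D_dir = direct_distances(x)

# The expansion may or may not be bit-for-bit symmetric, depending on rounding.
print("expanded symmetric:", np.array_equal(D_exp, D_exp.T))
# Differences are exact negatives of each other, so this is always symmetric.
print("direct symmetric:  ", np.array_equal(D_dir, D_dir.T))
# Both agree to within floating-point noise.
print("max |expanded - direct|:", np.abs(D_exp - D_dir).max())
```

The expansion trades a little precision for speed (it reduces most of the work to one matrix product), which is why the two triangles of the matrix can disagree in the last bit.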

Answer


I can reproduce your problem with my symmetry test:

import numpy as np 

a = np.array([[ 0.76881030949999995538490793478558771312236785888671875 ], 
    [ 0.              ], 
    [ 0.67783090619999997183953155399649403989315032958984375 ], 
    [ 0.3228176074999999922710003374959342181682586669921875 ], 
    [ 0.75822395549999999087020796650904230773448944091796875 ], 
    [ 0.469808621599999975959605080788605846464633941650390625], 
    [ 0.989529862699999984698706612107343971729278564453125 ], 
    [ 0.              ], 
    [ 0.5575436799999999859522858969285152852535247802734375 ], 
    [ 0.9756440299999999954394525047973729670047760009765625 ], 
    [ 0.66511863289999995085821637985645793378353118896484375 ], 
    [ 0.978062709200000046649847718072123825550079345703125 ], 
    [ 0.473957179800000016900440868994337506592273712158203125], 
    [ 0.82409385540000001935112550199846737086772918701171875 ], 
    [ 0.56548685279999999497846374651999212801456451416015625 ], 
    [ 0.399505730399999980928527065771049819886684417724609375], 
    [ 0.474232963900000026313819034839980304241180419921875 ], 
    [ 0.34276307189999999369689476225175894796848297119140625 ], 
    [ 0.9985316859999999739017084721126593649387359619140625 ], 
    [ 0.9063241512999999915933813099400140345096588134765625 ], 
    [ 0.              ]]) 

from sklearn.metrics.pairwise import pairwise_distances
# Full n x n distance matrix; Euclidean is the default metric
dist_sklearn = pairwise_distances(a)
# Exact (bit-for-bit) symmetry test
print((dist_sklearn.transpose() == dist_sklearn).all())

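This prints False. Try scipy.spatial.distance instead. It returns the pairwise distances as a condensed distance vector, but you can convert that into a distance matrix with squareform():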

from scipy.spatial.distance import pdist, squareform

# Condensed vector with one Euclidean distance per pair i < j
dist = pdist(a)
# Expand it into an n x n matrix
sq = squareform(dist)
print((sq.transpose() == sq).all())

This gives me a symmetric matrix. Hope this helps.
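The `pdist`/`squareform` route is symmetric by construction: `pdist` computes each pair only once and stores the upper-triangle entries (pairs i < j, row-major) in a condensed vector, and `squareform` writes each stored value into both (i, j) and (j, i). The NumPy-only sketch below mimics that conversion; `condensed_to_square` is a hypothetical helper written for illustration, not SciPy's implementation.

```python
import numpy as np


def condensed_to_square(v):
    """Expand a condensed distance vector (pairs i < j, row-major) into a
    square matrix, placing each value at both (i, j) and (j, i)."""
    m = len(v)
    n = int(round((1 + np.sqrt(1 + 8 * m)) / 2))  # solve m = n * (n - 1) / 2
    if n * (n - 1) // 2 != m:
        raise ValueError("length is not n * (n - 1) / 2 for any integer n")
    M = np.zeros((n, n))
    iu = np.triu_indices(n, k=1)  # (i, j) index pairs with i < j, row-major
    M[iu] = v
    M[(iu[1], iu[0])] = v         # mirror: the same float lands at (j, i)
    return M


# Pairwise |x_i - x_j| for three points, each pair computed exactly once.
x = np.array([0.7688103095, 0.0, 0.3228176075])
i, j = np.triu_indices(len(x), k=1)
condensed = np.abs(x[i] - x[j])

sq = condensed_to_square(condensed)
print((sq.T == sq).all())  # True
```

Because both triangles are filled from the same stored number, no rounding difference between them is possible.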


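Thanks, this saved my day; I had been tearing my hair out over this. For some reason I even had trouble rounding the distance matrix. I was starting to lose my faith in Python... – user277194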
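If you would rather keep the scikit-learn matrix, one alternative to rounding (not from the original thread) is to average it with its transpose. D[i, j] + D[j, i] and D[j, i] + D[i, j] are the same floating-point sum because IEEE addition is commutative, so the result is exactly symmetric. `symmetrize` is a hypothetical helper, and the 2 x 2 matrix is built only from the two mismatched entries quoted in the question.

```python
import numpy as np


def symmetrize(D):
    """Return an exactly symmetric copy of an almost-symmetric distance matrix."""
    S = 0.5 * (D + D.T)       # (i, j) and (j, i) get the same rounded sum
    np.fill_diagonal(S, 0.0)  # a distance matrix should have a zero diagonal
    return S


# Hypothetical 2 x 2 matrix using the two mismatched values from the question.
D = np.array([[0.0, 0.445992701999999907602756366031826473772525787353515625],
              [0.4459927019999998520916051347739994525909423828125, 0.0]])

print((D == D.T).all())  # False: the off-diagonal entries differ by one ULP
S = symmetrize(D)
print((S == S.T).all())  # True
```

Unlike rounding to a fixed number of decimals, averaging cannot leave two entries on opposite sides of a rounding boundary.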


You're welcome. Never lose faith in Python ;) – NKlink0r