2017-04-09 88 views
0

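Hamming distance: Matlab to Python

I'm computing the Hamming distance between two documents for a k-NN method. I'm trying to translate Matlab code into Python, but I've been staring at it for hours and can't figure out what is causing the error.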

The code in Matlab:

function [ Dist ] = hamming_distance(X,Xtrain) 
% Function calculates Hamming distances of elements in set X from elements in set Xtrain. Distances of objects are returned as matrix Dist 
% X - set of objects we are comparing N1xD 
% Xtrain - set of objects to which X objects are compared N2xD 
% Dist - matrix of distances between X and Xtrain objects N1xN2 
% N1 - number of elements in X 
% N2 - number of elements in Xtrain 
% D - number of features (key words) 

N1 = size(X,1); 
N2 = size(Xtrain,1); 
Dist = zeros(N1,N2); 
D1 = size(X,2); 
for i=1:N1 
    for j=1:N2 
     temp_matrix = xor(X(i,1:D1),Xtrain(j,1:D1)); 
     Dist(i,j) = sum(temp_matrix); 
    end 
end 
end 

This is what I have written in Python so far:

def hamming_distance(X, X_train): 
    """ 
    :param X: set of objects that are going to be compared N1xD 
    :param X_train: set of objects compared against param X N2xD 
    Functions calculates Hamming distances between all objects from set X and all object from set X_train. 
    Resulting distances are returned as matrices. 
    :return: Distance matrix between objects X and X_train X i X_train N1xN2 
    """ 
    N1 = X.shape[0] 
    N2 = X_train.shape[0] 
    hdist = np.zeros(shape =(N1, N2)) 
    D1 = X.shape[1] 
    for i in range (1,N1): 
     for j in range (1, N2): 
      temp_matrix = np.logical_xor(X[i,1:D1], X_train[j, 1:D1]) 
      hdist[i, j] = np.sum(temp_matrix) 
    return hdist 

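The error seems to be in the XOR part of the Python code. I don't see what could be wrong there; I tried writing it as (X[i,1:D1])^(X_train[j, 1:D1]), but that didn't change anything. I checked the logical_xor function, and it looks like I'm passing the right inputs. I don't understand where the error comes from. Could it be because the matrices have different shapes? I got confused trying to resize them. Should I convert X and X_train to arrays? I tried that once, but it didn't help.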

The error:

Traceback (most recent call last): 
    File "C:\...\test.py", line 90, in test_hamming_distance 
    out = hamming_distance(data['X'], data['X_train']) 
    File "C:\...\content.py", line 28, in hamming_distance 
    temp_matrix = np.logical_xor(X[i,1:D1], X_train[j, 1:D1]) 
    File "C:\...\Anaconda3\lib\site-packages\scipy\sparse\base.py", line 559, in __getattr__ 
    raise AttributeError(attr + " not found") 
AttributeError: logical_xor not found 

I can't change test.py, only content.py. test.py is supposed to work correctly, so I'm sure the bug is in my function. Any help would be appreciated!

Edit: I have this at the top of my file:

import numpy as np 

Writing numpy instead of np didn't change anything; I just get the error 'numpy wasn't defined'.

+0

That function doesn't exist in NumPy. That's all your error is saying –

+0

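But it does? There is a function numpy.logical_xor. I don't understand. Should I approach this differently? I have import numpy as np in my file. Shouldn't it work? – Swaglina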

+1

Show the code where you define 'np'. Is it the standard import, 'import numpy as np'? Have you accidentally reused the name 'np' for something else? –

Answer

2

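The reason this doesn't work is that X and X_train are scipy sparse matrices. Scipy sparse matrices do not support logical operations, although work on this is in progress.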

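The reason the error shows up in scipy rather than numpy, even though you are calling a numpy function, is that logical_xor is a numpy ufunc, or "universal function". Python classes that interact with numpy ufuncs can override the ufuncs' behavior, and scipy sparse matrices do this to avoid calling unsupported operations that would convert the matrix to a dense array and potentially use up all your memory. You need to convert it to a dense array using, e.g., X.toarray(). If it is too large to fit in memory, you should use a package like dask or bcolz to handle the memory management for you.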
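As a rough sketch of the dense conversion described above (not the asker's final code; it assumes the inputs are small enough to densify and that any nonzero entry counts as "true", as Matlab's xor does), the function might look like this. Note that it also indexes from 0: the loops in the question start at 1 and slice 1:D1, which carries over Matlab's 1-based indexing and would skip the first row and first column in Python.

```python
import numpy as np


def hamming_distance(X, X_train):
    """Return the N1 x N2 matrix of Hamming distances between rows of X and X_train."""
    # Assumption: sparse inputs expose .toarray() (as scipy sparse matrices do);
    # plain ndarrays skip this step.
    if hasattr(X, "toarray"):
        X = X.toarray()
    if hasattr(X_train, "toarray"):
        X_train = X_train.toarray()

    # Treat any nonzero entry as True, matching Matlab's xor semantics.
    X = np.asarray(X).astype(bool)
    X_train = np.asarray(X_train).astype(bool)

    N1, N2 = X.shape[0], X_train.shape[0]
    hdist = np.zeros((N1, N2))
    # Python indices start at 0, so range(N1) visits every row.
    for i in range(N1):
        for j in range(N2):
            # Count the positions where exactly one of the two rows is set.
            hdist[i, j] = np.sum(np.logical_xor(X[i], X_train[j]))
    return hdist
```

Calling it with the scipy sparse matrices from the test data should then go through the .toarray() branch instead of reaching np.logical_xor with a sparse argument.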

Edit: scipy sparse matrices are not subclasses of ndarray.
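If densifying the inputs is too expensive, one alternative (a swapped-in technique, not something from the original code) is to avoid XOR altogether. For rows a and b containing only 0s and 1s, the Hamming distance equals sum(a) + sum(b) - 2 * (a . b), and all three terms can be computed with operations that scipy sparse matrices do support. The sketch below assumes strictly 0/1 entries; the function name hamming_distance_sparse is hypothetical, and the N1 x N2 result is still returned as a dense array.

```python
import numpy as np
from scipy import sparse


def hamming_distance_sparse(X, X_train):
    """Hamming distances between rows of two 0/1 matrices, without densifying the inputs."""
    # Assumption: every stored entry is exactly 0 or 1.
    X = sparse.csr_matrix(X, dtype=np.int64)
    X_train = sparse.csr_matrix(X_train, dtype=np.int64)

    # Number of ones in each row, shaped for broadcasting.
    ones_x = np.asarray(X.sum(axis=1)).reshape(-1, 1)        # N1 x 1
    ones_t = np.asarray(X_train.sum(axis=1)).reshape(1, -1)  # 1 x N2

    # Entry (i, j) counts the positions where both rows are 1.
    both = (X @ X_train.T).toarray()                         # N1 x N2

    # Positions set in exactly one of the two rows.
    return ones_x + ones_t - 2 * both
```

This replaces the double loop with a single sparse matrix product, which is usually much faster when the rows are mostly zeros.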

+0

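Ah, that makes sense, +1. So the problem is that numpy tries to dispatch the call to 'logical_xor' to a method on its arguments, but scipy's sparse matrices have no such method. It would be nice if numpy generated a more useful error message in this case. –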

+0

@WarrenWeckesser: Fixed – TheBlackCat