k-nearest-neighbour classifier using numpy

I'm trying to implement my own kNN classifier. I've managed to implement something, but it's incredibly slow...
import numpy as np
from collections import Counter

def euclidean_distance(X_train, X_test):
    """
    Create a list of the Euclidean distances between the given
    feature vector and every feature vector in the training set.
    """
    return [np.linalg.norm(X - X_test) for X in X_train]

def k_nearest(X, Y, k):
    """
    Get the indices of the k nearest feature vectors and return a
    list of their classes.
    """
    idx = np.argpartition(X, k)
    return np.take(Y, idx[:k])

def predict(X_test):
    """
    Predict the class of each feature vector.
    """
    distance_list = [euclidean_distance(X_train, X) for X in X_test]
    return np.array([Counter(k_nearest(distances, Y_train, k)).most_common()[0][0]
                     for distances in distance_list])
where, for example,
X = [[ 1.96701284 6.05526865]
[ 1.43021202 9.17058291]]
Y = [ 1. 0.]
Obviously this would be much faster if I didn't use any loops, but I don't know how to make it work without them. Is there a way to do this without using loops / list comprehensions?
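One loop-free approach is to compute all pairwise distances at once with NumPy broadcasting, then use `np.argpartition` along an axis. Below is a minimal sketch, not an authoritative answer; the function name `predict_vectorized` and the use of squared distances (ranking by squared distance gives the same neighbours, so the square root can be skipped) are my own choices:

```python
import numpy as np

def predict_vectorized(X_train, Y_train, X_test, k):
    # Pairwise squared Euclidean distances via broadcasting:
    # ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2, for all test/train pairs at once.
    # Result shape: (n_test, n_train).
    d2 = (
        np.sum(X_test**2, axis=1)[:, None]
        - 2 * X_test @ X_train.T
        + np.sum(X_train**2, axis=1)[None, :]
    )
    # Indices of the k smallest distances in each row (unordered within the k).
    idx = np.argpartition(d2, k, axis=1)[:, :k]
    # Majority vote among the k nearest labels, row by row.
    # np.bincount needs non-negative integer labels, hence the cast.
    labels = Y_train[idx].astype(int)
    return np.array([np.bincount(row).argmax() for row in labels])
```

The final per-row vote is still a comprehension, but it iterates over `n_test` small arrays rather than computing distances in Python, so the expensive part is fully vectorized.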
What is `X_train`? – Divakar
@Divakar `X` is split into a training set and a test set. Imagine that `X` is actually 200 rows of `x, y` values instead of 2. It is then split into `X_train` and `X_test`. – user5368737