How do I successfully run an ML algorithm with a medium-sized dataset on a mediocre laptop?
I have a Lenovo IdeaPad laptop with 8 GB of RAM and an Intel Core i5 processor. I have 60k data points, each with 100 dimensions. I want to do kNN, and for that I am running the LMNN algorithm to learn a Mahalanobis metric.
The problem is that after two hours of running on my Ubuntu machine I am left with a blank screen. I cannot tell what went wrong! Did my memory fill up, or is it something else?
So, is there any way to optimize my code?
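For scale, a quick back-of-the-envelope check (plain Python arithmetic, nothing specific to my setup is assumed): the dense training matrix itself is tiny relative to 8 GB, so the raw data is unlikely to be what fills memory; LMNN's neighbour/impostor bookkeeping over pairs of points is the heavier part.

# 60,000 points x 100 dimensions stored as float64 (8 bytes each)
n, d, bytes_per_float = 60000, 100, 8
print(n * d * bytes_per_float / 2**20, "MiB")  # ~45.8 MiB -- fits easily in 8 GB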
My dataset: data
My LMNN implementation:
import numpy as np
import sys

from modshogun import LMNN, RealFeatures, MulticlassLabels
from sklearn.datasets import load_svmlight_file

def main():
    # Get the training file name from the command line
    traindatafile = sys.argv[1]

    # The training file is in libSVM format
    tr_data = load_svmlight_file(traindatafile)
    Xtr = tr_data[0].toarray()  # convert the sparse matrix to dense
    Ytr = tr_data[1]            # the training labels

    # Cast the data to Shogun format to work with LMNN
    features = RealFeatures(Xtr.T)
    labels = MulticlassLabels(Ytr.astype(np.float64))

    # Number of target neighbours per example - tune this using validation
    k = 18

    # Initialize the LMNN package
    lmnn = LMNN(features, labels, k)
    init_transform = np.eye(Xtr.shape[1])

    # Choose an appropriate timeout
    lmnn.set_maxiter(200000)
    lmnn.train(init_transform)

    # Let LMNN do its magic and return a linear transformation
    # corresponding to the Mahalanobis metric it has learnt
    L = lmnn.get_linear_transform()
    M = np.matrix(np.dot(L.T, L))

    # Save the model for use in the testing phase
    # Warning: do not change this file name
    np.save("model.npy", M)

if __name__ == '__main__':
    main()
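If the run is slow rather than out of memory, two pragmatic levers come to mind (a sketch, not a fix confirmed on this dataset): cap the iteration budget well below 200,000, and learn the metric on a random subsample, then use the learnt M for kNN over all 60k points. The n_max value and the seed below are my own assumptions; tune both by validation.

import numpy as np

def subsample(X, y, n_max=10000, seed=0):
    # Keep a random subset of at most n_max points; LMNN's cost grows
    # rapidly with the number of training points, so this is the biggest lever.
    rng = np.random.RandomState(seed)
    idx = rng.choice(X.shape[0], size=min(n_max, X.shape[0]), replace=False)
    return X[idx], y[idx]

# Usage: subsample right after loading, before casting to Shogun types, e.g.
#   Xtr, Ytr = subsample(Xtr, Ytr)
# Note: with k = 18, every class in the subsample must keep at least
# k + 1 examples for LMNN's target-neighbour search to be well defined.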
Are you sure you get a result with smaller input? – Julien
Yes, I get a result with smaller input, i.e. with around 0.5k data points instead of 60k. – Fenil
Obviously it needs more processing power and RAM to handle this data; this is not a code problem. –
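One way to test that diagnosis before buying hardware: time LMNN on growing subsets and watch how the runtime scales. A sketch that reuses the same Shogun calls as the script above (the subset sizes and the 1,000-iteration probe cap are my assumptions):

import time
import numpy as np
from modshogun import LMNN, RealFeatures, MulticlassLabels

def time_lmnn(X, y, k=18, maxiter=1000):
    # Train LMNN on (X, y) with a capped iteration budget; return wall-clock seconds.
    lmnn = LMNN(RealFeatures(X.T), MulticlassLabels(y.astype(np.float64)), k)
    lmnn.set_maxiter(maxiter)
    t0 = time.time()
    lmnn.train(np.eye(X.shape[1]))
    return time.time() - t0

# With Xtr, Ytr loaded as in the script above:
#   for n in (500, 1000, 2000, 4000):
#       print(n, "points:", time_lmnn(Xtr[:n], Ytr[:n]), "s")
# If the time grows superlinearly, the full 60k run is compute-bound,
# which would support the comment above.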