2017-10-17 135 views
0

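I am trying to reproduce a tutorial (seen here) on machine learning with scikit-learn in Python.

Everything works perfectly until I call the .fit method on my training set.

Here is a sample of my code: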

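import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import confusion_matrix

# make_dic and extract_features are helper functions defined elsewhere in ML.py (not shown)

# TRAINING PART

train_dir = 'pdf/learning_set'
dictionary = make_dic(train_dir)

train_labels = np.zeros(20)   # 20 labels, all class 0 by default
train_labels[17:20] = 1       # mark the last three as class 1
train_matrix = extract_features(train_dir)
model1 = MultinomialNB()
model1.fit(train_matrix, train_labels)


# TESTING PART

test_dir = 'pdf/testing_set'
test_matrix = extract_features(test_dir)
test_labels = np.zeros(8)
test_labels[4:7] = 1
result1 = model1.predict(test_matrix)
print(confusion_matrix(test_labels, result1))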

Here is my traceback:

Traceback (most recent call last):
  File "ML.py", line 65, in <module>
    model1.fit(train_matrix, train_labels)
  File "/usr/local/lib/python3.6/site-packages/sklearn/naive_bayes.py", line 579, in fit
    X, y = check_X_y(X, y, 'csr')
  File "/usr/local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 552, in check_X_y
    check_consistent_length(X, y)
  File "/usr/local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 173, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [23, 20]

How can I fix this? I am using Python 3.6 on Ubuntu 16.04.

Answers

1

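ValueError: Found input variables with inconsistent numbers of samples: [23, 20]

This means you have 23 training vectors (train_matrix has 23 rows) but only 20 training labels (train_labels is an array of 20 values).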
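
A quick way to confirm this before calling fit is to compare the row count of the feature matrix with the length of the label array. This is only a minimal sketch: the zero-filled array below is a stand-in for the real output of extract_features, and its column count (3000) is an arbitrary assumption.

```python
import numpy as np

# Stand-in for extract_features(train_dir): 23 documents (rows).
# The number of columns (3000) is an arbitrary placeholder.
train_matrix = np.zeros((23, 3000))
train_labels = np.zeros(20)

n_samples = train_matrix.shape[0]  # one row per training document
n_labels = train_labels.shape[0]   # one label is needed per row

if n_samples != n_labels:
    print("Mismatch: %d feature rows vs %d labels" % (n_samples, n_labels))
```

The two numbers are the same pair that scikit-learn reports in the error message.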

Change train_labels = np.zeros(20) to train_labels = np.zeros(23) and it should work.
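
To avoid hard-coding the count, the label array could also be sized from the matrix itself. The sketch below assumes, as the slice in the question suggests, that the last three documents belong to class 1. The helper name make_labels is hypothetical, the zero-filled matrix is a stand-in for the real features, and whether the positive documents really come last depends on the order in which the files are read.

```python
import numpy as np

def make_labels(n_samples, n_positive):
    """Return n_samples labels: all 0, with the last n_positive set to 1."""
    labels = np.zeros(n_samples)
    if n_positive > 0:
        labels[n_samples - n_positive:] = 1
    return labels

# Stand-in for extract_features(train_dir); the column count is arbitrary.
train_matrix = np.zeros((23, 3000))

# Size the labels from the matrix so the two can never disagree.
train_labels = make_labels(train_matrix.shape[0], 3)
assert train_labels.shape[0] == train_matrix.shape[0]
```

If the folder gains or loses files later, the label array follows automatically, although the position of the positive examples still has to be checked by hand.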

+0

Thank you very much, it works perfectly! That was a silly mistake, haha.