2015-11-04

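Which label does each row of sklearn's confusion matrix correspond to? (Python)

I used the confusion matrix in sklearn. My problem is that I can't tell which label each row corresponds to. My labels are [0, 1, 2, 3, 4, 5].

Is the first row label 0, the second row label 1, and so on?

To make sure, I tried the code below, thinking it would build the confusion matrix in the order of the labels I pass in. But I got an error.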

cfr = RandomForestClassifier(n_estimators = 80, n_jobs = 5) 
cfr.fit(X1, y1) 
predictedY2 = cfr.predict(X2) 
shape = np.array([0, 1, 2, 3, 4, 5]) 
acc1 = cfr.score(X2, y2,shape) 

The error is:

acc1 = cfr.score(X2, y2,shape) 
TypeError: score() takes exactly 3 arguments (4 given)

What does the documentation for "cfr.score" say? – hpaulj

http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier.score – Talia

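The 4 arguments are self plus the 3 explicit ones. I would try passing the last one as a keyword argument. I haven't checked the version; the keyword argument may be a recent addition. I don't have this package installed, so I can't check it myself, though I could explore the code on GitHub. – hpaulj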

Answer

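score gives the accuracy of the classifier, i.e. the fraction of instances that are predicted correctly. What you are looking for is the predict function, which returns the predicted class for each input. Check out this example: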

import numpy as np 
from sklearn.ensemble import RandomForestClassifier as RFC 
from sklearn.metrics import confusion_matrix 
from sklearn.datasets import make_classification 

# Add a random state to the various functions so we all have the same output. 
rng = np.random.RandomState(1234) 

# Make dataset 
X,Y = make_classification(n_samples=1000, n_classes=6, n_features=20, n_informative=15, random_state=rng) 
# take random 75% of data as training, leaving rest for test 
train_inds = rng.rand(1000) < 0.75 

# create and train the classifier 
rfc = RFC(n_estimators=80, random_state=rng) 
rfc.fit(X[train_inds], Y[train_inds]) 

# O is the predicted class for each input on the test data 
O = rfc.predict(X[~train_inds]) 

print("Test accuracy: %.2f%%\n" % (rfc.score(X[~train_inds], Y[~train_inds]) * 100))

print("Confusion matrix:")
print(confusion_matrix(Y[~train_inds], O))

This prints:

Test accuracy: 57.92% 

Confusion matrix: 
[[24 4 3 1 1 6] 
[ 5 22 4 4 1 1] 
[ 5 2 18 5 3 2] 
[ 2 4 2 29 1 4] 
[ 3 1 3 2 28 3] 
[10 4 4 3 8 18]] 
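
As a sanity check on the output above, the matrix and the reported accuracy can be compared directly: the diagonal holds the correctly classified test samples, so the diagonal sum divided by the sum of all entries should reproduce the accuracy. Here is a minimal sketch using only NumPy, with the matrix copied by hand from the printed output (the variable names are illustrative and not part of the original code):

```python
import numpy as np

# Confusion matrix copied from the printed output
# (rows: true class, columns: predicted class).
cm = np.array([
    [24,  4,  3,  1,  1,  6],
    [ 5, 22,  4,  4,  1,  1],
    [ 5,  2, 18,  5,  3,  2],
    [ 2,  4,  2, 29,  1,  4],
    [ 3,  1,  3,  2, 28,  3],
    [10,  4,  4,  3,  8, 18],
])

correct = int(np.trace(cm))   # diagonal: prediction equals the true class
total = int(cm.sum())         # every test sample is counted exactly once
accuracy = correct / total

# Row sums give the number of test samples per true class 0..5.
per_class_counts = cm.sum(axis=1).tolist()

print("correct=%d total=%d accuracy=%.2f%%" % (correct, total, accuracy * 100))
print("samples per true class:", per_class_counts)
```

The ratio works out to 139/240, which agrees with the 57.92% reported by score.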

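According to the confusion_matrix documentation, the i, j entry of the confusion matrix is the number of objects known to be in class i but classified as class j. So, in the output above, the correctly classified objects are on the diagonal, while row 3, column 0 shows that two "class 3" objects were misclassified as "class 0" objects.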
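
To address the row order directly: as far as I know from the scikit-learn documentation, when labels is not passed, confusion_matrix uses the labels that occur in y_true or y_pred, in sorted order. With labels [0, 1, 2, 3, 4, 5] all present, row 0 is label 0, row 1 is label 1, and so on. The order is controlled by the labels argument of confusion_matrix, not by score. A small sketch with made-up toy data (y_true and y_pred below are hypothetical, not taken from the question):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical toy data; labels 3 and 4 never occur on purpose.
y_true = [0, 1, 2, 2, 5, 5, 5]
y_pred = [0, 2, 2, 2, 5, 0, 5]

# Default: rows/columns follow the sorted labels that actually occur,
# here [0, 1, 2, 5], so the matrix is 4x4.
cm_default = confusion_matrix(y_true, y_pred)

# Explicit labels: row i and column i correspond to labels[i],
# and absent labels still get an (all-zero) row and column.
labels = [0, 1, 2, 3, 4, 5]
cm_full = confusion_matrix(y_true, y_pred, labels=labels)

print(cm_default)
print(cm_full)
```

Passing labels explicitly is the safer choice when a class might be missing from the test split, because row i then always means labels[i].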

Hope this helps!