Assuming you mean computing the error rate on the sample used to fit the model, you can use printcp(). For example, using the on-line example:

> library(rpart) 
> fit <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis) 
> printcp(fit) 

Classification tree: 
rpart(formula = Kyphosis ~ Age + Number + Start, data = kyphosis) 

Variables actually used in tree construction: 
[1] Age Start 

Root node error: 17/81 = 0.20988 

n= 81 

        CP nsplit rel error  xerror    xstd
1 0.176471      0   1.00000 1.00000 0.21559
2 0.019608      1   0.82353 0.82353 0.20018
3 0.010000      4   0.76471 0.82353 0.20018

The Root node error is used to compute two measures of predictive performance, when considering the values displayed in the rel error and xerror columns, and depending on the complexity parameter (first column):

  • 0.76471 × 0.20988 = 0.1604973 (16.0%) is the resubstitution error rate (i.e., the error rate computed on the training sample); this is roughly what you get from

    class.pred <- table(predict(fit, type="class"), kyphosis$Kyphosis) 
    1-sum(diag(class.pred))/sum(class.pred) 
    
  • 0.82353 × 0.20988 = 0.1728425 (17.2%) is the cross-validated error rate (using 10-fold CV, see xval in rpart.control(); but see also xpred.rpart() and plotcp(), which rely on this kind of measure). This measure is a more objective indicator of predictive accuracy; both products can be reproduced from the fitted object, as sketched after this list.
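A minimal sketch of that cross-check, assuming the fit object from above and the default (equal) misclassification costs, under which the root node error equals the minority-class proportion. root_err and cp are just illustrative names, and the xerror value will differ slightly between runs because the CV folds are random:

    root_err <- min(prop.table(table(kyphosis$Kyphosis)))  # 17/81 = 0.20988
    cp <- fit$cptable                                       # the matrix behind printcp()
    cp[nrow(cp), "rel error"] * root_err                    # resubstitution error, ~0.16
    cp[nrow(cp), "xerror"] * root_err                       # cross-validated error, ~0.17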

Note that this more or less agrees with the classification accuracy reported by tree:

> library(tree) 
> summary(tree(Kyphosis ~ Age + Number + Start, data=kyphosis)) 

Classification tree: 
tree(formula = Kyphosis ~ Age + Number + Start, data = kyphosis) 
Number of terminal nodes: 10 
Residual mean deviance: 0.5809 = 41.24/71 
Misclassification error rate: 0.1235 = 10/81 

where the Misclassification error rate is computed from the training sample.
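As a quick check on that figure, a minimal sketch that recomputes the training-sample misclassification rate from the tree fit by comparing its in-sample class predictions with the observed labels (tfit is just an illustrative name):

    tfit <- tree(Kyphosis ~ Age + Number + Start, data = kyphosis)
    mean(predict(tfit, type = "class") != kyphosis$Kyphosis)  # 10/81 ≈ 0.1235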