Below is my output from running the train function of the caret package with method = "treebag":
Bagged CART
1251 samples
30 predictors
2 classes: 'N', 'Y'
No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 1247, 1247, 1247, 1247, 1247, 1247, ...
Resampling results
  Accuracy  Kappa  Accuracy SD  Kappa SD
  0.806     0.572  0.0129       0.0263
Here is my confusion matrix:
Bootstrapped (25 reps) Confusion Matrix
(entries are percentages of table totals)
          Reference
Prediction    N    Y
         N 24.8  7.9
         Y 11.5 55.8
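As a sanity check on the numbers above, the reported accuracy and kappa can be recomputed from the four cell percentages. This is a minimal sketch in plain Python rather than R, and it is not part of the caret run. The variable names are made up for illustration, and the cell values are simply copied from the matrix.

```python
# Cell percentages copied from the bootstrapped confusion matrix above.
# Keys are (prediction, reference); names here are illustrative only.
cells = {
    ("N", "N"): 24.8,
    ("N", "Y"): 7.9,
    ("Y", "N"): 11.5,
    ("Y", "Y"): 55.8,
}
classes = ["N", "Y"]

total = sum(cells.values())

# Observed agreement: the share of cases on the diagonal.
accuracy = sum(cells[(c, c)] for c in classes) / total

# Marginal shares of each class among predictions and among references.
pred_share = {c: sum(cells[(c, r)] for r in classes) / total for c in classes}
ref_share = {c: sum(cells[(p, c)] for p in classes) / total for c in classes}

# Agreement expected by chance, then Cohen's kappa.
expected = sum(pred_share[c] * ref_share[c] for c in classes)
kappa = (accuracy - expected) / (1 - expected)

print(f"accuracy = {accuracy:.3f}")
print(f"kappa    = {kappa:.3f}")
print(f"share of reference cases that are 'Y' = {ref_share['Y']:.3f}")
```

The diagonal sums to 80.6%, which matches the reported accuracy of 0.806, and kappa comes out at about 0.57, consistent with the reported 0.572 given that the cells are rounded to one decimal. The 'Y' column holds about 64% of the held-out cases, which is the figure to compare any accuracy against.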
After splitting the dataset into 80% train and 20% test, I train the model and then run "predict" on the test partition, and I get an accuracy of ~65%.
Questions:
(1) Does this mean my model is not very good?
(2) Is 'treebag' the proper method since I only have 2 classes: 'N', 'Y' ? Would a Logistic Regression method be better?
(3) Finally, my 1251 samples are roughly 67% 'Y' and 33% 'N'. Could this be "skewing" my training/results? Do I need a ratio closer to 50/50?
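To put the ~65% test accuracy in context for questions (1) and (3), one rough check is to compare it with the accuracy of always predicting the majority class. The sketch below is plain Python, not caret code. It assumes the test partition is exactly 20% of the 1251 samples and uses the approximate 67% and 65% figures quoted above, so the output is only indicative.

```python
import math

# Figures quoted in the post; all are approximate.
n_samples = 1251
share_y = 0.67         # rough share of class 'Y'
test_fraction = 0.20   # assumed exact; the real split size may differ slightly
test_accuracy = 0.65   # accuracy observed on the test partition

# Accuracy of a trivial model that always predicts the majority class.
baseline_accuracy = max(share_y, 1 - share_y)

# Approximate size of the test partition.
n_test = round(n_samples * test_fraction)

# Binomial standard error of an accuracy estimated on n_test cases.
std_err = math.sqrt(test_accuracy * (1 - test_accuracy) / n_test)

print(f"majority-class baseline = {baseline_accuracy:.2f}")
print(f"test cases              = {n_test}")
print(f"test accuracy           = {test_accuracy:.2f} +/- {std_err:.3f} (1 SE)")
```

Under these assumptions the test set has about 250 cases and the standard error is about 3 percentage points, so a test accuracy of ~65% is not distinguishable from the ~67% obtained by always answering 'Y'.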
Any help would be greatly appreciated!