2013-03-21 93 views

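Converting a Weka classifier into a score

I'm trying to convert my classifier's output from a hard classification of each instance as 0 or 1 into a score (a confidence?), for example between 0 and 10. I'm using the RIDOR classifier, but I could also use ClassificationViaRegression, RandomForest or AttributeSelectedClassifier, although they don't classify as well.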

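I've printed as much as I can to the terminal (with all the options checked), but I can't find a confidence measure anywhere in the predictions. I also understand that none of these classifiers has an option to output source code? In that case I would have to code the classifier by hand.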

Here is an example of the rules produced:

class = 2 (40536.0/20268.0) 
     Except (fog <= 14.115114) and (polySyllabicWords/Sentence <= 1.973684) and (polySyllabicWords/Sentence <= 1.245) and (Characters/Word > 4.331715) => class = 1 (2309.0/5.0) [1137.0/4.0] 
     Except (fog <= 14.115598) and (polySyllabicWords/Sentence <= 1.973684) and (polySyllabicWords/Sentence > 1.514706) => class = 1 (2281.0/0.0) [1112.0/0.0] 
     Except (fog <= 14.136126) and (Words/Sentence > 19.651515) and (polySyllableCount <= 10.5) and (polySyllabicWords/Sentence > 2.416667) and (Syllables/Sentence <= 34.875) => class = 1 (601.0/0.0) [303.0/6.0] 
     Except (fog <= 14.140863) and (polySyllabicWords/Sentence <= 1.944444) and (polySyllableCount <= 4.5) and (polySyllabicWords/Sentence <= 1.416667) and (wordCount > 29.5) and (Characters/Word <= 4.83156) => class = 1 (333.0/0.0) [152.0/0.0] 
     Except (fog <= 14.142217) and (polySyllabicWords/Sentence <= 1.944444) and (polySyllableCount <= 4.5) and (polySyllabicWords/Sentence <= 1.416667) and (numOfChars > 30.5) and (Syllables/Word <= 1.474937) => class = 1 (322.0/0.0) [174.0/4.0] 
     Except (fog <= 14.140863) and (polySyllabicWords/Sentence <= 1.75) and (polySyllableCount <= 4.5) => class = 1 (580.0/28.0) [298.0/21.0] 
     Except (fog <= 14.141508) and (Syllables/Sentence > 25.585714) and (Words/Sentence > 19.683333) and (sentenceCount <= 4.5) and (polySyllabicWords/Sentence <= 2.291667) and (fog > 12.269468) => class = 1 (434.0/0.0) [202.0/0.0] 
     Except (fog <= 14.140863) and (Syllables/Sentence > 25.866071) and (polySyllableCount <= 16.5) and (fog > 12.793102) and (polySyllabicWords/Sentence <= 2.9) and (wordCount <= 59.5) and (Words/Sentence > 16.166667) and (Words/Sentence <= 24.75) => class = 1 (291.0/0.0) [166.0/0.0] 
     Except (fog <= 14.140863) and (Syllables/Sentence > 25.585714) and (Words/Sentence > 19.630682) and (polySyllabicWords/Sentence > 2.656863) and (polySyllableCount <= 16.5) and (fog > 13.560337) and (Words/Sentence <= 21.55) and (numOfChars <= 523) => class = 1 (209.0/0.0) [93.0/2.0] 
     Except (fog <= 14.147578) and (Syllables/Word <= 1.649029) and (polySyllabicWords/Sentence <= 1.75) and (polySyllabicWords/Sentence > 1.303846) and (polySyllabicWords/Sentence <= 1.422619) and (fog > 9.327132) => class = 1 (183.0/0.0) [64.0/0.0]...... 

I'm also not sure what the first line refers to (40536.0/20268.0) - does it just mean "classify as 2 unless one of the following rules applies"?

Any help is much appreciated!


Yes, the first line means that the default classification is (2), unless one of the rules below holds. – etov 2013-03-21 14:24:31

Answers


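In general, getting a confidence out of a classifier is not considered an easy task, especially if you want it calibrated (e.g. so that it represents the chance that the classification is correct). However, there are a few relatively simple ways to get a rough estimate.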

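With tree- and rule-based classifiers, the numbers in parentheses give the number of correct/incorrect samples that fall into the bucket. So, for example, a bucket with (20,2) means that, based on the training data, the rule was correct in 20 cases and incorrect in 2. You can use this ratio as a rough confidence measure.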
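As a rough illustration of the ratio described above, here is a minimal Python sketch that pulls the first pair of counts off a rule line and scales it to 0..10. The function name `rule_score` and the regular expression are hypothetical, and the sketch assumes the pair really is correct/incorrect as described here. If the first number turns out to be the total covered rather than the number correct, the ratio would be (first - second) / first instead, so it is worth checking against your Weka version.

```python
import re

# Matches the first "(a/b)" pair of numeric counts on a rule line,
# e.g. "(2309.0/5.0)". Conditions such as "(fog <= 14.1)" do not match.
COUNTS = re.compile(r"\((\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)\)")


def rule_score(rule_line, scale=10.0):
    """Return a rough 0..scale score from the counts printed after a rule.

    Assumption: the pair is read as (correct/incorrect) on the training data.
    """
    match = COUNTS.search(rule_line)
    if match is None:
        raise ValueError("no (a/b) counts found in rule line")
    correct = float(match.group(1))
    incorrect = float(match.group(2))
    total = correct + incorrect
    if total == 0:
        raise ValueError("rule covers no samples")
    return scale * correct / total


if __name__ == "__main__":
    line = "(Characters/Word > 4.331715) => class = 1 (2309.0/5.0) [1137.0/4.0]"
    print(round(rule_score(line), 2))
```

This is only an uncalibrated estimate: a rule that covers very few samples can score 10 while being far less reliable than the number suggests.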

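When using regression, you can have WEKA output the classifier's actual numeric result (rather than just the class) and base a confidence measure on that.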
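One way to turn such a numeric result into a 0..10 score is to clamp it to the range spanned by the two class codes and rescale. The sketch below is an assumption-laden illustration, not Weka's own behaviour: the function name `regression_to_score` is hypothetical, and it assumes the two classes are coded as 0 and 1 (pass different `low` and `high` values if yours are coded otherwise).

```python
def regression_to_score(value, low=0.0, high=1.0, scale=10.0):
    """Map a numeric prediction onto a 0..scale score.

    Assumption: the two classes are coded as `low` and `high`, so a value
    near `high` means "confidently the high class".
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    # Regression output can overshoot the class codes, so clamp first.
    clamped = min(max(value, low), high)
    return scale * (clamped - low) / (high - low)


if __name__ == "__main__":
    for raw in (-0.2, 0.25, 0.5, 1.4):
        print(raw, regression_to_score(raw))
```

A score near the middle of the range then marks instances the model is unsure about, while scores near 0 or 10 mark the confident ones.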

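More generally, following the documentation, you can use the command-line -p option (see here). However, I'm not sure how those numbers are calculated.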