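Adding features to a Multinomial Naive Bayes classifier in Python

Using MultinomialNB() from scikit-learn in Python, I want to classify documents not only by the word features found in the documents but also by sentiment dictionaries (meaning plain word lists, not the Python dict data type).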
Suppose these are the documents to train on:
train_data = ['i hate who you welcome for','i adore him with all my heart','i can not forget his warmest welcome for me','please forget all these things! this house smells really weird','his experience helps a lot to complete all these tedious things', 'just ok', 'nothing+special today']
train_labels = ['Nega','Posi','Posi','Nega','Posi','Other','Other']
psentidict = ['welcome','adore','helps','complete','fantastic']
nsentidict = ['hate','weird','tedious','forget','abhor']
osentidict = ['ok','nothing+special']
I can train on the counts of all tokens against the corresponding labels, as below:
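from sklearn import naive_bayes
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
# Bag-of-words counts feed a multinomial NB with Laplace smoothing (alpha = 1.0).
text_clf = Pipeline([('vect', CountVectorizer()),
                     ('clf', naive_bayes.MultinomialNB(alpha=1.0))])
text_clf = text_clf.fit(train_data, train_labels)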
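Although I train on data like the lists above, I would also like to use my sentiment dictionaries as additional classification features.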
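The reason is that features derived from the dictionaries would make it possible to classify sentences containing OOV (out-of-vocabulary) words. With nothing but crude Laplace smoothing (alpha = 1.0), overall accuracy will be severely limited.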
test_data = ['it is fantastic']  # predict() expects an iterable of documents, not a bare string
predicted_labels = text_clf.predict(test_data)
With dictionary features added, the sentence above could be classified even though none of its words appear in the training documents.
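To make the OOV problem concrete, here is a small check, assuming the default CountVectorizer settings and the training sentences from above. The names vect, test_counts and known are only illustrative. It shows that the test sentence maps to an all-zero count vector, so the word features give the classifier nothing to work with beyond the class priors.

```python
from sklearn.feature_extraction.text import CountVectorizer

train_data = ['i hate who you welcome for',
              'i adore him with all my heart',
              'i can not forget his warmest welcome for me',
              'please forget all these things! this house smells really weird',
              'his experience helps a lot to complete all these tedious things',
              'just ok',
              'nothing+special today']

# Learn the vocabulary from the training documents only.
vect = CountVectorizer().fit(train_data)

# Transform the unseen sentence; words outside the vocabulary are dropped.
test_counts = vect.transform(['it is fantastic'])

# Which of the test words does the vectorizer know about?
known = [w for w in ['it', 'is', 'fantastic'] if w in vect.vocabulary_]

print(test_counts.sum(), known)
```

If the total count is zero, every class gets the same (empty) likelihood term, and the prediction reduces to whichever label was most frequent in training.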
How can I add features from psentidict, nsentidict and osentidict to the Multinomial Naive Bayes classifier?
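For illustration, the kind of thing I have in mind might look like the sketch below. This is an assumption about one possible direction, not a verified solution: LexiconCounter is a hypothetical helper I made up (it is not part of scikit-learn), and it simply appends one count column per dictionary to the bag-of-words counts through FeatureUnion. The whitespace tokenization with punctuation stripping is also my own choice, made so that 'nothing+special' stays a single token.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline

train_data = ['i hate who you welcome for',
              'i adore him with all my heart',
              'i can not forget his warmest welcome for me',
              'please forget all these things! this house smells really weird',
              'his experience helps a lot to complete all these tedious things',
              'just ok',
              'nothing+special today']
train_labels = ['Nega', 'Posi', 'Posi', 'Nega', 'Posi', 'Other', 'Other']
psentidict = ['welcome', 'adore', 'helps', 'complete', 'fantastic']
nsentidict = ['hate', 'weird', 'tedious', 'forget', 'abhor']
osentidict = ['ok', 'nothing+special']


class LexiconCounter(TransformerMixin, BaseEstimator):
    """Hypothetical helper: emits one hit-count column per lexicon."""

    def __init__(self, lexicons):
        self.lexicons = lexicons

    def fit(self, X, y=None):
        return self  # nothing to learn; the lexicons are fixed word lists

    def transform(self, X):
        rows = []
        for doc in X:
            # Whitespace split keeps 'nothing+special' as one token.
            tokens = [t.strip('.,!?;:') for t in doc.lower().split()]
            rows.append([sum(t in lex for t in tokens) for lex in self.lexicons])
        return np.array(rows, dtype=float)  # non-negative, as MultinomialNB requires


# Concatenate word counts with the three lexicon count columns.
features = FeatureUnion([
    ('words', CountVectorizer()),
    ('lexicons', LexiconCounter([psentidict, nsentidict, osentidict])),
])
lexicon_clf = Pipeline([('features', features),
                        ('clf', MultinomialNB(alpha=1.0))])
lexicon_clf.fit(train_data, train_labels)

oov_data = ['it is fantastic', 'i abhor it']
oov_pred = lexicon_clf.predict(oov_data)
print(oov_pred)
```

With this setup, 'fantastic' and 'abhor' never occur in the training text, yet they still raise the positive and negative lexicon columns, whose class-conditional weights were learned from in-vocabulary dictionary words. On such a tiny training set the margins are narrow, so I am unsure whether plain counts like these are the right way to feed dictionary knowledge into the model.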