from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
你好,我有意见如下列表:为什么下面的部分贴合不工作属性?
comments = ['I am very agry','this is not interesting','I am very happy']
这些相应的标签:
sents = ['angry','indiferent','happy']
我使用TFIDF如下向量化这些评论:
tfidf_vectorizer = TfidfVectorizer(analyzer='word')
tfidf = tfidf_vectorizer.fit_transform(comments)
from sklearn import preprocessing
我使用标签编码器来向量化标签:
le = preprocessing.LabelEncoder()
le.fit(sents)
labels = le.transform(sents)
print(labels.shape)
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.model_selection import train_test_split
with open('tfidf.pickle','wb') as idxf:
pickle.dump(tfidf, idxf, pickle.HIGHEST_PROTOCOL)
with open('tfidf_vectorizer.pickle','wb') as idxf:
pickle.dump(tfidf_vectorizer, idxf, pickle.HIGHEST_PROTOCOL)
这里我使用的被动攻击,以适应型号:
clf2 = PassiveAggressiveClassifier()
with open('passive.pickle','wb') as idxf:
pickle.dump(clf2, idxf, pickle.HIGHEST_PROTOCOL)
with open('passive.pickle', 'rb') as infile:
clf2 = pickle.load(infile)
with open('tfidf_vectorizer.pickle', 'rb') as infile:
tfidf_vectorizer = pickle.load(infile)
with open('tfidf.pickle', 'rb') as infile:
tfidf = pickle.load(infile)
在这里,我想测试部分配合使用有三个新的注释及其相应的标签如下:
new_comments = ['I love the life','I hate you','this is not important']
new_labels = [1,0,2]
vec_new_comments = tfidf_vectorizer.transform(new_comments)
print(clf2.predict(vec_new_comments))
clf2.partial_fit(vec_new_comments, new_labels)
问题是我没有得到正确的结果后部分适合如下:
print('AFTER THIS UPDATE THE RESULT SHOULD BE 1,0,2??')
print(clf2.predict(vec_new_comments))
但是我得到这样的输出:
[2 2 2]
所以我真的很感谢支持发现,为什么如果我用同样的例子,它已用来进行培训,测试它的模型没有被更新期望的输出应该是:
[1,0,2]
我想赞赏支持,也许超参数看到所需的输出。
这是完整的代码,以显示部分适合:
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
import sys
from sklearn.metrics.pairwise import cosine_similarity
import random
comments = ['I am very agry','this is not interesting','I am very happy']
sents = ['angry','indiferent','happy']
tfidf_vectorizer = TfidfVectorizer(analyzer='word')
tfidf = tfidf_vectorizer.fit_transform(comments)
#print(tfidf.shape)
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(sents)
labels = le.transform(sents)
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.model_selection import train_test_split
with open('tfidf.pickle','wb') as idxf:
pickle.dump(tfidf, idxf, pickle.HIGHEST_PROTOCOL)
with open('tfidf_vectorizer.pickle','wb') as idxf:
pickle.dump(tfidf_vectorizer, idxf, pickle.HIGHEST_PROTOCOL)
clf2 = PassiveAggressiveClassifier()
clf2.fit(tfidf, labels)
with open('passive.pickle','wb') as idxf:
pickle.dump(clf2, idxf, pickle.HIGHEST_PROTOCOL)
with open('passive.pickle', 'rb') as infile:
clf2 = pickle.load(infile)
with open('tfidf_vectorizer.pickle', 'rb') as infile:
tfidf_vectorizer = pickle.load(infile)
with open('tfidf.pickle', 'rb') as infile:
tfidf = pickle.load(infile)
new_comments = ['I love the life','I hate you','this is not important']
new_labels = [1,0,2]
vec_new_comments = tfidf_vectorizer.transform(new_comments)
clf2.partial_fit(vec_new_comments, new_labels)
print('AFTER THIS UPDATE THE RESULT SHOULD BE 1,0,2??')
print(clf2.predict(vec_new_comments))
不过我:
AFTER THIS UPDATE THE RESULT SHOULD BE 1,0,2??
[2 2 2]
你是如何装修的'clf2'。请将整个代码作为一个代码片段发布。现在它非常恼人的复制粘贴一次又一次。 –
@VivekKumar我已经更新了这个问题,我添加了完整的代码来重现我的问题,感谢支持 – neo33