2017-02-28 66 views

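Python 3 text labeling

I don't know where to start with this problem, because I have only just begun learning about neural networks. I have a large database of sentence > label pairs. For example: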

i want take a photo < photo 
i go to take a photo < photo 
i go to use my camera < photo 
i go to eat something < eat 
i like my food < eat 

When a user writes a new sentence, I want to get a score for every label:

"after i go to bed, i use my camera" < photo: 0.9000, eat: 0.4000, ...

So where can I start with this problem? TensorFlow and scikit-learn both look good, but their document classification examples do not show a score for each label :\

Answers

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn import metrics

sentences = ["i want take a photo", "i go to take a photo", "i go to use my camera", "i go to eat something", "i like my food"]

labels = ["photo", "photo", "photo", "eat", "eat"]

# Turn each sentence into a TF-IDF feature vector
tfv = TfidfVectorizer()
tfv.fit(sentences)
X = tfv.transform(sentences)

# Encode the string labels as integers (classes are sorted: eat -> 0, photo -> 1)
lbl = LabelEncoder()
y = lbl.fit_transform(labels)

# Hold out part of the data so accuracy is measured on unseen sentences
xtrain, xtest, ytrain, ytest = train_test_split(X, y, stratify=y, random_state=42)

clf = LogisticRegression()
clf.fit(xtrain, ytrain)
predictions = clf.predict(xtest)

print("Accuracy Score =", metrics.accuracy_score(ytest, predictions))

For new data:

new_sentence = ["this is a new sentence"]
X_Test = tfv.transform(new_sentence)
# One row per sentence, one column per label, in the order of lbl.classes_
print(clf.predict_proba(X_Test))
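
To see which probability belongs to which label, the columns of predict_proba can be paired with the label names. The following is a minimal sketch, not part of the original answer: the helper name label_scores is hypothetical, and the model is trained on all five sentences (an assumption made here to keep the example short). LabelEncoder sorts its classes, so column 0 is "eat" and column 1 is "photo". With so few training sentences, scores close to 0.5 are likely even for a sentence the model has already seen.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

sentences = ["i want take a photo", "i go to take a photo",
             "i go to use my camera", "i go to eat something",
             "i like my food"]
labels = ["photo", "photo", "photo", "eat", "eat"]

tfv = TfidfVectorizer()
X = tfv.fit_transform(sentences)

lbl = LabelEncoder()
y = lbl.fit_transform(labels)  # sorted classes: eat -> 0, photo -> 1

# Assumption: train on every sentence, with no held-out test split
clf = LogisticRegression()
clf.fit(X, y)


def label_scores(sentence):
    """Hypothetical helper: map each label name to its predicted probability."""
    probs = clf.predict_proba(tfv.transform([sentence]))[0]
    # clf.classes_ holds the encoded integers in column order;
    # inverse_transform turns them back into the original label strings
    names = [str(n) for n in lbl.inverse_transform(clf.classes_)]
    return {name: float(p) for name, p in zip(names, probs)}


scores = label_scores("i go to eat something")
for name, p in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {p:.4f}")
```

The two scores always sum to 1 because predict_proba returns a probability distribution over the labels, so independent scores such as photo: 0.9, eat: 0.4 would need a different setup, for example one binary classifier per label.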

OK, but how can I check a new random sentence against all the labels? – esemve


See the updated answer –


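Thanks a lot, but I have one last question. This works, but if I test it with an existing sentence, for example "i go to eat something", it returns 0.55 0.44. Why? That sentence is in the training data for the eat category :\ Is the first number photo and the second one eat? If not, how can I find out which number belongs to which category? – esemve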