2014-04-01 232 views

__init__() got an unexpected keyword argument 'stop_words'

I am trying to compute tf-idf with scikit-learn version 0.14.1. Here is my code:

from sklearn.feature_extraction.text import CountVectorizer 
from sklearn.feature_extraction.text import TfidfTransformer 
from nltk.corpus import stopwords 
import numpy as np 
import numpy.linalg as LA 

train_set = ["The sky is blue.", "The sun is bright."] #Documents 
test_set = ["The sun in the sky is bright sun."] #Query 
stopWords = stopwords.words('english') 

vectorizer = CountVectorizer(stop_words = stopWords) 
#print vectorizer 
transformer = TfidfTransformer() 
#print transformer 

trainVectorizerArray = vectorizer.fit_transform(train_set).toarray() 
testVectorizerArray = vectorizer.transform(test_set).toarray() 
print 'Fit Vectorizer to train set', trainVectorizerArray 
print 'Transform Vectorizer to test set', testVectorizerArray 

transformer.fit(trainVectorizerArray) 
print 
print transformer.transform(trainVectorizerArray).toarray() 

transformer.fit(testVectorizerArray) 
print 
tfidf = transformer.transform(testVectorizerArray) 
print tfidf.todense() 

I get this error:

Traceback (most recent call last):
  File "tfidf.py", line 12, in <module>
    vectorizer = CountVectorizer(stop_words = stopWords)
TypeError: __init__() got an unexpected keyword argument 'stop_words'

I don't understand what the problem with 'stop_words' is. Can anyone help?

Answers


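So the mistake was mine: I followed an online tutorial to install sklearn, and it gave me version 0.10 rather than the 0.14.1 I thought I had. Judging by the error, I believe stop_words is not supported in sklearn 0.10. After updating to version 0.14.1, it works fine!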
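
A quick way to catch this kind of mismatch is to check which version Python actually imports, and whether the constructor accepts the keyword at all. The following is only a sketch, written for Python 3 (the question's code is Python 2). The helper names installed_version and accepts_keyword are made up for illustration, and OldVectorizer / NewVectorizer are hypothetical stand-ins, not real scikit-learn classes.

```python
import importlib
import inspect


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def accepts_keyword(cls, name):
    """Return True if cls.__init__ can be called with the keyword `name`."""
    params = inspect.signature(cls.__init__).parameters
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs catch-all accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for an older and a newer release of the same class.
class OldVectorizer:
    def __init__(self, analyzer="word"):
        self.analyzer = analyzer


class NewVectorizer:
    def __init__(self, analyzer="word", stop_words=None):
        self.analyzer = analyzer
        self.stop_words = stop_words


if __name__ == "__main__":
    # Prints None if scikit-learn is not installed in this interpreter.
    print("scikit-learn version:", installed_version("sklearn"))
    print("old accepts stop_words:", accepts_keyword(OldVectorizer, "stop_words"))
    print("new accepts stop_words:", accepts_keyword(NewVectorizer, "stop_words"))
```

With scikit-learn installed, the same check can be pointed at the real class, e.g. accepts_keyword(CountVectorizer, "stop_words"), before assuming the tutorial's install gave you the version you expected.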
