Feature weights in k-means
I have a set of Wikipedia texts that I want to cluster. The code is as follows:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
# parameters
maximum_features = 1000000
max_iterations = 300
# load the text file
wiki = pd.read_csv('people_wiki.csv')
# TF-IDF vectorization (each row is one article, L2-normalized)
vectorizer = TfidfVectorizer(max_features=maximum_features, norm='l2', stop_words='english')
tfidf = vectorizer.fit_transform(wiki['text'])
# clustering
kmeans = KMeans(n_clusters=3, random_state=0, init='k-means++', max_iter=max_iterations).fit(tfidf)
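I would like to know the weight of each feature, as shown here (she: 0.025, her: 0.017, ...):
In short: I want the weight of each feature (word), and to display the 5 most relevant words.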
The file 'people_wiki.csv' is here: