Document clustering and visualization

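I want to test whether a set of documents share some particular similarity by looking at a plot built from each document's vector representation, alongside other documents from a text dataset. I would expect them to end up close together in the visualization.

Is the solution to use doc2vec to compute a vector for each document and plot it? Can this be done in an unsupervised way? Which Python library should I use to get nice 2D and 3D representations of Word2vec vectors?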

2017-07-27 Willy Cornelissen

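I'm not sure exactly what you are asking, but if you want a way to check whether vectors are of the same type, you can use K-Means. K-Means builds K clusters from a list of vectors, so it can work if you pick a good K: not so low that unrelated documents get merged into one cluster, and not so high that the clusters become too fine-grained to mean anything.

Roughly, K-Means works like this: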

init_center(K)  # randomly pick K vectors to serve as the initial cluster centers

while not converge():  # there are many ways to check for convergence; the easiest is to check whether any center has moved since the last iteration
    associate_vector()  # assign every vector to its closest center
    re_calculate_center()  # move each center to the mean of all the vectors in its cluster
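For anyone who wants to run the steps above, here is a minimal pure-Python sketch. The function names (`kmeans`, `squared_distance`, `mean_vector`), the `seed` and `max_iterations` parameters, and the toy 2D vectors are all made up for illustration. Picking the initial centers by sampling K input vectors and leaving an empty cluster's center where it was are assumptions too; other choices are possible.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, k, max_iterations=100, seed=0):
    """Cluster `vectors` into `k` groups; return (labels, centers)."""
    rng = random.Random(seed)
    # init_center(K): assumed here to mean "pick K of the input vectors".
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)

    for _ in range(max_iterations):
        # associate_vector(): assign every vector to its closest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]

        # re_calculate_center(): move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            # Assumption: an empty cluster keeps its previous center.
            new_centers.append(mean_vector(members) if members else centers[c])

        # converge(): stop once no center moved since the last iteration.
        if new_centers == centers:
            break
        centers = new_centers

    return labels, centers


# Hypothetical toy "document vectors": two well-separated groups in 2D.
vectors = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels, centers = kmeans(vectors, k=2)
print(labels)  # the first three vectors share one label, the last three the other
```

With real data the vectors would come from something like doc2vec rather than being typed in by hand, and a tested implementation such as `sklearn.cluster.KMeans` is probably a better choice than hand-rolled code.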

A GIF may make this clearer than I can, and the page it comes from explains it better still, even though it talks about Java: https://picoledelimao.github.io/blog/2016/03/12/multithreaded-k-means-in-java/


2017-07-27 14:25:14
