2014-02-26

I have written these functions to cluster sequence data, and I would like to determine the ideal number of clusters:

library(TraMineR) 
library(cluster) 

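clustering <- function(data){
    # Build a state sequence object, deleting missing values on the left, in gaps, and on the right
    data <- seqdef(data, left = "DEL", gaps = "DEL", right = "DEL")
    # Constant substitution-cost matrix
    couts <- seqsubm(data, method = "CONSTANT")
    # Pairwise optimal matching (OM) distances
    data.om <- seqdist(data, method = "OM", indel = 3, sm = couts)
    # Agglomerative hierarchical clustering with Ward's method
    clusterward <- agnes(data.om, diss = TRUE, method = "ward")
    clusterward
}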

rc <- clustering(rubinius_sequences) 

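cluster_cut <- function(data, clusterward, n_clusters, name_clusters){
    data <- seqdef(data, left = "DEL", gaps = "DEL", right = "DEL")
    # Cut the tree into n_clusters groups
    cluster_k <- cutree(clusterward, k = n_clusters)
    # Build one label per cluster instead of hard-coding four labels
    cluster_k <- factor(cluster_k, labels = paste("Type", seq_len(n_clusters)))
    # Return the sequences that belong to the requested cluster
    data[cluster_k == name_clusters, ]
}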

# Pass the same data that was used to build rc, so cluster memberships line up with the rows
rc1 <- cluster_cut(rubinius_sequences, rc, 4, "Type 1")

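However, the number of clusters here is chosen arbitrarily. Is there a way to show that the amount of variance captured by the clusters (or some similar measure) reaches a point of diminishing returns at a certain number of clusters? I am imagining something like a scree plot in factor analysis.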

Answer

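library(WeightedCluster)
# rubinius.dist is assumed to be the dissimilarity matrix for rubinius_sequences
# (for example the seqdist() output); it is not defined in the question's code.
# wcKMedRange() runs a k-medoids clustering for each k in 2:10 and computes quality indices.
(agnesRange <- wcKMedRange(rubinius.dist, 2:10))
plot(agnesRange, stat = c("ASW", "HG", "PBC"), lwd = 5)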

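This gives several indices for finding the ideal number of clusters, along with a plot. More information on the indices can be found here (under cluster quality): http://mephisto.unige.ch/weightedcluster/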
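
If you would rather evaluate the Ward tree from the question directly, one of the indices above, the average silhouette width (ASW), can be computed for each cut of the tree with the cluster package alone. The following is only a rough sketch under stated assumptions: `asw_by_k` is a hypothetical helper name, and the toy data stand in for the real `seqdist()` output. With the question's data you would pass `rc` and the OM distance matrix instead (note that `clustering()` returns only the tree, so the distance matrix would have to be returned or recomputed).

```r
library(cluster)

# Hypothetical helper (not part of any package): average silhouette width
# for each candidate number of clusters, given a hierarchical tree and the
# dissimilarities it was built from.
asw_by_k <- function(tree, diss, k_values = 2:10) {
  diss <- as.dist(diss)
  hc <- as.hclust(tree)  # works for agnes objects as well as hclust objects
  sapply(k_values, function(k) {
    membership <- cutree(hc, k = k)
    mean(silhouette(membership, diss)[, "sil_width"])
  })
}

# Toy data standing in for the sequence dissimilarities (an assumption made
# for illustration): three groups of 20 points each.
set.seed(1)
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 6), ncol = 2),
             matrix(rnorm(40, mean = 12), ncol = 2))
toy_diss <- dist(toy)
toy_tree <- agnes(toy_diss, diss = TRUE, method = "ward")

k_values <- 2:10
asw <- asw_by_k(toy_tree, toy_diss, k_values)

# Scree-like plot: look for the k where ASW peaks or stops improving.
plot(k_values, asw, type = "b",
     xlab = "Number of clusters", ylab = "Average silhouette width")
best_k <- k_values[which.max(asw)]
```

Picking the maximum ASW is only one possible rule; as with a scree plot, it is usually worth looking at the whole curve and at more than one index before settling on a number of clusters.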