
r - Determining the ideal number of clusters for sequence-based (distance) clustering

Reposted. Author: 行者123. Updated: 2023-12-03 22:53:28

I have written these functions for clustering sequence-based data:

library(TraMineR)
library(cluster)

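# Build a Ward hierarchical clustering from optimal-matching distances.
clustering <- function(data){
  # Define the state sequence object; missing states are deleted.
  data <- seqdef(data, left = "DEL", gaps = "DEL", right = "DEL")
  # Constant substitution costs.
  couts <- seqsubm(data, method = "CONSTANT")
  # Pairwise optimal-matching (OM) distances.
  data.om <- seqdist(data, method = "OM", indel = 3, sm = couts)
  clusterward <- agnes(data.om, diss = TRUE, method = "ward")
  clusterward
}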

rc <- clustering(rubinius_sequences)

# Return the sequences that belong to one cluster of the cut tree.
cluster_cut <- function(data, clusterward, n_clusters, name_clusters){
  data <- seqdef(data, left = "DEL", gaps = "DEL", right = "DEL")
  membership <- cutree(clusterward, k = n_clusters)
  # Build one label per cluster so that n_clusters is not limited to 4.
  membership <- factor(membership, labels = paste("Type", seq_len(n_clusters)))
  data[membership == name_clusters, ]
}

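# The data passed here must be the same sequences that produced rc,
# so that cluster membership lines up with the rows.
rc1 <- cluster_cut(project_sequences, rc, 4, "Type 1")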

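However, the number of clusters here is chosen arbitrarily. Is there a way to show that the amount of variance (or some similar measure) captured by the clustering reaches a point of diminishing returns at a certain number of clusters? I am imagining something similar to a scree plot in factor analysis.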

Best answer

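library(WeightedCluster)
# rubinius.dist is assumed to be the dissimilarity matrix of the sequences,
# e.g. the seqdist() output computed inside clustering() above.
# wcKMedRange() runs a PAM (k-medoids) clustering for each k from 2 to 10.
(agnesRange <- wcKMedRange(rubinius.dist, 2:10))
plot(agnesRange, stat = c("ASW", "HG", "PBC"), lwd = 5)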

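This gives you several indices for finding the ideal number of clusters, along with a plot. More information about the indices can be found here (under "Cluster quality"):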
http://mephisto.unige.ch/weightedcluster/
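
For readers who want to see what one of these indices measures, here is a minimal sketch that computes the average silhouette width (ASW) by hand for a Ward tree, using only the cluster package. The toy data, the names toy, toy.dist, toy.ward, asw and best_k, and the 2:10 range are all assumptions made for illustration; in the question's setting the dissimilarity matrix would be the seqdist() output instead. It is not the code from the answer above, only a scree-like plot built on the same idea.

```r
library(cluster)

set.seed(1)
# Hypothetical toy data: three well-separated groups of 2-D points stand in
# for the sequence dissimilarities (data.om) from the question.
toy <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 6), ncol = 2),
  matrix(rnorm(40, mean = 12), ncol = 2)
)
toy.dist <- dist(toy)

# Ward clustering on the dissimilarities, as in the question.
toy.ward <- agnes(toy.dist, diss = TRUE, method = "ward")

# Average silhouette width for each candidate number of clusters.
k_values <- 2:10
asw <- sapply(k_values, function(k) {
  membership <- cutree(as.hclust(toy.ward), k = k)
  mean(silhouette(membership, toy.dist)[, "sil_width"])
})

# Scree-like plot: look for the k where the curve peaks or levels off.
plot(k_values, asw, type = "b",
     xlab = "Number of clusters", ylab = "Average silhouette width")
best_k <- k_values[which.max(asw)]
```

Unlike wcKMedRange(), which fits a separate PAM solution for each k, this sketch cuts a single hierarchical tree at each k, which matches the agnes() workflow in the question. WeightedCluster also appears to offer as.clustrange() for computing its quality statistics on an existing hierarchical tree; check the package documentation for the exact arguments before relying on it.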

Regarding "r - Determining the ideal number of clusters for sequence-based (distance) clustering", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22046436/
