
r - Mclust: Order of input parameters affecting clustering results

Reposted. Author: 行者123. Updated: 2023-12-02 13:14:44

I am using mclust to look at various clusterings in a dataset, using different numbers of inputs (X, Y, Z, R and S in the script below).

For example:

library(mclust)
elements <- cbind(X, Y, Z, R, S)
dataclust <- Mclust(elements)

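I have just found that the order of the input parameters matters and affects the results; in other words, elements <- cbind(X,Y,Z,R,S) gives different clusters than elements <- cbind(Y,Z,X,R,S). My understanding was that all input parameters carry equal weight and importance in the cluster analysis. Am I wrong, or is this a bug?

I see this in R 2.15.3 and in 2 other R versions.

Any comments on or explanations of the above are appreciated.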
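
One way to check this kind of claim on data you control is to fit the same columns in two different orders and compare the resulting partitions. The sketch below is not from the original post. It assumes simulated Gaussian data (the seed, group structure and means are made up, and X, Y, Z, R, S only mirror the names used in the question) and uses adjustedRandIndex from mclust so that the comparison ignores label switching.

```r
# Sketch: does permuting the columns change the partition Mclust returns?
# All data here are simulated for illustration only.
library(mclust)

set.seed(1)
n <- 150
g <- rep(1:3, each = n / 3)            # hypothetical true groups
X <- rnorm(n, mean = c(0, 4, 8)[g])
Y <- rnorm(n, mean = c(0, 4, 0)[g])
Z <- rnorm(n)                          # pure noise column
R <- rnorm(n, mean = c(2, 0, 2)[g])
S <- rnorm(n)                          # pure noise column

# Same columns, two different orders
fit_a <- Mclust(cbind(X, Y, Z, R, S))
fit_b <- Mclust(cbind(Y, Z, X, R, S))

# Adjusted Rand index: 1 means identical partitions up to relabelling
ari <- adjustedRandIndex(fit_a$classification, fit_b$classification)

# Did both fits select the same covariance model and number of components?
same_model <- fit_a$modelName == fit_b$modelName && fit_a$G == fit_b$G

print(ari)
print(same_model)
```

An ari of 1 together with same_model being TRUE would indicate that column order made no difference for this data set; anything else would point to order sensitivity in that particular fit.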

Best answer

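Unfortunately I can't comment on or edit my earlier comment, so I'm posting an answer. @m-dz set me on a path that I think has revealed a likely answer. Specifically: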

> library(mclust)
    __  ___________    __  _____________
   /  |/  / ____/ /   / / / / ___/_  __/
  / /|_/ / /   / /   / / / /\__ \ / /
 / /  / / /___/ /___/ /_/ /___/ // /
/_/  /_/\____/_____/\____//____//_/    version 5.2.2
Type 'citation("mclust")' for citing this R package in publications.

> testDataA <- read.table("http://fimi.ua.ac.be/data/chess.dat")

> summary(Mclust(subset(testDataA, select = c(V1, V3, V5, V7, V9, V11))))
----------------------------------------------------
Gaussian finite mixture model fitted by EM algorithm
----------------------------------------------------

Mclust EII (spherical, equal volume) model with 9 components:

 log.likelihood    n df      BIC       ICL
      -3597.466 3196 63 -7703.32 -7735.137

Clustering table:
  1   2   3   4   5   6   7   8   9
774 150 752 486 227 224 238 178 167

> summary(Mclust(subset(testDataA, select = c(V11, V9, V1, V3, V5, V7))))
----------------------------------------------------
Gaussian finite mixture model fitted by EM algorithm
----------------------------------------------------

Mclust EII (spherical, equal volume) model with 9 components:

 log.likelihood    n df      BIC       ICL
      -3597.466 3196 63 -7703.32 -7735.137

Clustering table:
  1   2   3   4   5   6   7   8   9
774 150 752 486 227 224 238 178 167

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.5

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] mclust_5.2.2

loaded via a namespace (and not attached):
[1] tools_3.3.2

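As you can see, this produces two matching solutions, in agreement with what @m-dz found! However, what I had done previously was load the psych package. I now see that this masks sim from mclust. I'm guessing this is what leads to the erroneous solutions: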

> library(psych)

Attaching package: ‘psych’

The following object is masked from ‘package:mclust’:

sim

> testDataB <- read.file(f = "http://fimi.ua.ac.be/data/chess.dat")
Data from the .data file http://fimi.ua.ac.be/data/chess.dat has been loaded.

> summary(Mclust(subset(testDataB, select = c(X1, X3, X5, X7, X9, X11))))
----------------------------------------------------
Gaussian finite mixture model fitted by EM algorithm
----------------------------------------------------

Mclust EEV (ellipsoidal, equal volume and shape) model with 2 components:

 log.likelihood    n df      BIC      ICL
       3547.068 3195 49 6698.738 6692.126

Clustering table:
   1    2
2759  436

> summary(Mclust(subset(testDataB, select = c(X11, X9, X1, X3, X5, X7))))
----------------------------------------------------
Gaussian finite mixture model fitted by EM algorithm
----------------------------------------------------

Mclust EEV (ellipsoidal, equal volume and shape) model with 6 components:

 log.likelihood    n  df      BIC      ICL
       18473.94 3195 137 35842.37 35834.51

Clustering table:
  1   2   3   4   5   6
431 932 210 881 524 217

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.5

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] psych_1.6.9 mclust_5.2.2

loaded via a namespace (and not attached):
[1] parallel_3.3.2 tools_3.3.2 foreign_0.8-67 mnormt_1.5-5
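
To test the masking guess rather than assume it, one option is to ask R which package an unqualified name currently resolves to, and to call mclust through its namespace so the search path cannot interfere. The sketch below is an illustration, not part of the original answer: it uses the built-in iris data instead of chess.dat, restricts G to 1:3 only to keep it quick, and attaches psych only if it happens to be installed. Note also that the two sessions above differ in more than the attached packages: the data were read with read.table in one and read.file in the other, and the summaries report n = 3196 versus n = 3195, so it may be worth holding the input data fixed when comparing.

```r
# Sketch: check which package owns a name, then call mclust explicitly.
library(mclust)

# Optional: attach psych after mclust, as in the session above
# (skipped when psych is not installed).
if (requireNamespace("psych", quietly = TRUE)) {
  suppressPackageStartupMessages(library(psych))
}

# Which package does each unqualified name resolve to right now?
owner_Mclust <- environmentName(environment(Mclust))
owner_sim    <- environmentName(environment(sim))
print(c(Mclust = owner_Mclust, sim = owner_sim))

# A fully qualified call bypasses the search path, so masking by another
# attached package cannot change which function is called here.
fit <- mclust::Mclust(iris[, 1:4], G = 1:3)
print(summary(fit))
```

If the qualified call on identical input still gives order-dependent results, masking is unlikely to be the cause, and the way the data were read would be the next thing to compare.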

Regarding r - Mclust: Order of input parameters affecting clustering results, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20392452/
