r - PCA in R: prcomp and confidence ellipses

Reposted by: 行者123, updated: 2023-12-04 20:47:57

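I recently ran a PCA in R using the prcomp() function, and now I need to determine (objectively) which samples from my two different groups are outliers and should be removed from further analysis.

I have seen PCA plots before in which confidence/variance ellipses (I am not sure of the term) are placed around the samples, and samples falling outside the ellipses are treated as outliers (for example, more than 3 standard deviations from the cluster centroid). How would I implement something like this in R?

Note: I have looked at the "car" package, but it is still unclear to me how data.ellipse would be used for, say, a PC1 vs PC2 projection plot. Any help or pointers to relevant resources are appreciated!

Thanks!

Edit: the R objects I am working with, and one of the plots I would like to use for outlier flagging: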

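# Read the count table (the transpose below implies samples are in columns)
countsTable <- read.table("sample.txt", header = TRUE)
# Transpose so that samples are rows and features are columns
transpose.counts.table <- t(countsTable)
# Drop all-zero columns, which cannot be scaled to unit variance
input.for.pca <- transpose.counts.table[, colSums(abs(transpose.counts.table)) != 0]
# The argument is spelled `scale.`; `scale` only works through partial matching
my.prc <- prcomp(input.for.pca, center = TRUE, scale. = TRUE)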

pdf("PCA_Results_PC1_PC2_prcomp_counts.pdf")
# Set up empty axes (type = "n"), then draw the sample names in place of points
plot(my.prc$x[, 1], my.prc$x[, 2], type = "n", main = "PCA: Samples' projection on PC1 and PC2 (raw counts)", xlab = "PC1", ylab = "PC2")
text(my.prc$x[, 1], my.prc$x[, 2], labels = rownames(my.prc$x), cex = 1.2)
dev.off()
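
For readers who want to stay with prcomp(), group ellipses can also be drawn by hand in base R. The following is a minimal sketch and not the approach used in the original post: it runs on simulated data in place of sample.txt, the helper name ellipse_points is hypothetical, and the ellipse assumes the scores within each group are roughly bivariate normal.

```r
# Simulated stand-in for the real data: 40 samples x 6 features, two groups.
set.seed(1)
group <- factor(rep(c("control", "diseased"), each = 20))
x <- matrix(rnorm(40 * 6), nrow = 40)
x[group == "diseased", 1:3] <- x[group == "diseased", 1:3] + 2
scores <- prcomp(x, center = TRUE, scale. = TRUE)$x[, 1:2]

# Hypothetical helper: points on the ellipse expected to contain a fraction
# `level` of a bivariate normal with the sample mean and covariance of `xy`.
ellipse_points <- function(xy, level = 0.95, n = 200) {
  centre <- colMeans(xy)
  radius <- sqrt(qchisq(level, df = 2))
  theta <- seq(0, 2 * pi, length.out = n)
  circle <- cbind(cos(theta), sin(theta))
  # chol() returns R with t(R) %*% R == cov(xy), which shapes the unit circle
  sweep(radius * circle %*% chol(cov(xy)), 2, centre, "+")
}

# One ellipse per group, in the order of levels(group)
ell <- lapply(split(as.data.frame(scores), group),
              function(d) ellipse_points(as.matrix(d)))

# Plot scores and ellipses; the file goes to a temporary directory
out_file <- file.path(tempdir(), "pca_ellipses.pdf")
pdf(out_file)
all_pts <- rbind(scores, do.call(rbind, ell))
plot(all_pts, type = "n", xlab = "PC1", ylab = "PC2")
points(scores, col = as.integer(group), pch = 20)
for (i in seq_along(ell)) lines(ell[[i]], col = i)
invisible(dev.off())
```

Samples drawn outside their own group's ellipse are candidates for a closer look. The level argument controls how wide the ellipse is.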

Update: the input.for.pca object, which now contains a categorical "type" column:

> dput(input.for.pca)
structure(list(A1BG = c(190L, 125L, 95L, 115L, 483L, 94L, 87L, 
211L, 153L, 135L, 116L, 110L, 75L, 159L, 148L, 159L, 177L, 103L, 
103L, 88L, 112L, 87L, 272L, 100L, 313L, 169L, 130L, 164L, 114L, 
154L, 168L, 197L, 125L, 95L, 118L, 154L, 197L, 203L, 184L, 86L, 
142L, 111L, 140L, 63L), A1BG.AS1 = c(77L, 94L, 53L, 52L, 56L, 
67L, 55L, 112L, 95L, 51L, 28L, 50L, 35L, 87L, 44L, 93L, 44L, 
16L, 21L, 24L, 42L, 43L, 159L, 59L, 125L, 108L, 50L, 68L, 55L, 
81L, 81L, 39L, 64L, 67L, 66L, 57L, 114L, 82L, 51L, 21L, 126L, 
24L, 53L, 3L), A1CF = c(1L, 3L, 3L, 2L, 0L, 0L, 1L, 5L, 15L, 
0L, 1L, 1L, 2L, 1L, 0L, 0L, 3L, 0L, 2L, 1L, 0L, 1L, 2L, 0L, 0L, 
1L, 0L, 3L, 2L, 0L, 0L, 6L, 1L, 0L, 0L, 0L, 5L, 1L, 4L, 0L, 2L, 
2L, 2L, 0L), A2LD1 = c(94L, 51L, 52L, 57L, 64L, 40L, 48L, 61L, 
83L, 53L, 49L, 31L, 40L, 66L, 50L, 43L, 54L, 14L, 73L, 58L, 50L, 
36L, 132L, 88L, 96L, 73L, 47L, 73L, 100L, 49L, 40L, 54L, 34L, 
34L, 45L, 56L, 77L, 66L, 90L, 62L, 67L, 47L, 80L, 9L), A2M = c(4407L, 
4755L, 1739L, 2049L, 3219L, 2598L, 2531L, 3894L, 2067L, 2703L, 
3776L, 774L, 3129L, 2924L, 1997L, 5803L, 3147L, 5472L, 9608L, 
3315L, 6164L, 1250L, 5911L, 4688L, 2775L, 4561L, 7165L, 3605L, 
8228L, 4835L, 7124L, 4689L, 5306L, 3643L, 3190L, 3290L, 4932L, 
1990L, 9610L, 7476L, 4533L, 4035L, 3275L, 1326L), A2ML1 = c(195L, 
207L, 63L, 291L, 24L, 126L, 168L, 251L, 39L, 145L, 213L, 126L, 
179L, 169L, 141L, 272L, 185L, 115L, 588L, 156L, 111L, 45L, 301L, 
182L, 155L, 146L, 91L, 160L, 155L, 73L, 44L, 103L, 182L, 71L, 
164L, 405L, 245L, 165L, 162L, 317L, 188L, 153L, 228L, 11L), A4GALT = c(191L, 
86L, 64L, 200L, 39L, 118L, 106L, 64L, 11L, 40L, 144L, 53L, 134L, 
101L, 138L, 138L, 214L, 138L, 406L, 145L, 497L, 72L, 473L, 86L, 
41L, 213L, 172L, 77L, 657L, 73L, 123L, 126L, 106L, 44L, 125L, 
106L, 56L, 114L, 756L, 328L, 151L, 210L, 213L, 42L), A4GNT = c(3L, 
3L, 0L, 5L, 7L, 1L, 0L, 2L, 4L, 3L, 0L, 0L, 0L, 2L, 2L, 2L, 3L, 
0L, 1L, 0L, 1L, 2L, 2L, 5L, 4L, 4L, 1L, 1L, 2L, 1L, 1L, 0L, 2L, 
2L, 3L, 3L, 5L, 2L, 3L, 2L, 0L, 0L, 1L, 0L), AA06 = c(0L, 0L, 
0L, 2L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L), AAA1 = c(1L, 5L, 5L, 
4L, 0L, 1L, 5L, 0L, 0L, 1L, 0L, 2L, 1L, 12L, 1L, 5L, 6L, 0L, 
3L, 2L, 0L, 0L, 14L, 2L, 3L, 0L, 3L, 4L, 0L, 7L, 3L, 4L, 0L, 
1L, 4L, 1L, 8L, 8L, 1L, 2L, 4L, 2L, 1L, 1L), AAAS = c(829L, 1042L, 
844L, 805L, 1700L, 953L, 809L, 1052L, 1266L, 781L, 618L, 929L, 
699L, 992L, 1001L, 1423L, 845L, 1054L, 808L, 711L, 938L, 756L, 
1384L, 944L, 1689L, 1052L, 703L, 890L, 1293L, 727L, 804L, 1227L, 
668L, 794L, 835L, 877L, 1514L, 1287L, 1435L, 941L, 1115L, 868L, 
923L, 288L), AACS = c(2350L, 1953L, 1884L, 1702L, 421L, 1530L, 
1435L, 3619L, 815L, 1320L, 859L, 1708L, 1096L, 2124L, 1029L, 
1930L, 1241L, 724L, 867L, 893L, 1797L, 447L, 4854L, 1670L, 2675L, 
2471L, 1874L, 1620L, 2515L, 3156L, 2079L, 1345L, 1684L, 1615L, 
1650L, 1386L, 3470L, 1958L, 2278L, 1076L, 3459L, 1115L, 1369L, 
121L), AACSP1 = c(19L, 6L, 11L, 13L, 1L, 11L, 13L, 27L, 5L, 12L, 
4L, 7L, 4L, 6L, 5L, 18L, 17L, 0L, 7L, 6L, 4L, 1L, 19L, 16L, 30L, 
11L, 12L, 20L, 11L, 10L, 11L, 3L, 4L, 10L, 16L, 4L, 8L, 7L, 10L, 
5L, 18L, 6L, 5L, 0L), AADAC = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 2L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 
0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 
0L, 1L, 0L, 0L), AADACL2 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L), AADACL3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L), AADACL4 = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 3L, 0L, 
0L, 0L, 1L, 0L), AADAT = c(387L, 416L, 297L, 392L, 682L, 422L, 
287L, 704L, 50L, 373L, 306L, 234L, 225L, 340L, 220L, 443L, 387L, 
324L, 304L, 261L, 259L, 181L, 801L, 428L, 925L, 498L, 270L, 524L, 
654L, 472L, 334L, 395L, 414L, 440L, 318L, 306L, 645L, 418L, 350L, 
277L, 468L, 302L, 298L, 48L), AAGAB = c(1235L, 1231L, 1026L, 
981L, 477L, 877L, 808L, 2217L, 764L, 914L, 670L, 974L, 538L, 
1362L, 492L, 1078L, 764L, 297L, 582L, 615L, 923L, 307L, 3055L, 
1195L, 1673L, 1673L, 1070L, 1052L, 1761L, 2198L, 1221L, 813L, 
1050L, 997L, 865L, 930L, 2065L, 1190L, 1243L, 578L, 1931L, 664L, 
874L, 75L), AAK1 = c(6457L, 6538L, 4706L, 4917L, 1252L, 4055L, 
4063L, 11627L, 9127L, 3604L, 2439L, 4221L, 3968L, 5065L, 2450L, 
5690L, 3065L, 1082L, 2756L, 2886L, 3763L, 1360L, 15237L, 4771L, 
7881L, 8349L, 5177L, 4888L, 6532L, 7856L, 5373L, 3487L, 4885L, 
4461L, 3893L, 4152L, 9055L, 4656L, 4501L, 2598L, 8079L, 3187L, 
3655L, 337L), AAMP = c(2282L, 2585L, 2113L, 2197L, 2226L, 1776L, 
2097L, 3614L, 2494L, 2215L, 1707L, 2109L, 1740L, 2620L, 1703L, 
2357L, 1965L, 1697L, 1724L, 1623L, 2299L, 1109L, 5555L, 2550L, 
4239L, 3149L, 2127L, 2487L, 3966L, 2817L, 2043L, 1967L, 2092L, 
2031L, 2123L, 2661L, 4203L, 2884L, 3224L, 1678L, 3876L, 1963L, 
2362L, 473L), AANAT = c(33L, 51L, 14L, 26L, 23L, 12L, 36L, 14L, 
27L, 24L, 30L, 17L, 11L, 45L, 31L, 28L, 23L, 67L, 77L, 26L, 44L, 
17L, 86L, 70L, 16L, 39L, 10L, 27L, 20L, 22L, 23L, 20L, 10L, 12L, 
18L, 28L, 41L, 40L, 85L, 40L, 48L, 30L, 46L, 8L), AARS = c(6383L, 
9377L, 6772L, 8134L, 5605L, 4734L, 5902L, 13757L, 6832L, 6566L, 
4009L, 5377L, 7209L, 7749L, 4105L, 6969L, 5120L, 5484L, 5486L, 
4935L, 6604L, 3151L, 24172L, 7615L, 12786L, 12676L, 7009L, 8208L, 
11328L, 11550L, 7054L, 4789L, 6547L, 6686L, 6109L, 6456L, 14576L, 
8317L, 8057L, 4626L, 13162L, 5801L, 6090L, 1498L), AARS2 = c(1032L, 
858L, 687L, 735L, 527L, 655L, 641L, 1480L, 1713L, 753L, 561L, 
541L, 459L, 819L, 462L, 867L, 605L, 404L, 571L, 497L, 637L, 343L, 
1761L, 1082L, 1379L, 815L, 841L, 844L, 1150L, 1121L, 973L, 665L, 
696L, 672L, 824L, 511L, 1313L, 861L, 998L, 626L, 1258L, 555L, 
623L, 115L), AARSD1 = c(918L, 1218L, 793L, 877L, 573L, 867L, 
916L, 2030L, 1198L, 1015L, 715L, 909L, 437L, 1245L, 566L, 1083L, 
985L, 325L, 584L, 621L, 871L, 353L, 2033L, 887L, 1412L, 1205L, 
1143L, 1037L, 1592L, 1413L, 1183L, 1216L, 1121L, 888L, 1021L, 
846L, 2189L, 1182L, 1412L, 656L, 1708L, 797L, 988L, 80L), type = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("control", 
"diseased"), class = "factor")), .Names = c("A1BG", "A1BG.AS1", 
"A1CF", "A2LD1", "A2M", "A2ML1", "A4GALT", "A4GNT", "AA06", "AAA1", 
"AAAS", "AACS", "AACSP1", "AADAC", "AADACL2", "AADACL3", "AADACL4", 
"AADAT", "AAGAB", "AAK1", "AAMP", "AANAT", "AARS", "AARS2", "AARSD1", 
"type"), row.names = c("C_2", "C_4", "C_6", "C_8", "C_9", "C_10", 
"C_14", "C_15", "C_18", "C_21", "C_29", "P_3", "P_6", "P_13", 
"P_15", "P_18", "P_19", "P_21", "P_22", "P_29", "P_31", "C_3", 
"C_5", "C_11", "C_12", "C_13", "C_16", "C_17", "C_19", "C_20", 
"C_22", "C_23", "C_24", "C_25", "C_26", "P_14", "P_16", "P_20", 
"P_23", "P_26", "P_27", "P_28", "P_30", "P_33"), class = "data.frame")

Thanks to DWin's input, I looked into the FactoMineR package, which can draw the kind of confidence ellipses I was asking about. Here is the code I used:

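library(FactoMineR)
# Column 26 ("type") is passed as a supplementary qualitative variable, so it does not shape the axes
res.pca <- PCA(input.for.pca, scale.unit = TRUE, ncp = 5, quali.sup = 26, graph = FALSE)
# coord.ellipse expects the grouping factor in the first column, followed by the coordinates
concat <- cbind.data.frame(input.for.pca[, 26], res.pca$ind$coord)
ellipse.coord <- coord.ellipse(concat, level.conf = 0.99999, bary = TRUE)
plot.PCA(res.pca, ellipse = ellipse.coord, axes = c(1, 2), choix = "ind", habillage = 26)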

Note the level.conf option of the coord.ellipse function. By changing this option from its default of 0.95, I was able to increase the size of the ellipses.
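
To get a feel for how strongly the confidence level affects ellipse size, here is a rough base-R sketch. It rests on assumptions that are not verified against FactoMineR's internals: a bivariate normal approximation with a chi-square quantile, and the reading that bary = TRUE draws an ellipse for the group mean (barycentre), so its covariance shrinks by the group size n. The value n = 20 is purely illustrative.

```r
# Assumed confidence levels to compare (0.95 is the documented default)
conf_levels <- c(0.95, 0.99, 0.99999)

# Radius of the ellipse in standard-deviation units for a bivariate normal
radius <- sqrt(qchisq(conf_levels, df = 2))

# Illustrative group size; an ellipse for the mean is narrower by sqrt(n)
n <- 20
ellipse_size <- data.frame(
  level        = conf_levels,
  data_ellipse = radius,            # ellipse around the individual samples
  mean_ellipse = radius / sqrt(n)   # ellipse around the group barycentre
)
print(ellipse_size)
```

Under these assumptions the radius grows from about 2.45 to about 4.80 standard deviations between levels 0.95 and 0.99999, while an ellipse around the barycentre stays much smaller. That would be consistent with needing a level very close to 1 before the ellipses cover most samples.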

I found this link useful for understanding how to use FactoMineR.

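Best answer

Without any data to work with, I would suggest looking at the FactoMineR package, which offers PCA plots with optional ellipses: plot.PCA, "Draw the Principal Component Analysis (PCA) graphs". Setting the 'ellipse' argument to a non-NULL value should, per the documentation, draw "ellipses around the individuals, and use the results of coord.ellipse".

Using FactoMineR::PCA on your data, I was able to get the same type of plot as the result of your prcomp call.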

library(FactoMineR)
PCAres <- PCA(input.for.pca)  # draws two plots as a side effect

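I was not able to get data ellipses using the built-in arguments of the plotting routine. After checking the examples on its help page, I think this is because it needs a factor category identifier to mark group membership, so that the component values can be labeled and ellipses drawn around the group centroids.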
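
Since the question asks for an objective rule such as "more than 3 standard deviations from the cluster centroid", one possible way to express that numerically is the Mahalanobis distance of each sample from its own group centroid in the PC1/PC2 plane. This is a swapped-in technique rather than anything FactoMineR does, sketched on simulated data; the function name flag_outliers, the cutoff of 3, and the planted outlier "S5" are all illustrative assumptions.

```r
# Simulated stand-in data: 40 named samples x 6 features, two groups.
set.seed(42)
group <- factor(rep(c("control", "diseased"), each = 20))
x <- matrix(rnorm(40 * 6), nrow = 40,
            dimnames = list(paste0("S", 1:40), paste0("gene", 1:6)))
x[group == "diseased", 1:3] <- x[group == "diseased", 1:3] + 2
x["S5", ] <- x["S5", ] + 8   # plant one obvious outlier among the controls
scores <- prcomp(x, center = TRUE, scale. = TRUE)$x[, 1:2]

# Hypothetical helper: distance of every sample from its group centroid,
# measured in units of that group's own spread (Mahalanobis distance).
flag_outliers <- function(scores, group, cutoff = 3) {
  d <- numeric(nrow(scores))
  for (g in levels(group)) {
    idx <- group == g
    sub <- scores[idx, , drop = FALSE]
    # mahalanobis() returns the squared distance, hence the sqrt()
    d[idx] <- sqrt(mahalanobis(sub, colMeans(sub), cov(sub)))
  }
  data.frame(sample = rownames(scores), group = group,
             distance = d, outlier = d > cutoff)
}

res <- flag_outliers(scores, group)
print(res[res$outlier, ])
```

One caveat with this design: because the centroid and covariance are estimated from the same samples being tested, an extreme sample inflates its own group's spread, and the distance can never exceed (n - 1) / sqrt(n) for a group of n samples (about 4.25 for n = 20). With small groups a cutoff of 3 is therefore already fairly strict.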

Regarding "r - PCA in R: prcomp and confidence ellipses", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14269435/
