gpt4 book ai didi

r - 如何按组在 R 中的数据帧中执行 KS.test

转载 作者:行者123 更新时间:2023-12-04 01:43:01 26 4
gpt4 key购买 nike

我想在 R 中为以下示例数据帧“数据”执行两个样本 Kolmogorov-Smirnov (KS) 测试:

    Protein1    Protein2    Protein3    Protein4    Protein5    Protein6    Protein7    Protein8    Group
Sample1 0.56 1.46 0.64 2.53 1.96 305.29 428.41 113.22 Control
Sample2 0.75 2.29 0.38 4.31 1.72 307.95 492.58 82.75 Control
Sample3 2.05 1.73 2.42 14.75 2.92 523.92 426.94 131.51 Control
Sample4 1.71 5.37 0.68 6.39 2.02 343.05 435.16 123.12 Control
Sample5 13.31 0.94 1.21 3.83 2.83 313.71 327.84 66.8 Control
Sample6 0.36 1.81 0.42 2.25 1.48 335.23 352.55 93.81 Control
Sample7 0.28 3.26 0.49 2.62 1.96 251.49 468.19 80.27 Control
Sample8 1.62 17.01 0.49 2.87 1.7 254.79 402.9 86.8 Control
Sample9 0.59 2.64 0.18 2.93 1.23 388.87 384.53 109.52 Control
Sample10 1.67 3.75 0.43 3.89 1.83 306.28 440.86 97.55 Control
Sample11 15.53 12.02 0.57 1.81 2.31 328.56 352.98 118.18 Control
Sample12 0.94 7.06 35.77 4.98 2.44 389.14 376.18 119.75 Control
Sample13 2.07 1.73 0.38 3.89 2.56 233.81 377.21 72.1 Control
Sample14 3.2 1.38 0.5 4.05 2.51 406.57 538.57 209.16 Patient
Sample15 1.33 2.17 0.46 3.31 1.72 278.04 276.37 79.4 Patient
Sample16 1.48 2.9 0.84 9.27 1.94 332.76 413.66 99.09 Patient
Sample17 2.02 1.96 0.25 4.16 1.96 358.73 383.63 107.46 Patient
Sample18 2.94 2 0.27 3.99 2.55 354.78 493.02 145.36 Patient
Sample19 1.01 8.1 0.35 3.65 1.62 335.18 264.74 145.15 Patient
Sample20 6.95 15.48 2.94 3.64 2.3 307.23 484.38 119.61 Patient
Sample21 0.52 1.38 0.56 3.08 1.86 244.3 304.74 76.87 Patient
Sample22 0.35 2.17 0.38 4.51 2.09 304.68 369.98 151.76 Patient
Sample23 2.26 2.9 0.3 4.44 2.43 302.51 367.51 150.69 Patient
Sample24 3.19 1.96 0.81 2.94 2.15 309.59 362.18 133.49 Patient
Sample25 1.12 2 0.71 3.77 2.42 334.36 358.9 131.35 Patient
Sample26 5.28 8.1 0.81 22.84 2.35 422.18 369.71 76.35 Patient

要在 2 个单独的列之间执行 KS 测试,代码如下:
> ks.test(data$Protein1, data$Protein2, data=data)

Two-sample Kolmogorov-Smirnov test

data: data$Protein1 and data$Protein2
D = 0.42308, p-value = 0.01905
alternative hypothesis: two-sided

Warning message:
In ks.test(data$Protein1, data$Protein2, data = data) :
cannot compute exact p-value with ties

但是,我想按列和按组执行此操作。例如,对 t.test 或 wilcox.test 执行此操作很容易,因为您可以将其编码为
t.test(y1,y2) 或 t.test(y~x) # 其中 y 是数字,x 是二进制因子
但是当涉及到二元因子时,ks.test 没有代码。
有人能帮忙吗?

最终,我想对所有蛋白质的整个数据框执行此操作,因为我可以成功地进行 t 检验,但我想对 ks.test 执行以下操作:
t_tests <- lapply(
data[, -1], # apply the function to every variable *other than* the first one (group)
function(x) { t.test(x ~ HealthGroups, data = data) }
)

任何帮助表示赞赏。提前谢谢你。

最佳答案

这是一个非常简单的方法。这使用了一个循环,这在 R 圈中通常是不受欢迎的。但是,它非常简单且不言自明,这对新用户来说可能是一个加分项,并且在这种情况下循环太慢也没有问题。 (请注意,如果您愿意,您可以使用 lapply(),但这仍然是一个循环,只是在外面看起来不同。)

只需使用相同的变量创建两个新的子集数据框。然后循环调用 ks.test 的数据帧.输出不是很人性化——它只会说 j — 所以我给 ?writeLines 添加了一个电话打印正在测试的变量的名称。

# I am assuming the original data frame is called d
dc <- d[d$Group=="Control",]
dp <- d[d$Group=="Patient",]
for(j in 1:8){
writeLines(names(dc)[j])
print(ks.test(dc[,j], dp[,j]))
}
# Protein1
#
# Two-sample Kolmogorov-Smirnov test
#
# data: dc[, j] and dp[, j]
# D = 0.30769, p-value = 0.5882
# alternative hypothesis: two-sided
#
# Protein2
#
# Two-sample Kolmogorov-Smirnov test
#
# data: dc[, j] and dp[, j]
# D = 0.23077, p-value = 0.8793
# alternative hypothesis: two-sided
#
# Protein3
#
# Two-sample Kolmogorov-Smirnov test
#
# data: dc[, j] and dp[, j]
# D = 0.23077, p-value = 0.8793
# alternative hypothesis: two-sided
#
# Protein4
#
# Two-sample Kolmogorov-Smirnov test
#
# data: dc[, j] and dp[, j]
# D = 0.46154, p-value = 0.1254
# alternative hypothesis: two-sided
#
# Protein5
#
# Two-sample Kolmogorov-Smirnov test
#
# data: dc[, j] and dp[, j]
# D = 0.23077, p-value = 0.8793
# alternative hypothesis: two-sided
#
# Protein6
#
# Two-sample Kolmogorov-Smirnov test
#
# data: dc[, j] and dp[, j]
# D = 0.15385, p-value = 0.9992
# alternative hypothesis: two-sided
#
# Protein7
#
# Two-sample Kolmogorov-Smirnov test
#
# data: dc[, j] and dp[, j]
# D = 0.38462, p-value = 0.2999
# alternative hypothesis: two-sided
#
# Protein8
#
# Two-sample Kolmogorov-Smirnov test
#
# data: dc[, j] and dp[, j]
# D = 0.46154, p-value = 0.1265
# alternative hypothesis: two-sided
#
# Warning messages:
# 1: In ks.test(dc[, j], dp[, j]) : cannot compute exact p-value with ties
# 2: In ks.test(dc[, j], dp[, j]) : cannot compute exact p-value with ties
# 3: In ks.test(dc[, j], dp[, j]) : cannot compute exact p-value with ties
# 4: In ks.test(dc[, j], dp[, j]) : cannot compute exact p-value with ties

这将创建一个数据框来容纳结果:
dk <- data.frame(Protein=character(8), D=numeric(8), p=numeric(8), stringsAsFactors=F)
for(j in 1:8){
k <- ks.test(dc[,j], dp[,j])
dk$Protein[j] <- names(dc)[j]
dk$D[j] <- k$statistic
dk$p[j] <- k$p.value
}
dk
# Protein D p
# 1 Protein1 0.3076923 0.5881961
# 2 Protein2 0.2307692 0.8793244
# 3 Protein3 0.2307692 0.8793244
# 4 Protein4 0.4615385 0.1253895
# 5 Protein5 0.2307692 0.8793244
# 6 Protein6 0.1538462 0.9992124
# 7 Protein7 0.3846154 0.2999202
# 8 Protein8 0.4615385 0.1264877

关于r - 如何按组在 R 中的数据帧中执行 KS.test,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/56600911/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com