
r - Automatically calculate summary statistics for a data frame and create a new table

Reposted. Author: 行者123. Updated: 2023-12-04 10:36:34

I have the following data frame:

col1 <- c("avi","chi","chi","bov","fox","bov","fox","avi","bov",
"chi","avi","chi","chi","bov","bov","fox","avi","bov","chi")
col2 <- c("low","med","high","high","low","low","med","med","med","high",
"low","low","high","high","med","med","low","low","med")
col3 <- c(0,1,1,1,0,1,0,0,0,0,0,0,1,1,1,1,0,1,0)

test_data <- cbind(col1, col2, col3)
test_data <- as.data.frame(test_data)

I would like to end up with a table like this (the values are random):
Species  Pop.density  %Resistance  CI_low  CI_high  Total samples
avi      low          2.0          1.2     2.2      30
avi      med          0            0       0.5      20
avi      high         3.5          2.9     4.2      10
chi      low          0.5          0.3     0.7      20
chi      med          2.0          1.9     2.1      150
chi      high         6.5          6.2     6.6      175

The %Resistance column is based on col3 above, where 1 = resistant and 0 = not resistant. I tried the following:
library(dplyr)
test_data<-test_data %>%
count(col1,col2,col3) %>%
group_by(col1, col2) %>%
mutate(perc_res = prop.table(n)*100)

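This almost works: for each combination of values in col1 and col2, I get the percentage of 1s and 0s in col3. However, the total sample count is wrong, because I am counting over all three columns when the correct count should be taken over col1 and col2 only.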

For the confidence intervals, I would use something like the following:
binom.test(resistant_samples, total_samples)$conf.int * 100

But I am not sure how to combine it with the rest.
Is there a simple, quick way to do this?

Best answer

I would do...

library(data.table)
setDT(DT)

DT[, {
  # exact binomial confidence interval for this group, as a percentage
  bt <- binom.test(sum(resists), .N)$conf.int*100
  .(res_rate = mean(resists)*100, res_lo = bt[1], res_hi = bt[2], n = .N)
}, keyby=.(species, popdens)]

    species popdens  res_rate    res_lo    res_hi n
 1:     avi     low   0.00000  0.000000  70.75982 3
 2:     avi     med   0.00000  0.000000  97.50000 1
 3:     bov     low 100.00000 15.811388 100.00000 2
 4:     bov     med  50.00000  1.257912  98.74209 2
 5:     bov    high 100.00000 15.811388 100.00000 2
 6:     chi     low   0.00000  0.000000  97.50000 1
 7:     chi     med  50.00000  1.257912  98.74209 2
 8:     chi    high  66.66667  9.429932  99.15962 3
 9:     fox     low   0.00000  0.000000  97.50000 1
10:     fox     med  50.00000  1.257912  98.74209 2
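
As a quick sanity check on a single row, the interval for the avi/low group (0 resistant out of 3 samples) can be reproduced with base R alone. This is only an illustrative sketch; the variable name `ci` is made up here.

```r
# avi / low: 0 resistant samples out of 3 in total
# binom.test() returns an exact (Clopper-Pearson) interval at the 95% level by default
ci <- binom.test(0, 3)$conf.int * 100

# lower and upper bounds, in percent
print(round(ci[1], 5))
print(round(ci[2], 5))
```

The bounds should agree with the res_lo and res_hi values reported for that group.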

To include all levels (every combination of species and population density)...
DT[CJ(species = species, popdens = popdens, unique = TRUE), on=.(species, popdens), {
  # combinations with no rows have no trials, so skip binom.test for them
  bt <-
    if (.N > 0L) binom.test(sum(resists), .N)$conf.int*100
    else NA_real_
  .(res_rate = mean(resists)*100, res_lo = bt[1], res_hi = bt[2], n = .N)
}, by=.EACHI]

    species popdens  res_rate    res_lo    res_hi n
 1:     avi     low   0.00000  0.000000  70.75982 3
 2:     avi     med   0.00000  0.000000  97.50000 1
 3:     avi    high        NA        NA        NA 0
 4:     bov     low 100.00000 15.811388 100.00000 2
 5:     bov     med  50.00000  1.257912  98.74209 2
 6:     bov    high 100.00000 15.811388 100.00000 2
 7:     chi     low   0.00000  0.000000  97.50000 1
 8:     chi     med  50.00000  1.257912  98.74209 2
 9:     chi    high  66.66667  9.429932  99.15962 3
10:     fox     low   0.00000  0.000000  97.50000 1
11:     fox     med  50.00000  1.257912  98.74209 2
12:     fox    high        NA        NA        NA 0
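
Since the question started from dplyr, a rough dplyr version of the grouped summary (observed combinations only) might look like the sketch below. It is not part of the original answer. The names `dat` and `summary_tbl` are made up for illustration, and the `.groups` argument assumes dplyr 1.0.0 or later. Grouping by species and population density only, and calling `n()` inside `summarise()`, gives the per-group total that the `count(col1, col2, col3)` attempt was missing.

```r
library(dplyr)

# Data from the question
col1 <- c("avi","chi","chi","bov","fox","bov","fox","avi","bov",
          "chi","avi","chi","chi","bov","bov","fox","avi","bov","chi")
col2 <- c("low","med","high","high","low","low","med","med","med","high",
          "low","low","high","high","med","med","low","low","med")
col3 <- c(0,1,1,1,0,1,0,0,0,0,0,0,1,1,1,1,0,1,0)

# Build the data frame directly so that resists stays numeric
dat <- data.frame(
  species = col1,
  popdens = factor(col2, levels = c("low", "med", "high")),
  resists = col3
)

summary_tbl <- dat %>%
  group_by(species, popdens) %>%
  summarise(
    res_rate = mean(resists) * 100,
    # binom.test() is called once per bound to keep the sketch short
    res_lo   = binom.test(sum(resists), n())$conf.int[1] * 100,
    res_hi   = binom.test(sum(resists), n())$conf.int[2] * 100,
    n        = n(),
    .groups  = "drop"
  )

print(summary_tbl)
```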

How it works

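The syntax is DT[i, j, by=], where ...
  • i determines a subset of rows, sometimes with the helper arguments on= or roll=.
  • by= determines the groups within the subsetted table; switch to keyby= if the result should be sorted by group.
  • j is the code that acts on each group.
  • j should evaluate to a list, with .() being a shortcut for list(). See ?data.table for details.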
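
The three parts can be seen together in a small sketch. The table `toy` and its columns `grp` and `val` are hypothetical and unrelated to the question's data.

```r
library(data.table)

# Hypothetical toy table for illustration only
toy <- data.table(
  grp = c("a", "a", "b", "b", "b", "c"),
  val = c(1, 0, 1, 1, 0, 1)
)

res <- toy[
  grp != "c",                            # i: keep only the rows where grp is not "c"
  .(rate = mean(val) * 100, n = .N),     # j: a list built with .(), evaluated per group
  keyby = .(grp)                         # keyby: group by grp and sort the result by it
]

print(res)
```

Here `.N` is the number of rows in the current group, which is how the answer obtains the total sample count per species and population density.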

    Data used

    (renaming the columns, converting the binary variable back to 0/1 or false/true, and putting the population density levels in the correct order):
    DT = data.frame(
    species = col1,
    popdens = factor(col2, levels=c("low", "med", "high")),
    resists = col3
    )

    Regarding "r - Automatically calculate summary statistics for a data frame and create a new table", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46242127/
