
r - Using t.test inside dplyr summarise after grouping

Reposted. Author: 行者123. Updated: 2023-12-04 00:30:13

library(dplyr)
library(ggplot2)
library(magrittr)

diamonds %>%
  group_by(cut) %>%
  summarise(price_avg = t.test(
    . %>% filter(color == "E") %$% price,
    . %>% filter(color == "I") %$% price )$p.value)

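I am trying to apply t.test by group and collect the results. In this example, I want to find out whether price differs significantly between colors "E" and "I" within the same cut. The result I get is: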
Error in summarise_impl(.data, dots) : 
Evaluation error: is.atomic(x) is not TRUE.

Best answer

library(tidyverse)
library(magrittr)

diamonds %>%
  group_by(cut) %>%
  summarise(price_avg = t.test(price[color=="E"], price[color=="I"])$p.value)

# # A tibble: 5 x 2
#   cut       price_avg
#   <ord>         <dbl>
# 1 Fair       3.90e- 3
# 2 Good       1.46e-12
# 3 Very Good  2.44e-39
# 4 Premium    7.27e-52
# 5 Ideal      7.63e-62

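The problem with your solution is that . does not give you the subset of your dataset for the current group, but the whole dataset. You can check this with:
diamonds %>%
  group_by(cut) %>%
  summarise(d = list(.))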

# # A tibble: 5 x 2
#   cut       d
#   <ord>     <list>
# 1 Fair      <tibble [53,940 x 10]>
# 2 Good      <tibble [53,940 x 10]>
# 3 Very Good <tibble [53,940 x 10]>
# 4 Premium   <tibble [53,940 x 10]>
# 5 Ideal     <tibble [53,940 x 10]>
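
As a quick cross-check of this point, here is a minimal sketch (not part of the original answer) that compares the row count seen through . with the per-group count from n(). The column names rows_in_dot and rows_in_group and the helper get_e_prices are made up for illustration. The second half shows that a pipeline with . on its left-hand side, as in the question, builds a function rather than a numeric vector, which would be consistent with the is.atomic(x) error.

```r
library(dplyr)
library(ggplot2)   # provides the diamonds dataset
library(magrittr)

# Inside summarise(), `.` still refers to the whole piped data frame,
# whereas n() is evaluated once per group.
row_counts <- diamonds %>%
  group_by(cut) %>%
  summarise(rows_in_dot = nrow(.), rows_in_group = n())

print(row_counts)

# A pipeline whose left-hand side is `.` defines a function
# (a magrittr functional sequence); nothing is evaluated yet.
get_e_prices <- . %>% filter(color == "E") %>% pull(price)

is.function(get_e_prices)  # TRUE
is.atomic(get_e_prices)    # FALSE
```

The rows_in_dot column should repeat the full row count for every cut, while rows_in_group should match the group sizes.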

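Another solution is (the nest(data = -cut) spelling avoids the warning that newer tidyr versions give for nest(-cut)):
diamonds %>%
  nest(data = -cut) %>%
  mutate(price_avg = map_dbl(data, ~t.test(
    .x %>% filter(color == "E") %$% price,
    .x %>% filter(color == "I") %$% price )$p.value))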

# # A tibble: 5 x 3
#   cut       data                  price_avg
#   <ord>     <list>                    <dbl>
# 1 Ideal     <tibble [21,551 x 9]>  7.63e-62
# 2 Premium   <tibble [13,791 x 9]>  7.27e-52
# 3 Good      <tibble [4,906 x 9]>   1.46e-12
# 4 Very Good <tibble [12,082 x 9]>  2.44e-39
# 5 Fair      <tibble [1,610 x 9]>   3.90e- 3

This works with filter because each call to filter receives the appropriate subset of the data (i.e. one element of the data column).
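
The same idea of handing each group's subset to the test can also be sketched in base R, as a swapped-in alternative rather than the answer's own method. The variable name p_values is an assumption chosen for illustration; split(), sapply() and t.test() are standard base/stats functions.

```r
library(ggplot2)  # only needed for the diamonds dataset

# Split the data into one data frame per cut, then run t.test on each piece,
# comparing prices of color "E" against color "I" within that cut.
p_values <- sapply(
  split(diamonds, diamonds$cut),
  function(d) t.test(d$price[d$color == "E"], d$price[d$color == "I"])$p.value
)

print(p_values)
```

The result is a named numeric vector with one p-value per cut, which should agree with the price_avg column shown above.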

For more on "r - Using t.test inside dplyr summarise after grouping", see the corresponding question on Stack Overflow: https://stackoverflow.com/questions/52588675/
