
r - dplyr programming: unquote-splicing causes overscope error with complete() and nesting()

Reposted. Author: 行者123. Updated: 2023-12-04 17:48:17

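So I'm getting my feet wet in the wonderful world of dplyr programming. I'm trying to write a function that accepts a data.frame, a target column, and any number of grouping columns (using bare names for all columns). The function then bins the data by the target column and counts the entries in each bin. I want to keep a separate set of bin counts for every combination of grouping variables present in the original data.frame, so I'm using the complete() and nesting() functions to do that. Here is an example of what I'm trying to do and the error I'm running into: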

library(dplyr)
library(tidyr)

#Prepare test data
set.seed(42)
test_data =
  data.frame(Gene_ID = rep(paste0("Gene.", 1:10), times=4),
             Comparison = rep(c("WT_vs_Mut1", "WT_vs_Mut2"), each=10, times=2),
             Test_method = rep(c("T-test", "MannWhitney"), each=20),
             P_value = runif(40))

#Perform operation manually
test_data %>%
  #Start by binning the data according to q-value
  mutate(Probability.bin = cut(P_value,
                               breaks = c(-Inf, seq(0.1, 1, by=0.1), Inf),
                               labels = c(seq(0.0, 1.0, by=0.1)),
                               right = FALSE)) %>%
  #Now summarize the results by bin.
  count(Comparison, Test_method, Probability.bin) %>%
  #Fill in any missing bins with 0 counts
  complete(nesting(Comparison, Test_method), Probability.bin,
           fill=list(n = 0))

#Create function that accepts bare column names
bin_by_p_value <- function(df,
                           pvalue_col, #Bare name of p-value column
                           ...) {      #Bare names of grouping columns

  #"Quote" column names so they are ready for use below
  pvalue_col_name <- enquo(pvalue_col)
  group_by_cols <- quos(...)

  #Perform the operation
  df %>%
    #Start by binning the data according to q-value
    mutate(Probability.bin = cut(UQ(pvalue_col_name),
                                 breaks = c(-Inf, seq(0.1, 1, by=0.1), Inf),
                                 labels = c(seq(0.0, 1.0, by=0.1)),
                                 right = FALSE)) %>%
    #Now summarize the results by bin.
    count(UQS(group_by_cols), Probability.bin) %>%
    #Fill in any missing bins with 0 counts
    complete(nesting(UQS(group_by_cols)), Probability.bin,
    # complete(nesting(UQS(group_by_cols)), Probability.bin,
             fill=list(n = 0))
}

#Use function to perform operation
test_data %>%
  bin_by_p_value(P_value, Comparison, Test_method)

When I perform the operation manually, everything works. When I use the function, it fails with the following error:

Error in overscope_eval_next(overscope, expr) : object 'Comparison' not found

I have narrowed the problem down to the following code in the function:

complete(nesting(UQS(group_by_cols)), Probability.bin...

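If I remove the call to nesting(), the code runs without error. However, I want to keep the behavior of using only the combinations of grouping variables that exist in the original data, and then take every possible bin for each of those combinations so that I can fill in the missing bins. Judging by the error name and where it fails, I'm guessing this is a scoping/environment issue, and that I should really be using a different environment for the grouping variables inside nesting(), since it is wrapped in the call to complete(). However, I'm pretty new to dplyr programming, so I'm not sure how to do that.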

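I tried to work around the issue by uniting the grouping columns into a single column and then using that united column as the input to complete(). This lets me run complete() the way I want while avoiding nesting(). However, I ran into trouble when I wanted to separate it back into the original grouping columns, because I don't know how to convert a list of quosures into a character vector (which the "into" argument of separate() requires). Here is a code snippet to illustrate what I'm talking about: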

  #Fill in any missing bins with 0 counts
  unite(Merged_grouping_cols, UQS(group_by_cols), sep="*") %>%
  complete(Merged_grouping_cols, Probability.bin,
           fill=list(n = 0)) %>%
  separate(Merged_grouping_cols, into=c("What goes here?"), sep="\\*")

Here is the relevant version information: R version 3.4.2 (2017-09-28), tidyr_0.7.2, dplyr_0.7.4

I'd appreciate any workaround, but I'd also like to know whether what I'm doing is rubbing complete() and nesting() the wrong way.

Best answer

  • Use the curly-curly operator {{}} for pvalue_col
  • Pass the dots (...) directly to count()
  • Inside nesting(), use ensyms() with !!!

bin_by_p_value <- function(df,
                           pvalue_col, #Bare name of p-value column
                           ...) {      #Bare names of grouping columns

  #Perform the operation
  df %>%
    #Start by binning the data according to q-value
    mutate(Probability.bin = cut({{pvalue_col}},
                                 breaks = c(-Inf, seq(0.1, 1, by=0.1), Inf),
                                 labels = c(seq(0.0, 1.0, by=0.1)),
                                 right = FALSE)) %>%
    #Now summarize the results by bin.
    count(..., Probability.bin) %>%
    #Fill in any missing bins with 0 counts
    complete(nesting(!!!ensyms(...)), Probability.bin, fill=list(n = 0))
}

test_data %>% bin_by_p_value(P_value, Comparison, Test_method)

# A tibble: 44 x 4
#   Comparison Test_method Probability.bin     n
#   <chr>      <chr>       <fct>           <dbl>
# 1 WT_vs_Mut1 MannWhitney 0                   1
# 2 WT_vs_Mut1 MannWhitney 0.1                 1
# 3 WT_vs_Mut1 MannWhitney 0.2                 0
# 4 WT_vs_Mut1 MannWhitney 0.3                 1
# 5 WT_vs_Mut1 MannWhitney 0.4                 1
# 6 WT_vs_Mut1 MannWhitney 0.5                 1
# 7 WT_vs_Mut1 MannWhitney 0.6                 0
# 8 WT_vs_Mut1 MannWhitney 0.7                 0
# 9 WT_vs_Mut1 MannWhitney 0.8                 1
#10 WT_vs_Mut1 MannWhitney 0.9                 4
# … with 34 more rows
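
To see the `nesting(!!!ensyms(...))` pattern in isolation, here is a minimal sketch on toy data. The data frame `toy`, its columns, and the helper `fill_bins()` are hypothetical names made up for illustration; they are not part of the question or the answer, and the sketch assumes a recent dplyr/tidyr/rlang.

```r
library(dplyr)
library(tidyr)
library(rlang)

# Hypothetical toy data: the (b, v) combination has no row for bin "y"
toy <- tibble(
  grp1 = c("a", "a", "b"),
  grp2 = c("u", "u", "v"),
  bin  = factor(c("x", "y", "x"), levels = c("x", "y")),
  n    = c(2, 1, 3)
)

# Hypothetical helper: capture the bare grouping names as symbols,
# then splice them into nesting() so that only the group combinations
# present in the data are crossed with every level of `bin`
fill_bins <- function(df, ...) {
  grp_syms <- ensyms(...)
  df %>%
    complete(nesting(!!!grp_syms), bin, fill = list(n = 0))
}

out <- fill_bins(toy, grp1, grp2)
print(out)
```

Only the two group combinations that occur in `toy` are kept, and the missing (b, v, y) row is added with `n` set to 0, rather than all four possible `grp1` by `grp2` pairings being generated.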

To check the result, compare it against the output of the manual pipeline from the question, stored in res:

identical(res, test_data %>% bin_by_p_value(P_value, Comparison, Test_method))
#[1] TRUE
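
The question also asks how to turn the captured grouping columns into a character vector for the "into" argument of separate(). One possible approach, sketched here under the assumption that every argument in `...` is a bare column name, is to capture the dots as symbols and convert each one to a string. The helper name `dots_to_names()` is hypothetical.

```r
library(rlang)

# Hypothetical helper: return the names of the bare columns passed via `...`
# as a plain character vector. ensyms() errors if an argument is not a
# simple name (or string), which is the assumption made here.
dots_to_names <- function(...) {
  syms <- ensyms(...)
  unname(vapply(syms, as_string, character(1)))
}

group_names <- dots_to_names(Comparison, Test_method)
print(group_names)
```

Inside bin_by_p_value(), a vector built this way could be passed as `into = ...` in the separate() call from the question's workaround, although the accepted answer above makes that workaround unnecessary.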

Regarding "r - dplyr programming: unquote-splicing causes overscope error with complete() and nesting()", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47211743/
