
R dplyr: Non-Standard Evaluation difficulty. Want to use dynamic variable names in filter and mutate

Reposted. Author: 行者123. Updated: 2023-12-04 11:25:31

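I've created a reproducible example to illustrate a problem I'm having with non-standard evaluation in R (dplyr). I'd like to use dynamic variable names in the scenario below: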

# Given a data frame of patient data, I need to find records containing date logic errors.
# My datasets are enormous but here is a tiny example

patientData <- data.frame(
  patientID = 1:20,
  birth_d = seq(as.Date("2010-01-01"), by = 90, length.out = 20),
  treat_d = seq(as.Date("2011-01-01"), by = 90, length.out = 20),
  death_d = seq(as.Date("2012-01-01"), by = 90, length.out = 20)
)

# To create some random records that will be in error (death_d before birth_d, birth_d after treat_d, etc):

patientData$birth_d[5] <- as.Date("2017-01-01")
patientData$death_d[7] <- as.Date("2001-01-01")
patientData$treat_d[10] <- as.Date("2018-01-01")
patientData$birth_d[12] <- as.Date("2018-05-05")

# To determine which records have birth_d after death_d I could do the following:

badRecords <- patientData %>% filter(death_d < birth_d)

# OR

badRecords <- patientData %>% mutate(dateDiff = death_d - birth_d) %>% filter(dateDiff < 0)

# But in my large application (with lots and lots of date variables)
# I want to be able to use the date field names as *variables* and, using one date pair at a time,
# determine which records have dates out of sequence. For example,

firstDateName <- "birth_d"
secondDateName <- "death_d"

# I would like to do this, but it doesn't work
badRecords <- patientData %>% filter(!!firstDateName > !!secondDateName)

# This doesn't work...
badRecords <- patientData %>% mutate(dateDiff = !!secondDateName - !!firstDateName) %>% filter(dateDiff < 0)

# Neither does this... it creates a dateDiff data frame.. with 20 duplicate records
badRecords <- patientData %>% mutate(dateDiff = .[secondDateName] - .[firstDateName]) %>% filter(dateDiff < 0)


Best answer

1) rlang: Use sym like this:

library(dplyr)
library(rlang)

firstDateName <- sym("birth_d")
secondDateName <- sym("death_d")
badRecords <- patientData %>% filter(!!firstDateName > !!secondDateName)

giving:
> badRecords
  patientID    birth_d    treat_d    death_d
1         5 2017-01-01 2011-12-27 2012-12-26
2         7 2011-06-25 2012-06-24 2001-01-01
3        12 2018-05-05 2013-09-17 2014-09-17
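
The question also asked about mutate, which the filter example above does not show. What follows is only a minimal sketch, not part of the original answer. It rebuilds the question's patientData so the block runs on its own. In the question's attempts, `!!` was applied to plain strings, so the expressions worked on the character constants "birth_d" and "death_d" rather than on the columns. Two options are sketched: symbols made with sym, unquoted inside mutate, and dplyr's `.data` pronoun, which takes the column names as strings. The names firstSym, secondSym, viaSym and viaPronoun are illustrative.

```r
library(dplyr)
library(rlang)

# Rebuild the question's example data so this block is self-contained
patientData <- data.frame(
  patientID = 1:20,
  birth_d = seq(as.Date("2010-01-01"), by = 90, length.out = 20),
  treat_d = seq(as.Date("2011-01-01"), by = 90, length.out = 20),
  death_d = seq(as.Date("2012-01-01"), by = 90, length.out = 20)
)
patientData$birth_d[5]  <- as.Date("2017-01-01")
patientData$death_d[7]  <- as.Date("2001-01-01")
patientData$treat_d[10] <- as.Date("2018-01-01")
patientData$birth_d[12] <- as.Date("2018-05-05")

firstDateName  <- "birth_d"
secondDateName <- "death_d"

# Option A: turn the strings into symbols, then unquote them with !!
# (the parentheses make the scope of each !! explicit)
firstSym  <- sym(firstDateName)
secondSym <- sym(secondDateName)
viaSym <- patientData %>%
  mutate(dateDiff = (!!secondSym) - (!!firstSym)) %>%
  filter(dateDiff < 0)

# Option B: keep the names as strings and index the .data pronoun
viaPronoun <- patientData %>%
  mutate(dateDiff = .data[[secondDateName]] - .data[[firstDateName]]) %>%
  filter(dateDiff < 0)
```

Both data frames should hold the same three rows as badRecords above, plus a negative dateDiff column.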

2) Base R: Or, in base R:
firstDateName <- "birth_d"
secondDateName <- "death_d"
is.bad <- patientData[[firstDateName]] > patientData[[secondDateName]]
badRecords <- patientData[is.bad, ]

2a) subset: Another base solution would be to replace the last two lines above with:
subset(patientData, get(firstDateName) > get(secondDateName))
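
The question mentions checking many date variables one pair at a time. As a rough sketch only, the base R approach in 2) could be wrapped in a small helper and applied over a list of column-name pairs. The helper outOfOrder, the list datePairs and the result badByPair are hypothetical names introduced for illustration, and the data is the question's example rebuilt so the block runs on its own.

```r
# Rebuild the question's example data so this block is self-contained
patientData <- data.frame(
  patientID = 1:20,
  birth_d = seq(as.Date("2010-01-01"), by = 90, length.out = 20),
  treat_d = seq(as.Date("2011-01-01"), by = 90, length.out = 20),
  death_d = seq(as.Date("2012-01-01"), by = 90, length.out = 20)
)
patientData$birth_d[5]  <- as.Date("2017-01-01")
patientData$death_d[7]  <- as.Date("2001-01-01")
patientData$treat_d[10] <- as.Date("2018-01-01")
patientData$birth_d[12] <- as.Date("2018-05-05")

# Hypothetical helper: rows where column `first` is later than column `second`.
# which() drops rows where the comparison is NA (a missing date).
outOfOrder <- function(df, first, second) {
  is.bad <- df[[first]] > df[[second]]
  df[which(is.bad), ]
}

# Illustrative (earlier, later) pairs; each is expected to be in this order
datePairs <- list(
  c("birth_d", "treat_d"),
  c("birth_d", "death_d"),
  c("treat_d", "death_d")
)

# One data frame of bad records per pair, named after the failed check
badByPair <- lapply(datePairs, function(p) outOfOrder(patientData, p[1], p[2]))
names(badByPair) <- vapply(datePairs, paste, character(1), collapse = " > ")
```

For example, badByPair[["birth_d > death_d"]] holds the records whose birth date falls after the death date.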

Regarding "R dplyr: Non-Standard Evaluation difficulty. Want to use dynamic variable names in filter and mutate", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51772942/
