
Rule table to avoid nested ifelse statements

Reposted. Author: 行者123. Updated: 2023-12-04 01:36:08

The idea is to define rules from a table in a manageable way:

library(data.table)

a <- data.table(rule = c("rule1", "rule2", "rule3"),
bool = c(T,T,F))

a
# rule bool
# 1: rule1 TRUE
# 2: rule2 TRUE
# 3: rule3 FALSE

ifelse(a[rule == "rule1", bool] & a[rule == "rule2", bool] & a[rule == "rule3", bool], 1,
ifelse(a[rule == "rule1", bool] & a[rule == "rule2", bool], 2,
ifelse(a[rule == "rule2", bool] & a[rule == "rule3", bool], 3, 4)))
# [1] 2

Obviously, this is neither sustainable nor readable as I keep adding rules. What alternatives to ifelse are there?

Best answer

This is a rather interesting question, in particular because the conditions do not always involve all rows of a, i.e., rule1, rule2, and rule3.

I have tried to find a general solution that can be extended to an arbitrary number of conditions as well as to additional rows in a.

The main idea is to replace the conditions of the nested ifelse() or case_when() statements by a data.table, which can then be joined with a:

library(data.table)
b <- fread(
"rule1, rule2, rule3, result
TRUE, TRUE, TRUE, 1
TRUE, TRUE, NA, 2
NA, TRUE, TRUE, 3
NA, NA, NA, 4"
)

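For example, the condition in row 2 specifies that 2 is returned if both rule1 and rule2 are TRUE. The value of rule3 does not matter and can be ignored as a wildcard (NA).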

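It is important to note the order of the conditions: first, the conditions without any wildcard must be checked, then the conditions with one wildcard, and so on. Finally, if no other match is found, the default (all wildcards) is applied. The default must always be given in the last row.
Thus, the most specific conditions come first and the most general condition comes last.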
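To make the wildcard semantics and the first-match rule concrete, here is a minimal sketch that works on the wide table b directly, without any join. It is an illustration rather than the answer's method, and it assumes a hypothetical helper cond_matches(): a condition row matches if every non-NA entry equals the corresponding observed value, so the all-NA default row always matches. Because b is ordered from most specific to most general, the first matching row yields the result.

```r
library(data.table)

# Condition table from the answer; NA marks a wildcard ("don't care")
b <- fread(
"rule1, rule2, rule3, result
TRUE, TRUE, TRUE, 1
TRUE, TRUE, NA, 2
NA, TRUE, TRUE, 3
NA, NA, NA, 4"
)
rule_cols <- c("rule1", "rule2", "rule3")

# Hypothetical helper: TRUE if every non-wildcard entry of `cond` equals `x`.
# all(logical(0)) is TRUE, so the all-NA default row matches anything.
cond_matches <- function(cond, x) all(cond == x, na.rm = TRUE)

# The OP's a in wide form
x <- c(rule1 = TRUE, rule2 = TRUE, rule3 = FALSE)

# Check every condition row in order
matched <- sapply(seq_len(nrow(b)), function(i) {
  cond_matches(unlist(b[i, rule_cols, with = FALSE]), x)
})
matched                          # FALSE TRUE FALSE TRUE

# First match wins because rows are sorted from most specific to most general
b$result[which(matched)[1L]]     # 2
```

This row-by-row scan is easy to read but loops in R; the join-based approach below does the same matching with data.table joins.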

The OP has already given the test data a in long format:

    rule  bool
1: rule1 TRUE
2: rule2 TRUE
3: rule3 FALSE

Therefore, the conditions b are reshaped to long format as well:

lb <- melt(b[, id := .I], c("id", "result"), variable.name = "rule", value.name = "bool", na.rm = TRUE)[
, nr := .N, by = id][]

lb
   id result  rule bool nr
1: 1 1 rule1 TRUE 3
2: 2 2 rule1 TRUE 2
3: 1 1 rule2 TRUE 3
4: 2 2 rule2 TRUE 2
5: 3 3 rule2 TRUE 2
6: 1 1 rule3 TRUE 3
7: 3 3 rule3 TRUE 2

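Before reshaping, a column id is added to indicate the order of the conditions. The wildcards are omitted from the long format because they are not needed for the join. After reshaping, the number of remaining rows per id, nr, is appended, i.e., the number of non-wildcard entries.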
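As a cross-check of what nr means, the following sketch (illustrative only, not part of the original answer) counts the non-NA rule entries per row directly in the wide table and compares them with nr from the long format. It also shows why the default row needs separate handling: with na.rm = TRUE, an all-wildcard row leaves no rows at all in lb.

```r
library(data.table)

b <- fread(
"rule1, rule2, rule3, result
TRUE, TRUE, TRUE, 1
TRUE, TRUE, NA, 2
NA, TRUE, TRUE, 3
NA, NA, NA, 4"
)[, id := .I]
rule_cols <- c("rule1", "rule2", "rule3")

# Long format as in the answer: wildcards dropped, nr = non-wildcard entries per id
lb <- melt(b, c("id", "result"), variable.name = "rule", value.name = "bool",
           na.rm = TRUE)[, nr := .N, by = id]

# The same count computed from the wide table
spec <- b[, .(id, nr_wide = rowSums(!is.na(.SD))), .SDcols = rule_cols]

# The all-NA default row (id 4) has no counterpart in lb, so nr is NA there
chk <- merge(spec, unique(lb[, .(id, nr)]), by = "id", all.x = TRUE)
chk
```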

Now, the conditions are tested:

answer <- lb[a, on = .(rule, bool), nomatch = 0L][
, result[nr == .N], by = .(nr, id)][
order(-nr, id), first(V1)]
if (length(answer) == 0L) answer <- b[id == max(id), result] # default
answer

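This works in four steps:

  1. a and lb are joined (inner join) on rule and bool,
  2. since the joined data are in long format, incomplete conditions are removed by checking the number of matched rows per id against nr (nr is included in the by = clause only for convenience, because the next step needs it),
  3. the remaining rows are ordered so that the first result is picked from the most specific condition,
  4. if the above returns no answer, the default value is returned.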

For the given a, the code above returns

answer
[1] 2
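If the rule set is needed in several places, the four steps can be bundled into one function. The sketch below is an assumption, not code from the answer: the name apply_rules() is hypothetical, and the function takes the wide condition table as an argument (copying it so that := does not modify the caller's table) instead of relying on a global lb.

```r
library(data.table)

# Hypothetical wrapper: `a` is long format (rule, bool); `conds` is a wide
# condition table like b (NA = wildcard, last row = default)
apply_rules <- function(a, conds) {
  conds <- copy(conds)[, id := .I]   # copy() keeps := from modifying the caller's table
  lb <- melt(conds, c("id", "result"), variable.name = "rule",
             value.name = "bool", na.rm = TRUE)[, nr := .N, by = id]
  answer <- lb[a, on = .(rule, bool), nomatch = 0L][  # 1. inner join on rule and bool
    , result[nr == .N], by = .(nr, id)][             # 2. keep only complete conditions
    order(-nr, id), first(V1)]                       # 3. most specific condition first
  if (length(answer) == 0L) answer <- conds[id == max(id), result]  # 4. default
  answer
}

b <- fread(
"rule1, rule2, rule3, result
TRUE, TRUE, TRUE, 1
TRUE, TRUE, NA, 2
NA, TRUE, TRUE, 3
NA, NA, NA, 4"
)
a <- data.table(rule = c("rule1", "rule2", "rule3"), bool = c(TRUE, TRUE, FALSE))
apply_rules(a, b)   # 2
```

Rebuilding lb on every call is wasteful for large tables; in that case lb could be built once and passed in instead.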

More test cases

To verify that the code above works correctly, more thorough testing is needed:

test <- CJ(rule1 = c(TRUE, FALSE), rule2 = c(TRUE, FALSE), rule3 = c(TRUE, FALSE), sorted = FALSE)
test
   rule1 rule2 rule3
1: TRUE TRUE TRUE
2: TRUE TRUE FALSE
3: TRUE FALSE TRUE
4: TRUE FALSE FALSE
5: FALSE TRUE TRUE
6: FALSE TRUE FALSE
7: FALSE FALSE TRUE
8: FALSE FALSE FALSE

Each row represents one version of a, which is converted to the OP's long format by
a <- melt(test[i], measure.vars = patterns("^rule"), variable.name = "rule", value.name = "bool")

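By iterating over i, all possible combinations of TRUE/FALSE values can be tested. In addition, some intermediate results are printed that help to understand how the approach works: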

library(magrittr) # piping used here to improve readability
test <- CJ(rule1 = c(TRUE, FALSE), rule2 = c(TRUE, FALSE), rule3 = c(TRUE, FALSE), sorted = FALSE)
for (i in seq(nrow(test))) {
cat("test case", i, "\n")
a <- melt(test[i], measure.vars = patterns("^rule"), variable.name = "rule", value.name = "bool") %T>%
print()
lb[a, on = .(rule, bool), nomatch = 0L][, result[nr == .N], keyby = .(nr, id)] %>%
unique() %>%
print() # intermediate result printed for illustration
answer <- lb[a, on = .(rule, bool), nomatch = 0L][
, result[nr == .N], by = .(nr, id)][
order(-nr, id), first(V1)]
if (length(answer) == 0L) answer <- b[id == max(id), result] # default from b
cat("answer = ", answer, "\n\n")
}
test case 1 
rule bool
1: rule1 TRUE
2: rule2 TRUE
3: rule3 TRUE
nr id V1
1: 2 2 2
2: 2 3 3
3: 3 1 1
answer = 1

test case 2
rule bool
1: rule1 TRUE
2: rule2 TRUE
3: rule3 FALSE
nr id V1
1: 2 2 2
answer = 2

test case 3
rule bool
1: rule1 TRUE
2: rule2 FALSE
3: rule3 TRUE
Empty data.table (0 rows and 3 cols): nr,id,V1
answer = 4

test case 4
rule bool
1: rule1 TRUE
2: rule2 FALSE
3: rule3 FALSE
Empty data.table (0 rows and 3 cols): nr,id,V1
answer = 4

test case 5
rule bool
1: rule1 FALSE
2: rule2 TRUE
3: rule3 TRUE
nr id V1
1: 2 3 3
answer = 3

test case 6
rule bool
1: rule1 FALSE
2: rule2 TRUE
3: rule3 FALSE
Empty data.table (0 rows and 3 cols): nr,id,V1
answer = 4

test case 7
rule bool
1: rule1 FALSE
2: rule2 FALSE
3: rule3 TRUE
Empty data.table (0 rows and 3 cols): nr,id,V1
answer = 4

test case 8
rule bool
1: rule1 FALSE
2: rule2 FALSE
3: rule3 FALSE
Empty data.table (0 rows and 3 cols): nr,id,V1
answer = 4

As the answers show, all given conditions are fulfilled.

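Test case 1 deserves a closer look. Here, the conditions with id 1, 2, and 3 could all apply, but condition 1 takes precedence over the others because it is the most specific one.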
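A further check, not part of the original answer, is to compare the table-driven results with the OP's nested ifelse() for all eight combinations. The sketch vectorizes the OP's expression over the wide test table (the helper name op_ifelse() is hypothetical) and asserts that both approaches agree.

```r
library(data.table)

b <- fread(
"rule1, rule2, rule3, result
TRUE, TRUE, TRUE, 1
TRUE, TRUE, NA, 2
NA, TRUE, TRUE, 3
NA, NA, NA, 4"
)[, id := .I]
lb <- melt(b, c("id", "result"), variable.name = "rule", value.name = "bool",
           na.rm = TRUE)[, nr := .N, by = id]

# Hypothetical helper: the OP's nested ifelse, written for wide logical inputs
op_ifelse <- function(r1, r2, r3) {
  ifelse(r1 & r2 & r3, 1L, ifelse(r1 & r2, 2L, ifelse(r2 & r3, 3L, 4L)))
}

test <- CJ(rule1 = c(TRUE, FALSE), rule2 = c(TRUE, FALSE),
           rule3 = c(TRUE, FALSE), sorted = FALSE)

# Table-driven answer for each test case, using the same steps as the answer
table_answer <- sapply(seq_len(nrow(test)), function(i) {
  a <- melt(test[i], measure.vars = c("rule1", "rule2", "rule3"),
            variable.name = "rule", value.name = "bool")
  answer <- lb[a, on = .(rule, bool), nomatch = 0L][
    , result[nr == .N], by = .(nr, id)][order(-nr, id), first(V1)]
  if (length(answer) == 0L) answer <- b[id == max(id), result]  # default
  answer
})

ifelse_answer <- test[, op_ifelse(rule1, rule2, rule3)]
stopifnot(identical(table_answer, ifelse_answer))
table_answer
```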

Extension

This is to show that the solution can be extended to more rules in a as well as to more conditions in b.

Here is an example with 7 conditions and 4 rule columns.

b4 <- fread(
"rule1, rule2, rule3, rule4, result
TRUE, TRUE, TRUE, TRUE, 1
TRUE, TRUE, NA, NA, 2
NA, TRUE, TRUE, NA, 3
NA, FALSE, NA, NA, 5
TRUE, FALSE, NA, NA, 6
FALSE, FALSE, NA, FALSE, 7
NA, NA, NA, NA, 4"
)

The test code has been simplified to allow a more compact view of the 16 test cases:

lb <- melt(b4[, id := .I], c("id", "result"), variable.name = "rule", value.name = "bool", na.rm = TRUE)[, nr := .N, by = id][]
test <- CJ(rule1 = c(TRUE, FALSE), rule2 = c(TRUE, FALSE), rule3 = c(TRUE, FALSE), rule4 = c(TRUE, FALSE), sorted = FALSE)
sapply(
seq(nrow(test)),
function(i) {
a <- melt(test[i], measure.vars = patterns("^rule"), variable.name = "rule", value.name = "bool")
answer <- lb[a, on = .(rule, bool), nomatch = 0L][, result[nr == .N], by = .(nr, id)][order(-nr, id), first(V1)]
if (length(answer) == 0L) answer <- b4[id == max(id), result] # default from b4
return(answer)
}
) %>%
cbind(test, .) %>%
setnames(".", "result") %>%
print()

It returns the table of test cases, i.e., the different versions of a in wide format, with the result appended:

    rule1 rule2 rule3 rule4 result
1: TRUE TRUE TRUE TRUE 1
2: TRUE TRUE TRUE FALSE 2
3: TRUE TRUE FALSE TRUE 2
4: TRUE TRUE FALSE FALSE 2
5: TRUE FALSE TRUE TRUE 6
6: TRUE FALSE TRUE FALSE 6
7: TRUE FALSE FALSE TRUE 6
8: TRUE FALSE FALSE FALSE 6
9: FALSE TRUE TRUE TRUE 3
10: FALSE TRUE TRUE FALSE 3
11: FALSE TRUE FALSE TRUE 4
12: FALSE TRUE FALSE FALSE 4
13: FALSE FALSE TRUE TRUE 5
14: FALSE FALSE TRUE FALSE 7
15: FALSE FALSE FALSE TRUE 5
16: FALSE FALSE FALSE FALSE 7
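The sapply() loop above resolves one test case at a time. As an alternative that the original answer does not use, all cases can be resolved with a single join: each version of a gets an identifier (the column name case is a hypothetical choice), the long-format test data is joined with lb once, and the most specific complete condition is picked per case. allow.cartesian = TRUE is set as a precaution because many test rows share the same (rule, bool) pair, so the join can return more rows than nrow(x) + nrow(i), which data.table may otherwise reject. The result should reproduce the table above.

```r
library(data.table)

b4 <- fread(
"rule1, rule2, rule3, rule4, result
TRUE, TRUE, TRUE, TRUE, 1
TRUE, TRUE, NA, NA, 2
NA, TRUE, TRUE, NA, 3
NA, FALSE, NA, NA, 5
TRUE, FALSE, NA, NA, 6
FALSE, FALSE, NA, FALSE, 7
NA, NA, NA, NA, 4"
)[, id := .I]
lb <- melt(b4, c("id", "result"), variable.name = "rule", value.name = "bool",
           na.rm = TRUE)[, nr := .N, by = id]

test <- CJ(rule1 = c(TRUE, FALSE), rule2 = c(TRUE, FALSE),
           rule3 = c(TRUE, FALSE), rule4 = c(TRUE, FALSE), sorted = FALSE)
test[, case := .I]   # hypothetical column identifying each version of a

# All versions of a in long format at once
tl <- melt(test, id.vars = "case", variable.name = "rule", value.name = "bool")

# Join once, then keep only conditions whose non-wildcard entries all matched
hits <- lb[tl, on = .(rule, bool), nomatch = 0L, allow.cartesian = TRUE][
  , .N, by = .(case, id, nr, result)][N == nr]

# Most specific condition first (highest nr, then lowest id); one row per case
best <- unique(hits[order(case, -nr, id)], by = "case")

# Start from the default (last row of b4), then overwrite matched cases
default_result <- b4[id == max(id), result]
test[, result := default_result]
test[best, on = "case", result := i.result]
test[, case := NULL][]
```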

The original Stack Overflow question on rule tables to avoid nested ifelse statements: https://stackoverflow.com/questions/59499043/
