
r - Apply the same factor levels to multiple variables with different numbers of levels in R

Reposted. Author: 行者123. Updated: 2023-12-04 08:17:39

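I have a data.table with 168 variables and 8,278 observations. Variables 69:135 were originally stored as strings. They are meant to be region dummies, and I want to end up with level 2 (= yes, the company operates here) and level 1 (= no, the company does not operate here). The problem is that the original variables contain three different combinations of inputs: 1) "TRUE", "1", "0", "FALSE"; 2) "TRUE", "FALSE"; and 3) "1", "0". In addition, about 5 variables contain only a single value, "0" or "1". An example is given here: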

#generating replicable data
library(data.table)
dt <- data.table(
  region1 = c("TRUE", "FALSE", "0", "1", NA),
  region2 = c("1", "1", "0", NA, NA),
  region3 = c(NA, "FALSE", "TRUE", NA, "FALSE"),
  region4 = c(NA, "0", "0", NA, "0")
)

#this gives:
# region1 region2 region3 region4
#1 TRUE 1 <NA> <NA>
#2 FALSE 1 FALSE 0
#3 0 0 TRUE 0
#4 1 <NA> <NA> <NA>
#5 <NA> <NA> FALSE 0
I am looking for a way to replace "TRUE" and "1" with 2, and "FALSE" and "0" with 1, across all variables in one go. So the desired result is:
#   region1 region2 region3 region4
#1: 2 2 NA NA
#2: 1 2 1 1
#3: 1 1 2 1
#4: 2 NA NA NA
#5: NA NA 1 1
I have already looked at
Apply factor levels to multiple columns with missing factor levels

Change level of multiple factor variables.
However, these did not help me.
I tried the following using nested ifelse() commands:
library(data.table)
library(forcats)

check <- cbind(dt[1:68], as.data.table(apply(dt[69:135], 2, function(x) {
  ifelse("1" %in% x & "TRUE" %in% x,
         fct_collapse(x,
                      "2" = c("TRUE",
                              "1"),
                      "1" = c("FALSE",
                              "0")
         ),
         ifelse("1" %in% x & !("TRUE" %in% x),
                fct_collapse(x,
                             "2" = "1",
                             "1" = "0"),
                fct_collapse(x,
                             "2" = "TRUE",
                             "1" = "FALSE"
                )))
}
)), dt[136:168])
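However, the code above did not give me the desired result. It ran, but I got warning messages, and when I checked the affected variables they were still stored as strings with the original inputs.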
# examples of warnings
1: Unknown levels in `f`: TRUE, FALSE
2: Unknown levels in `f`: TRUE, FALSE
3: Unknown levels in `f`: TRUE, FALSE
4: Unknown levels in `f`: 0
5: Unknown levels in `f`: TRUE, FALSE
6: Unknown levels in `f`: TRUE, FALSE
7: Unknown levels in `f`: 0
When used on its own, without fct_collapse, the nested ifelse() command does the job:
#the ifelse statement works
ifelse("TRUE" %in% dt$region1, 2, "FALSE")
ifelse(5 %in% dt$region1, 2, "FALSE")

#also the nested ifelse statement works
ifelse("1" %in% dt$region1 & "TRUE" %in% dt$region1,
       0,
       ifelse("1" %in% dt$region1 & !("TRUE" %in% dt$region1),
              1,
              2
       ))

ifelse("1" %in% dt$region2 & "TRUE" %in% dt$region2,
       0,
       ifelse("1" %in% dt$region2 & !("TRUE" %in% dt$region2),
              1,
              2
       ))
Does anyone know how to solve this?
Many thanks in advance for any advice!
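
A hedged note on why the attempt above probably does not behave as intended: ifelse() returns a value shaped like its test argument, so a length-one test such as "1" %in% x yields a length-one result rather than a recoded column. The sketch below only illustrates that behaviour on a toy vector; the names res and res_vec are made up for the example, and it is not a full diagnosis of the warnings.

```r
x <- c("TRUE", "FALSE", "0", "1", NA)

# The test is a single TRUE/FALSE, so ifelse() returns a single element,
# no matter how long the "yes" argument is
res <- ifelse("1" %in% x, x, "none")
length(res)

# A test with one entry per element gives one result per element
res_vec <- ifelse(x %in% c("TRUE", "1"), 2L,
                  ifelse(x %in% c("FALSE", "0"), 1L, NA_integer_))
res_vec
```

The second form is close in spirit to the accepted answer: the test is evaluated element by element instead of once per column.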

Best answer

Here is an approach that calls set() in a for loop.

library(data.table)

f <- function(x){
  x <- as.character(x)
  i1 <- x %in% c("TRUE", "1")
  i0 <- x %in% c("FALSE", "0")
  x[which(i1)] <- "2"
  x[which(i0)] <- "1"
  as.integer(x)
}

for (j in seq_along(dt)) set(dt, j = j, value = f(dt[[j]]))

dt
# region1 region2 region3 region4
#1: 2 2 NA NA
#2: 1 2 1 1
#3: 1 1 2 1
#4: 2 NA NA NA
#5: NA NA 1 1
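
In the full data set described in the question, only columns 69:135 should be recoded, whereas the loop above touches every column. A minimal sketch of restricting the loop, assuming the region columns can be picked out by a name pattern: the "^region" pattern, the table dt2, and the firm and sector columns are hypothetical, and column positions such as 69:135 could be passed to j instead.

```r
library(data.table)

# Toy data: two region dummies plus unrelated columns (hypothetical names)
dt2 <- data.table(
  firm    = c("a", "b", "c"),
  region1 = c("TRUE", "0", NA),
  region2 = c("1", "FALSE", "0"),
  sector  = c("x", "y", "z")
)

# Same recoding function as in the answer
f <- function(x) {
  x <- as.character(x)
  i1 <- x %in% c("TRUE", "1")
  i0 <- x %in% c("FALSE", "0")
  x[which(i1)] <- "2"
  x[which(i0)] <- "1"
  as.integer(x)
}

# Loop only over the columns that should be recoded
cols <- grep("^region", names(dt2), value = TRUE)
for (j in cols) set(dt2, j = j, value = f(dt2[[j]]))
```

Columns outside cols are left as they were, so identifiers and other string variables are not coerced to integer.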

Thanks to jangorecki's comment, a simpler way is
dt[, names(dt) := lapply(dt, f)]
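
Since the question asks for factor levels, the integer codes could also be wrapped in factor() with both levels fixed, so that columns containing only "0" or only "1" still carry the same two levels. This is a sketch that is not part of the original answer: the lookup vector, the helper to_dummy, the table dt3, and the labels "no" and "yes" are all illustrative assumptions, and .SDcols is used to limit the update to chosen columns.

```r
library(data.table)

dt3 <- data.table(
  region1 = c("TRUE", "FALSE", "0", "1", NA),
  region4 = c(NA, "0", "0", NA, "0")
)

# Named lookup vector: input string -> integer code.
# Values not in the lookup, and NA, come back as NA.
lookup <- c("TRUE" = 2L, "1" = 2L, "FALSE" = 1L, "0" = 1L)

to_dummy <- function(x) {
  codes <- unname(lookup[as.character(x)])
  # Fix both levels so single-valued columns still have levels 1 and 2
  factor(codes, levels = 1:2, labels = c("no", "yes"))
}

cols <- c("region1", "region4")
dt3[, (cols) := lapply(.SD, to_dummy), .SDcols = cols]
```

Calling as.integer() on one of these factors gives back the 1/2 coding shown in the desired output.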

Regarding "r - Apply the same factor levels to multiple variables with different numbers of levels in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/65645399/
