I have a data frame with 18 columns. Columns 2 to 13 contain numeric values such as 0, 1, 2, 4, ... I want to recode them by range into three categories:
if columns 2:13 are 0 -> 0
if columns 2:13 between 1 & 5 -> 1
else columns 2:13 -> 2.
My attempt works, but it is not efficient:
df[,2:13][df[,2:13] == 1 | df[,2:13] == 2 | df[,2:13] == 3 | df[,2:13] == 4 | df[,2:13] == 5] <- 1
I appreciate your help.
Try findInterval:
dplyr
library(dplyr)
df %>%
  mutate(
    across(2:13, ~ findInterval(., c(0, 1, 5), rightmost.closed = TRUE) - 1L)
  )
If this gets any more complex (such as non-consecutive recoded values), we might switch to case_when:
df %>%
  mutate(
    across(2:13, ~ case_when(
      . == 0 ~ 0L,
      between(., 1, 5) ~ 1L,
      TRUE ~ 2L
    ))
  )
base R
df[,2:13] <- lapply(df[,2:13], function(z) findInterval(z, c(0, 1, 5), rightmost.closed = TRUE) - 1L)
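To see how the breaks map onto the three codes, here is a minimal sketch in base R. The vector `x` and the toy data frame `toy` are made-up sample data, not the asker's, and the toy frame recodes columns 2:3 rather than 2:13.

```r
# Hypothetical sample values, not taken from the original data
x <- c(0, 1, 3, 5, 6, 12)

# Breaks 0, 1, 5 give the intervals [0, 1), [1, 5] and (5, Inf).
# rightmost.closed = TRUE is what places exactly 5 in the middle interval.
# findInterval() counts intervals from 1, so subtract 1 to get codes 0/1/2.
codes <- findInterval(x, c(0, 1, 5), rightmost.closed = TRUE) - 1L
print(codes)

# The same recode applied column by column to a small toy data frame
toy <- data.frame(id = c("a", "b", "c"), v1 = c(0, 2, 7), v2 = c(5, 0, 10))
toy[, 2:3] <- lapply(toy[, 2:3], function(z) {
  findInterval(z, c(0, 1, 5), rightmost.closed = TRUE) - 1L
})
print(toy)
```

Note that a negative input would come out as -1 with these breaks, so this assumes the columns hold only non-negative values, as in the question.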
Awesome! Thank you so much. I was not aware of the findInterval function.
It's very similar to cut, which is useful for returning strings/labels (including things that look like number ranges).
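As a rough illustration of that difference, here is a sketch of the same recode done with cut. The sample vector and the label text are made up for the example, and the breaks assume the values are non-negative whole numbers as in the question.

```r
# Hypothetical sample values, not taken from the original data
x <- c(0, 1, 3, 5, 6, 12)

# cut() uses right-closed intervals by default: (-Inf, 0], (0, 5], (5, Inf].
# It returns a factor, with one label per interval.
f <- cut(x, breaks = c(-Inf, 0, 5, Inf), labels = c("zero", "1-5", "over 5"))
print(as.character(f))

# Converting the factor to integer codes and subtracting 1 gives 0/1/2
codes <- as.integer(f) - 1L
print(codes)
```

With these breaks a fractional value such as 0.5 would fall in the "1-5" group, whereas the findInterval version would code it as 0, so the two only agree on whole numbers.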