
r - How to detect the closest values below and above a given reference variable in a data frame in R?

Reposted. Author: 行者123. Updated: 2023-12-05 08:24:50

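Consider the following random MWE.

For each row, I am trying to identify which variable has the closest value below the constant reference_day, and which variable has the closest value above the constant reference_day.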

df1 <-
  structure(
    list(id = 1:5,
         gender = c("female", "male", "male", "male", "male"),
         reference_day = structure(c(18052, NA, 18052, 18052, 18052), class = "Date"),
         var1 = structure(c(16505, 17144, 18139, NA, 16639), class = "Date"),
         var2 = structure(c(NA, 18042, 16544, 16697, NA), class = "Date"),
         var3 = structure(c(17845, 18070, 17152, 16571, NA), class = "Date")),
    row.names = c(NA, -5L), class = "data.frame")

df1

  id gender reference_day       var1       var2       var3
1  1 female    2019-06-05 2015-03-11       <NA> 2018-11-10
2  2   male          <NA> 2016-12-09 2019-05-26 2019-06-23
3  3   male    2019-06-05 2019-08-31 2015-04-19 2016-12-17
4  4   male    2019-06-05       <NA> 2015-09-19 2015-05-16
5  5   male    2019-06-05 2015-07-23       <NA>       <NA>

The result I want looks like this:

  id gender reference_day       var1       var2       var3 closest_to_left closest_to_right
1  1 female    2019-06-05 2015-03-11       <NA> 2018-11-10            var3             <NA>
2  2   male          <NA> 2016-12-09 2019-05-26 2019-06-23            <NA>             <NA>
3  3   male    2019-06-05 2019-08-31 2015-04-19 2016-12-17            var3             var1
4  4   male    2019-06-05       <NA> 2015-09-19 2015-05-16            var2             <NA>
5  5   male    2019-06-05 2015-07-23       <NA>       <NA>            var1             <NA>
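
To make the rule behind this table concrete, here is a small worked sketch for row 3 only. It is an illustration rather than part of the original question: the variable names `var_names`, `vars`, `diffs`, `below`, and `above` are assumptions introduced for the example. It computes the signed difference in days between each date and the reference, then picks the largest negative difference (closest on the left) and the smallest positive difference (closest on the right).

```r
# Row 3 of df1, rebuilt by hand so the snippet runs on its own
reference_day <- as.Date("2019-06-05")
var_names <- c("var1", "var2", "var3")
vars <- as.Date(c("2019-08-31", "2015-04-19", "2016-12-17"))

# Signed distance in days: negative = before the reference, positive = after
diffs <- as.numeric(vars - reference_day)

below <- diffs < 0
above <- diffs > 0

# Closest on the left: the negative difference nearest to zero
closest_to_left <- var_names[below][which.max(diffs[below])]
# Closest on the right: the positive difference nearest to zero
closest_to_right <- var_names[above][which.min(diffs[above])]

print(diffs)             # 87 -1508 -900
print(closest_to_left)   # "var3"
print(closest_to_right)  # "var1"
```

The sketch assumes at least one date lies on each side of the reference. Rows with no date on one side, or with a missing reference, need the NA handling shown in the desired output.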

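After a lot of trial and error I did manage to find a solution using dplyr's case_when function, but it requires a great deal of boilerplate code, which makes me think there must be a smarter way to solve this.

I personally prefer to use dplyr, but any help is much appreciated.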

Best answer

Custom functions that do this:

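library(dplyr)

# Names of the candidate date columns (var1, var2, var3)
cols <- df1 %>% select(starts_with('var')) %>% names()

# x: the reference date; y: the row's values from the var* columns.
# Returns the name of the column with the smallest positive difference from x,
# or NA if no value lies after x. Relies on the global `cols` defined above.
closest_to_right <- function(x, y) {
  tmp <- y - x
  if (any(tmp > 0, na.rm = TRUE)) {
    cols[tmp %in% min(tmp[tmp > 0], na.rm = TRUE)]
  } else {
    NA
  }
}

# Same idea for the other side: the negative difference closest to zero.
closest_to_left <- function(x, y) {
  tmp <- y - x
  if (any(tmp < 0, na.rm = TRUE)) {
    cols[tmp %in% max(tmp[tmp < 0], na.rm = TRUE)]
  } else {
    NA
  }
}

# Apply both functions row by row; c_across() collects the var* values of a row.
df1 %>%
  rowwise() %>%
  mutate(closest_to_left = closest_to_left(reference_day, c_across(starts_with('var'))),
         closest_to_right = closest_to_right(reference_day, c_across(starts_with('var')))) %>%
  ungroup()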

#      id gender reference_day var1       var2       var3       closest_to_left closest_to_right
#   <int> <chr>  <date>        <date>     <date>     <date>     <chr>           <chr>
# 1     1 female 2019-06-05    2015-03-11 NA         2018-11-10 var3            NA
# 2     2 male   NA            2016-12-09 2019-05-26 2019-06-23 NA              NA
# 3     3 male   2019-06-05    2019-08-31 2015-04-19 2016-12-17 var3            var1
# 4     4 male   2019-06-05    NA         2015-09-19 2015-05-16 var2            NA
# 5     5 male   2019-06-05    2015-07-23 NA         NA         var1            NA
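
As a possible alternative, the same rule can be sketched in base R without a row-wise dplyr pipeline. This is not part of the original answer: `var_cols`, `d`, and the helper `pick_closest()` are hypothetical names introduced here. The sketch builds a matrix of day differences against reference_day and, for each row, returns the column name on the requested side. Unlike the functions above, it returns only the first column when several tie, and it always returns a character NA when nothing qualifies.

```r
# Rebuild the question's data so the snippet is self-contained
df1 <- structure(
  list(id = 1:5,
       gender = c("female", "male", "male", "male", "male"),
       reference_day = structure(c(18052, NA, 18052, 18052, 18052), class = "Date"),
       var1 = structure(c(16505, 17144, 18139, NA, 16639), class = "Date"),
       var2 = structure(c(NA, 18042, 16544, 16697, NA), class = "Date"),
       var3 = structure(c(17845, 18070, 17152, 16571, NA), class = "Date")),
  row.names = c(NA, -5L), class = "data.frame")

# Candidate columns, selected by the "var" prefix
var_cols <- grep("^var", names(df1), value = TRUE)

# 5 x 3 matrix of signed day differences (column value minus reference_day)
d <- sapply(df1[var_cols], as.numeric) - as.numeric(df1$reference_day)

# Hypothetical helper: name of the closest column on one side of the reference
pick_closest <- function(d_row, side) {
  ok <- !is.na(d_row) & (if (side == "left") d_row < 0 else d_row > 0)
  if (!any(ok)) return(NA_character_)
  d_row[!ok] <- NA  # ignore values on the wrong side
  # which.max()/which.min() skip NA and return the first match on ties
  var_cols[if (side == "left") which.max(d_row) else which.min(d_row)]
}

df1$closest_to_left  <- unname(apply(d, 1, pick_closest, side = "left"))
df1$closest_to_right <- unname(apply(d, 1, pick_closest, side = "right"))

print(df1)
```

Working on plain numeric day counts sidesteps difftime comparisons, and passing the column names through `var_cols` keeps the helper independent of the order in which columns happen to be selected elsewhere.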

Regarding "r - How to detect the closest values below and above a given reference variable in a data frame in R?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/71006802/
