gpt4 book ai didi

根据R中的日期变量重新组织多个变量

转载 作者:行者123 更新时间:2023-12-03 20:23:08 25 4
gpt4 key购买 nike

如果我有一个数据集,其中包含在不同时间点收集的相同度量的分数,我如何组织这些日期/时间,以便它们代表某个日期之后的时间点?这是否可以在 R 中执行,或者我在另一个程序中执行此操作会更容易吗?
我有一个当前看起来像这样的数据集:

id  date        score1_date score1  score2_date score2  score3_date score3
101 1/6/2020 1/1/2020 20 1/8/2020 18 1/15/2020 16
102 2/27/2020 2/14/2020 16 2/21/2020 16 2/28/2020 10
103 1/10/2020 1/7/2020 30 1/14/2020 25 1/21/2020 20
104 3/5/2020 3/6/2020 40 3/13/2020 42 3/20/2020 40
我想找到与 [date] 最接近的 [score#_date] 并将其标识为 [time1],然后将其后的所有内容作为 [time2]、[time3] 等。
这是上表的代码:
structure(list(id = c(101, 102, 103, 104), date = structure(c(18267, 
18319, 18271, 18326), class = "Date"), score1_date = structure(c(18262,
18306, 18268, 18327), class = "Date"), score1 = c(20, 16, 30,
40), score2_date = structure(c(18269, 18313, 18275, 18334), class = "Date"),
score2 = c(18, 16, 25, 42), score3_date = structure(c(18276,
18320, 18282, 18341), class = "Date"), score3 = c(16, 10,
20, 40)), row.names = c(NA, -4L), class = c("tbl_df", "tbl",
"data.frame"))
所以我最终希望数据集有看起来像这样的变量:
id  date        time1_date  time1_score time2_date  time2_score time3_date  time3_score
101 1/6/2020 1/8/2020 18 1/15/2020 16 NA NA
102 2/27/2020 2/28/2020 10 NA NA NA NA
103 1/10/2020 1/7/2020 30 1/14/2020 25 1/21/2020 20
104 3/5/2020 3/6/2020 40 3/13/2020 42 3/20/2020 40
非常感谢!

最佳答案

使用 tidyverse您可以执行的功能:

library(dplyr)
library(tidyr)

df %>%
#Rename date column to base_date
rename(base_date = date) %>%
#Rename score1, score2 etc to score1_value, score2_value etc
rename_with(~paste0(., '_value'), matches('^score\\d+$')) %>%
#get the data in long format with date and value as two columns
pivot_longer(cols = starts_with('score'),
names_to = c('score', '.value'),
names_sep = '_') %>%
group_by(id) %>%
#Keep only those date where the date is greater than closest date
filter(date >= date[which.min(abs(date - base_date))]) %>%
#Arrange the data
arrange(id, date) %>%
#Create new column name
mutate(score = paste0('time', row_number())) %>%
ungroup %>%
#Get the data in wide format
pivot_wider(names_from = score, values_from = c(date, value)) %>%
#Arrange the columns
select(id, base_date, order(suppressWarnings(readr::parse_number(names(.)))))

# id base_date date_time1 value_time1 date_time2 value_time2 date_time3 value_time3
# <dbl> <date> <date> <dbl> <date> <dbl> <date> <dbl>
#1 101 2020-01-06 2020-01-08 18 2020-01-15 16 NA NA
#2 102 2020-02-27 2020-02-28 10 NA NA NA NA
#3 103 2020-01-10 2020-01-07 30 2020-01-14 25 2020-01-21 20
#4 104 2020-03-05 2020-03-06 40 2020-03-13 42 2020-03-20 40

关于根据R中的日期变量重新组织多个变量,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/66978277/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com