
R: (un)reduce a data frame

Reposted. Author: 行者123. Updated: 2023-12-04 14:39:14

I have the following toy dataset. For every day (dates) in a fixed period, the status (status) of each element (id) is recorded.

df <- data.frame( id = c(1, 1, 1, 1, 1,  2, 2, 2, 2, 2,  3, 3, 3, 3, 3,  4, 4, 4, 4, 4),
dates = c("2021-01-01",
"2021-01-02",
"2021-01-03",
"2021-01-04",
"2021-01-05",

"2021-01-01",
"2021-01-02",
"2021-01-03",
"2021-01-04",
"2021-01-05",

"2021-01-01",
"2021-01-02",
"2021-01-03",
"2021-01-04",
"2021-01-05",

"2021-01-01",
"2021-01-02",
"2021-01-03",
"2021-01-04",
"2021-01-05"),

status = c("A", "A", "A", "B", "C",
"A", "A", "B", "C", "C",
"A", "B", "C", "D", "E",
"A", "B", "B", "B", "B")
)

> df
id dates status
1 1 2021-01-01 A
2 1 2021-01-02 A
3 1 2021-01-03 A
4 1 2021-01-04 B
5 1 2021-01-05 C
6 2 2021-01-01 A
7 2 2021-01-02 A
8 2 2021-01-03 B
9 2 2021-01-04 C
10 2 2021-01-05 C
11 3 2021-01-01 A
12 3 2021-01-02 B
13 3 2021-01-03 C
14 3 2021-01-04 D
15 3 2021-01-05 E
16 4 2021-01-01 A
17 4 2021-01-02 B
18 4 2021-01-03 B
19 4 2021-01-04 B
20 4 2021-01-05 B
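Unfortunately, the data frame was reduced to save space: if the status is the same on two consecutive days, the second entry is dropped. A status is assumed to stay the same until it changes again, so the actual dataset looks like this: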
> df %>% group_by(id) %>%
+ mutate(dupl = duplicated(status)) %>%
+ ungroup() %>%
+ filter(dupl == FALSE) %>%
+ select(-dupl)
# A tibble: 13 x 3
id dates status
<dbl> <chr> <chr>
1 1 2021-01-01 A
2 1 2021-01-04 B
3 1 2021-01-05 C
4 2 2021-01-01 A
5 2 2021-01-03 B
6 2 2021-01-04 C
7 3 2021-01-01 A
8 3 2021-01-02 B
9 3 2021-01-03 C
10 3 2021-01-04 D
11 3 2021-01-05 E
12 4 2021-01-01 A
13 4 2021-01-02 B
My question now is: how can I get back to the first (complete) version of the dataset? The time period is always the same for all ids (2021-01-01 to 2021-01-05).

Best answer

library(tidyverse)

# the reduced version can be created like this instead
df_reduced <- df %>%
  mutate(dates = lubridate::ymd(dates)) %>%
  distinct(id, status, .keep_all = TRUE)
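One caveat with distinct(id, status, .keep_all = TRUE): it drops every repeat of a status within an id, not only repeats on consecutive days. The sample data never returns to an earlier status, so the result is the same here. If a status could recur, a reduction that compares each row with the previous day might look like the sketch below. The data frame df_recur and its values are hypothetical, made up only for illustration.

```r
library(dplyr)

# Hypothetical toy input: id 1 returns to status "A" on the last day.
df_recur <- data.frame(
  id     = c(1, 1, 1, 1),
  dates  = as.Date(c("2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04")),
  status = c("A", "A", "B", "A")
)

# Keep a row only when its status differs from the previous day's status
# within the same id; the first row of each id is always kept.
df_recur_reduced <- df_recur %>%
  group_by(id) %>%
  arrange(dates, .by_group = TRUE) %>%
  filter(row_number() == 1 | status != lag(status)) %>%
  ungroup()
```

Here the second "A" on 2021-01-04 survives because it follows a "B", whereas distinct() on id and status would discard it.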
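For a problem like this, I would look at the functions in tidyr that deal with missing values. We can use expand to generate the complete sequence of id/dates combinations, join the reduced data back on so that the dropped days appear with an NA status, and then fill those NA values with fill(status, .direction = "down").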
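df_reduced %>%
  # build every id/date combination over the full daily sequence
  expand(id, dates = full_seq(dates, 1)) %>%
  # bring status back; days that were dropped get NA
  left_join(df_reduced) %>%
  group_by(id) %>%
  # carry the last known status forward within each id
  fill(status, .direction = "down")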

#> Joining, by = c("id", "dates")
#> # A tibble: 20 x 3
#> # Groups: id [4]
#> id dates status
#> <dbl> <date> <chr>
#> 1 1 2021-01-01 A
#> 2 1 2021-01-02 A
#> 3 1 2021-01-03 A
#> 4 1 2021-01-04 B
#> 5 1 2021-01-05 C
#> 6 2 2021-01-01 A
#> 7 2 2021-01-02 A
#> 8 2 2021-01-03 B
#> 9 2 2021-01-04 C
#> 10 2 2021-01-05 C
#> 11 3 2021-01-01 A
#> 12 3 2021-01-02 B
#> 13 3 2021-01-03 C
#> 14 3 2021-01-04 D
#> 15 3 2021-01-05 E
#> 16 4 2021-01-01 A
#> 17 4 2021-01-02 B
#> 18 4 2021-01-03 B
#> 19 4 2021-01-04 B
#> 20 4 2021-01-05 B
Created on 2021-07-06 by the reprex package (v1.0.0)
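One thing to watch: full_seq(dates, 1) only spans the earliest to the latest date that still appears in the reduced data. If no id happened to change status on the last day of the period, that day would not be restored. Since the question states that the period is always 2021-01-01 to 2021-01-05, a variant that spells out the range may be safer. The following is a minimal sketch under that assumption, using tidyr::complete in place of expand plus left_join; df_small and all_days are hypothetical names introduced for this example.

```r
library(dplyr)
library(tidyr)

# Hypothetical reduced data: no id changes status on 2021-01-05,
# so that day does not appear anywhere in the reduced rows.
df_small <- tibble(
  id     = c(1, 1, 2),
  dates  = as.Date(c("2021-01-01", "2021-01-03", "2021-01-01")),
  status = c("A", "B", "A")
)

# Assumed known recording period (taken from the question text).
all_days <- seq(as.Date("2021-01-01"), as.Date("2021-01-05"), by = "day")

df_full <- df_small %>%
  complete(id, dates = all_days) %>%      # add missing id/day rows with NA status
  group_by(id) %>%
  arrange(dates, .by_group = TRUE) %>%    # make sure days are in order before filling
  fill(status, .direction = "down") %>%   # carry the last known status forward
  ungroup()
```

With this input, each id ends up with five rows, and the last known status is carried through to 2021-01-05.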

Regarding "R: (un)reduce a data frame", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/68266205/
