
Reshaping data in R with "login" and "logout" times

Reposted. Author: 行者123. Updated: 2023-12-03 20:20:19

I am new to R and am working on a side project for my own purposes. I have this data (a reproducible input is at the end of the question):

     X            datetime  user  state
1    1 2016-02-19 19:13:26 User1 joined
2    2 2016-02-19 19:21:18 User2 joined
3    3 2016-02-19 19:21:33 User1 joined
4    4 2016-02-19 19:35:38 User1 joined
5    5 2016-02-19 19:44:15 User1 joined
6    6 2016-02-19 19:48:55 User1 joined
7    7 2016-02-19 19:52:40 User1 joined
8    8 2016-02-19 19:53:15 User3 joined
9    9 2016-02-19 20:02:34 User3 joined
10  10 2016-02-19 20:13:48 User3 joined
19 637 2016-02-19 19:13:32 User1   left
20 638 2016-02-19 19:25:26 User1   left
21 639 2016-02-19 19:30:30 User2   left
22 640 2016-02-19 19:42:16 User1   left
23 641 2016-02-19 19:47:59 User1   left
24 642 2016-02-19 19:51:06 User1   left
25 643 2016-02-19 20:02:26 User3   left

I would like it to look like this:
   user              joined                left
1 User1 2016-02-19 19:13:26 2016-02-19 19:13:32
2 User2 2016-02-19 19:21:18 2016-02-19 19:30:30
3 User3 2016-02-19 19:53:15 2016-02-19 20:02:26
4 User1 2016-02-19 19:21:33 2016-02-19 19:25:26
.
.
.

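I have been looking at tidyr, since some reshaping is obviously involved, but I can't work out exactly what needs to be done. Is this even possible without loops or a lot of procedural code? The part I can't get past is that there is no way to know which particular "left" record should be connected to which particular "joined" record. The examples I can find all involve static months or dates over which the other values are gathered. I should add that not every record is guaranteed to have a "left" value (a user may still be "joined").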

Here is the dput output for a sample of the data:
> dput(samp)
structure(list(X = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 637L, 638L, 639L, 640L,
641L, 642L, 643L, 644L, 645L, 646L, 647L, 648L, 649L, 650L, 651L
), datetime = structure(c(1L, 3L, 4L, 7L, 9L, 11L, 13L, 14L,
16L, 18L, 21L, 22L, 23L, 26L, 27L, 30L, 32L, 33L, 2L, 5L, 6L,
8L, 10L, 12L, 15L, 17L, 19L, 20L, 24L, 25L, 28L, 29L, 31L), .Label = c("2016-02-19 19:13:26",
"2016-02-19 19:13:32", "2016-02-19 19:21:18", "2016-02-19 19:21:33",
"2016-02-19 19:25:26", "2016-02-19 19:30:30", "2016-02-19 19:35:38",
"2016-02-19 19:42:16", "2016-02-19 19:44:15", "2016-02-19 19:47:59",
"2016-02-19 19:48:55", "2016-02-19 19:51:06", "2016-02-19 19:52:40",
"2016-02-19 19:53:15", "2016-02-19 20:02:26", "2016-02-19 20:02:34",
"2016-02-19 20:13:38", "2016-02-19 20:13:48", "2016-02-19 20:42:27",
"2016-02-19 20:48:22", "2016-02-19 20:49:31", "2016-02-19 20:59:58",
"2016-02-19 21:06:20", "2016-02-19 21:10:43", "2016-02-19 21:11:13",
"2016-02-19 21:11:15", "2016-02-19 21:11:22", "2016-02-19 21:17:33",
"2016-02-19 22:02:45", "2016-02-19 22:05:18", "2016-02-19 22:05:37",
"2016-02-19 22:05:47", "2016-02-19 22:30:30"), class = "factor"),
user = structure(c(1L, 2L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L,
3L, 4L, 1L, 1L, 4L, 4L, 4L, 3L, 1L, 1L, 2L, 1L, 1L, 1L, 3L,
3L, 3L, 1L, 4L, 1L, 1L, 4L, 4L), .Label = c("User1", "User2",
"User3", "User4"), class = "factor"), state = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), .Label = c("joined", "left"), class = "factor")), .Names = c("X",
"datetime", "user", "state"), class = "data.frame", row.names = c(NA,
-33L))

Best answer

Since tidyr 1.0.0, the following is possible:

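suppressPackageStartupMessages(library(tidyverse))

# samp[-1] drops the X index column before reshaping
pivot_wider(samp[-1], names_from = "state", values_from = "datetime",
            values_fn = list(datetime = list)) %>%
  # extend each user's `left` vector to the length of `joined`, filling with NA
  mutate(left = map2(left, lengths(joined), `length<-`)) %>%
  unchop(everything())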

#> # A tibble: 18 x 3
#>    user  joined              left
#>    <fct> <fct>               <fct>
#>  1 User1 2016-02-19 19:13:26 2016-02-19 19:13:32
#>  2 User1 2016-02-19 19:21:33 2016-02-19 19:25:26
#>  3 User1 2016-02-19 19:35:38 2016-02-19 19:42:16
#>  4 User1 2016-02-19 19:44:15 2016-02-19 19:47:59
#>  5 User1 2016-02-19 19:48:55 2016-02-19 19:51:06
#>  6 User1 2016-02-19 19:52:40 2016-02-19 20:48:22
#>  7 User1 2016-02-19 21:06:20 2016-02-19 21:11:13
#>  8 User1 2016-02-19 21:11:15 2016-02-19 21:17:33
#>  9 User2 2016-02-19 19:21:18 2016-02-19 19:30:30
#> 10 User3 2016-02-19 19:53:15 2016-02-19 20:02:26
#> 11 User3 2016-02-19 20:02:34 2016-02-19 20:13:38
#> 12 User3 2016-02-19 20:13:48 2016-02-19 20:42:27
#> 13 User3 2016-02-19 20:49:31 NA
#> 14 User3 2016-02-19 22:30:30 NA
#> 15 User4 2016-02-19 20:59:58 2016-02-19 21:10:43
#> 16 User4 2016-02-19 21:11:22 2016-02-19 22:02:45
#> 17 User4 2016-02-19 22:05:18 2016-02-19 22:05:37
#> 18 User4 2016-02-19 22:05:47 NA
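
The result above pairs events by position: within each user, the first "joined" is matched with the first "left", the second with the second, and so on. The same rule can be sketched in base R with an explicit occurrence counter and a left join. This is only an illustration, not the answer's method. The names `events`, `k` and `sessions` are hypothetical, and the toy rows are copied from the question's sample. It assumes that each user's events alternate between joining and leaving, so that an unmatched "joined" can only be the user's last one.

```r
# Toy subset of the question's data (values copied from the sample).
events <- data.frame(
  datetime = c("2016-02-19 19:13:26", "2016-02-19 19:21:18",
               "2016-02-19 19:21:33", "2016-02-19 19:53:15",
               "2016-02-19 20:02:34", "2016-02-19 19:13:32",
               "2016-02-19 19:25:26", "2016-02-19 19:30:30",
               "2016-02-19 20:02:26"),
  user  = c("User1", "User2", "User1", "User3", "User3",
            "User1", "User1", "User2", "User3"),
  state = c(rep("joined", 5), rep("left", 4)),
  stringsAsFactors = FALSE
)

# Sort by time; ISO-formatted strings sort chronologically.
events <- events[order(events$datetime), ]

# Occurrence counter within each (user, state) group: 1, 2, 3, ...
events$k <- ave(seq_len(nrow(events)), events$user, events$state,
                FUN = seq_along)

joined <- events[events$state == "joined", c("user", "k", "datetime")]
left   <- events[events$state == "left",   c("user", "k", "datetime")]
names(joined)[3] <- "joined"
names(left)[3]   <- "left"

# Left join: every "joined" row is kept; a missing "left" becomes NA.
sessions <- merge(joined, left, by = c("user", "k"), all.x = TRUE)
sessions <- sessions[order(sessions$user, sessions$k),
                     c("user", "joined", "left")]
rownames(sessions) <- NULL
print(sessions)
```

If a "left" event were missing in the middle of a user's history, positional pairing would shift every later pair for that user, so it is worth checking that each `left` is later than its `joined` before trusting the result.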
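  • values_fn is set so that the multiple values for a given user are stored in a list.
  • Because those lists have different lengths, we use mutate and `length<-` to pad the shorter one with NA.
  • We then use unchop to unnest vertically.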
Source: a similar question about reshaping data in R with "login" and "logout" times on Stack Overflow: https://stackoverflow.com/questions/35932291/
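
The padding step described above can be tried in isolation. The sketch below is not part of the original answer. It applies `length<-` to User3's values from the sample, using the hypothetical variable names `joined`, `left` and `left_padded`, and then repeats the idea on a small made-up factor, since the sample stores datetime as a factor.

```r
# User3's values from the sample: five "joined" times but only three "left".
joined <- c("2016-02-19 19:53:15", "2016-02-19 20:02:34",
            "2016-02-19 20:13:48", "2016-02-19 20:49:31",
            "2016-02-19 22:30:30")
left   <- c("2016-02-19 20:02:26", "2016-02-19 20:13:38",
            "2016-02-19 20:42:27")

# `length<-` is the replacement function behind `length(x) <- n`.
# Called directly, it returns x extended to length n, padded with NA.
left_padded <- `length<-`(left, length(joined))
print(left_padded)

# The same call works on a factor and keeps its class and levels.
f <- factor(c("a", "b"))
f_padded <- `length<-`(f, 3L)
print(f_padded)
```

In the answer's pipeline, map2 makes this call once per user, passing that user's `left` vector and the length of the matching `joined` vector.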
