
r - How do I set an id for pairs of rows in a data frame?

Reposted. Author: 行者123. Last updated: 2023-12-04 12:31:54

I have a data frame that looks like this:

   File                          Time Behavior Status   diff    id
   <chr>                        <dbl> <chr>    <chr>   <dbl> <int>
 1 K8053121_serial-food-depr_04  389. Protrude START   6.25      1
 2 K8053121_serial-food-depr_04  409. Protrude STOP    3.25      1
 3 K8060221_serial-food-depr_01  669. Protrude START  19.0       1
 4 K8060221_serial-food-depr_01  757. Protrude STOP    0.247     1
 5 K8060221_serial-food-depr_01  864. Protrude START   8.00      1
 6 K8060221_serial-food-depr_01  929. Protrude STOP    0         1
 7 K8060221_serial-food-depr_02  477. Protrude START  25.0       1
 8 K8060221_serial-food-depr_02  502. Protrude STOP    2.00      1
 9 K8060221_serial-food-depr_02  562. Protrude START  22.7       1
10 K8060221_serial-food-depr_02  570. Protrude STOP    5.50      1
11 K8060221_serial-food-depr_02  924. Protrude START  18.3       1
12 K8060221_serial-food-depr_02  958. Protrude STOP    0         1
13 K8060221_serial-food-depr_04  215. Protrude START   5.93      1
14 K8060221_serial-food-depr_04  283. Protrude STOP    0         1
15 K8060221_serial-food-depr_04  291. Protrude START   0.25      1

Here is the dput output:

structure(list(File = c("K8053121_serial-food-depr_04", "K8053121_serial-food-depr_04", 
"K8060221_serial-food-depr_01", "K8060221_serial-food-depr_01",
"K8060221_serial-food-depr_01", "K8060221_serial-food-depr_01",
"K8060221_serial-food-depr_02", "K8060221_serial-food-depr_02",
"K8060221_serial-food-depr_02", "K8060221_serial-food-depr_02",
"K8060221_serial-food-depr_02", "K8060221_serial-food-depr_02",
"K8060221_serial-food-depr_04", "K8060221_serial-food-depr_04",
"K8060221_serial-food-depr_04"), Time = c(388.936, 408.683, 668.534,
757.371, 863.721, 929.222, 477.278, 501.845, 561.649, 569.901,
923.537, 957.571, 214.577, 283.075, 291.077), Behavior = c("Protrude",
"Protrude", "Protrude", "Protrude", "Protrude", "Protrude", "Protrude",
"Protrude", "Protrude", "Protrude", "Protrude", "Protrude", "Protrude",
"Protrude", "Protrude"), Status = c("START", "STOP", "START",
"STOP", "START", "STOP", "START", "STOP", "START", "STOP", "START",
"STOP", "START", "STOP", "START"), diff = c(6.24899999999997,
3.24700000000001, 19.0169999999999, 0.246999999999957, 7.99800000000005,
0, 24.956, 1.99900000000002, 22.749, 5.50099999999998, 18.2660000000001,
0, 5.92500000000001, 0, 0.25), id = c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), row.names = c(NA, -15L), groups = structure(list(
File = c("K8053121_serial-food-depr_04", "K8060221_serial-food-depr_01",
"K8060221_serial-food-depr_02", "K8060221_serial-food-depr_04"
), .rows = structure(list(1:2, 3:6, 7:12, 13:15), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))

I am trying to produce a data frame that looks roughly like this:

  File                         Behavior    id START  STOP duration
  <chr>                        <chr>    <int> <dbl> <dbl>    <dbl>
1 K8053121_serial-food-depr_04 Protrude     1  389.  409.     19.7
2 K8060221_serial-food-depr_01 Protrude     1  766.  843.     77.2
3 K8060221_serial-food-depr_02 Protrude     1  654.  676.     22.3
4 K8060221_serial-food-depr_04 Protrude     1  432.  464.     32.0

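The problem is that this data frame collapses multiple instances of a behavior into a single row (the START and STOP times are averaged) instead of keeping them separate. For example, the data frame above should instead be: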

  File                         Behavior    id START  STOP duration
  <chr>                        <chr>    <int> <dbl> <dbl>    <dbl>
1 K8053121_serial-food-depr_04 Protrude     1  389.  409.       20
2 K8060221_serial-food-depr_01 Protrude     1  669.  757.       88
3 K8060221_serial-food-depr_01 Protrude     2  864.  929.       65
4 K8060221_serial-food-depr_02 Protrude     1  477.  502.       25
5 K8060221_serial-food-depr_02 Protrude     2  562.  570.        8
6 K8060221_serial-food-depr_02 Protrude     3  924.  958.       34

and so on...

Here is what I have tried:

library(dplyr)
library(tidyr)
library(data.table)  # provides rleid()

protrude_data <- subset(boris_df, Behavior == "Protrude") %>%
  # rleid() only starts a new id when Behavior changes between rows
  mutate(id = rleid(Behavior)) %>%
  group_by(File, id) %>%
  pivot_wider(id_cols = c("File", "Behavior", "id"),
              names_from = "Status",
              values_from = "Time",
              values_fn = list(Time = mean)) %>%
  mutate(duration = STOP - START)

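This approach worked for an earlier example because the behaviors there were different, so each one got its own number. Here every row is "Protrude", and I am not sure how to change the id so that it does what I want.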
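Why every row ends up with id 1: data.table's rleid() starts a new id only when the value changes from one row to the next, so a Behavior column that is "Protrude" throughout forms a single run. A minimal base-R sketch of that behaviour, assuming rle() is a fair stand-in for rleid() on a single column; run_id() is a hypothetical helper written for this illustration (not part of data.table), and "Retract" is a made-up behavior name standing in for the earlier example with different behaviors:

```r
# Hypothetical helper: a base-R stand-in for data.table::rleid() on one vector.
# Consecutive equal values share an id; the id goes up when the value changes.
run_id <- function(x) {
  runs <- rle(x)
  rep(seq_along(runs$lengths), runs$lengths)
}

# "Retract" is a made-up behavior name used only for this illustration
behavior_mixed <- c("Protrude", "Protrude", "Retract", "Retract", "Protrude")
behavior_same  <- rep("Protrude", 6)

run_id(behavior_mixed)  # gives 1 1 2 2 3
run_id(behavior_same)   # gives 1 1 1 1 1 1: one run, so one id
```

Because these ids only track changes in the value, they cannot separate one START/STOP pair from the next when the behavior never changes.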

Running only the intermediate step:

protrude_data <- subset(boris_df, Behavior == "Protrude") %>%
  mutate(id = rleid(Behavior))

gives:

  File                          Time Behavior Status   diff    id
  <chr>                        <dbl> <chr>    <chr>   <dbl> <int>
1 K8053121_serial-food-depr_04  389. Protrude START   6.25      1
2 K8053121_serial-food-depr_04  409. Protrude STOP    3.25      1
3 K8060221_serial-food-depr_01  669. Protrude START  19.0       1
4 K8060221_serial-food-depr_01  757. Protrude STOP    0.247     1
5 K8060221_serial-food-depr_01  864. Protrude START   8.00      1
6 K8060221_serial-food-depr_01  929. Protrude STOP    0         1

I would like it to look like this:

  File                          Time Behavior Status   diff    id
  <chr>                        <dbl> <chr>    <chr>   <dbl> <int>
1 K8053121_serial-food-depr_04  389. Protrude START   6.25      1
2 K8053121_serial-food-depr_04  409. Protrude STOP    3.25      1
3 K8060221_serial-food-depr_01  669. Protrude START  19.0       1
4 K8060221_serial-food-depr_01  757. Protrude STOP    0.247     1
5 K8060221_serial-food-depr_01  864. Protrude START   8.00      2
6 K8060221_serial-food-depr_01  929. Protrude STOP    0         2

and so on...

Best answer

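You can use cumsum() to increment the id value at every 'START': Status == 'START' is TRUE (counted as 1) on START rows and FALSE (0) on STOP rows, so each STOP keeps the id of the START before it. Because df is grouped by File (it is a grouped_df, as the dput output shows), mutate() evaluates cumsum() separately for each file, so the count restarts at 1 for every file.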
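The counting step on its own, using the six Status values of the K8060221_serial-food-depr_02 file from the dput output (a minimal illustration, not part of the original answer):

```r
# Status column of the K8060221_serial-food-depr_02 rows, copied from the dput output
status <- c("START", "STOP", "START", "STOP", "START", "STOP")

# TRUE counts as 1 and FALSE as 0, so the running total rises at each START
# and stays the same on the STOP that closes the pair
id <- cumsum(status == "START")
print(id)
# [1] 1 1 2 2 3 3
```

This assumes the rows are already in time order within each file and that every STOP directly follows its START; a missing or out-of-order row would shift the pairing.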

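library(dplyr)
library(tidyr)

df %>%
  filter(Behavior == "Protrude") %>%
  # each START begins a new pair; a STOP keeps the id of the START before it
  mutate(id = cumsum(Status == 'START')) %>%
  # one row per File/Behavior/id, with the START and STOP times as columns
  pivot_wider(id_cols = c(File, Behavior, id),
              names_from = Status,
              values_from = Time,
              values_fn = list(Time = mean)) %>%
  mutate(duration = STOP - START) %>%
  ungroup()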

#   File                         Behavior    id START  STOP duration
#   <chr>                        <chr>    <int> <dbl> <dbl>    <dbl>
# 1 K8053121_serial-food-depr_04 Protrude     1  389.  409.    19.7
# 2 K8060221_serial-food-depr_01 Protrude     1  669.  757.    88.8
# 3 K8060221_serial-food-depr_01 Protrude     2  864.  929.    65.5
# 4 K8060221_serial-food-depr_02 Protrude     1  477.  502.    24.6
# 5 K8060221_serial-food-depr_02 Protrude     2  562.  570.     8.25
# 6 K8060221_serial-food-depr_02 Protrude     3  924.  958.    34.0
# 7 K8060221_serial-food-depr_04 Protrude     1  215.  283.    68.5
# 8 K8060221_serial-food-depr_04 Protrude     2  291.    NA    NA
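For comparison, a base-R sketch that arrives at the same eight rows without dplyr or tidyr. It is a swapped-in technique, not the answer's method: it replaces pivot_wider() with a merge() of the START rows and the STOP rows, and uses ave() to run cumsum() separately per File, because a plain data.frame carries no grouping. The data is rebuilt by hand from the dput output in the question (diff and id columns dropped), and the names events, starts, stops and pairs are illustrative only.

```r
# Data copied from the question's dput output, as a plain (ungrouped) data.frame
events <- data.frame(
  File = c(rep("K8053121_serial-food-depr_04", 2),
           rep("K8060221_serial-food-depr_01", 4),
           rep("K8060221_serial-food-depr_02", 6),
           rep("K8060221_serial-food-depr_04", 3)),
  Time = c(388.936, 408.683, 668.534, 757.371, 863.721, 929.222,
           477.278, 501.845, 561.649, 569.901, 923.537, 957.571,
           214.577, 283.075, 291.077),
  Behavior = "Protrude",
  Status = c(rep(c("START", "STOP"), 7), "START"),
  stringsAsFactors = FALSE
)

# Running count of STARTs, restarted for every File (the grouped cumsum)
events$id <- ave(as.integer(events$Status == "START"), events$File, FUN = cumsum)

# Split into START rows and STOP rows, renaming Time in each
starts <- events[events$Status == "START", c("File", "Behavior", "id", "Time")]
stops  <- events[events$Status == "STOP",  c("File", "id", "Time")]
names(starts)[names(starts) == "Time"] <- "START"
names(stops)[names(stops) == "Time"]   <- "STOP"

# Join each START to the STOP with the same File and id;
# all.x = TRUE keeps a START that has no STOP (its STOP becomes NA)
pairs <- merge(starts, stops, by = c("File", "id"), all.x = TRUE)
pairs <- pairs[order(pairs$File, pairs$id), ]
rownames(pairs) <- NULL
pairs$duration <- pairs$STOP - pairs$START
print(pairs)
```

The trailing START in K8060221_serial-food-depr_04 has no STOP, so all.x = TRUE keeps it with NA in STOP and duration, as in row 8 of the answer's output; dropping all.x = TRUE keeps complete pairs only.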

For this question (r - How do I set an id for pairs of rows in a data frame?), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/68506479/
