r - Recurrence by period grouping on data.table

Reposted. Author: 行者123. Updated: 2023-12-03 13:52:30

I have a dataset containing names, dates, and several categorical columns. For example:

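library(data.table)

# The last two category values are "A", "B" so that the period-3 rows match the table and expected output in this question
data <- data.table(name = c('Anne', 'Ben', 'Cal', 'Anne', 'Ben', 'Cal', 'Anne', 'Ben', 'Ben', 'Ben', 'Cal'),
                   period = c(1,1,1,1,1,1,2,2,2,3,3),
                   category = c("A","A","A","B","B","B","A","B","A","A","B"))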
which looks like this:
name  period  category
Anne  1       A
Ben   1       A
Cal   1       A
Anne  1       B
Ben   1       B
Cal   1       B
Anne  2       A
Ben   2       B
Ben   2       A
Ben   3       A
Cal   3       B
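For each period, and for each group of my categorical variables, I want to count how many of the names were also present in the previous period. The output should look like this:
period  category  recurrence_count
2       A         2   # due to Anne and Ben being on A, period 1
2       B         1   # due to Ben being on B, period 1
3       A         1   # due to Ben being on A, period 2
3       B         0   # no match from B, period 2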
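I am aware of the .I and .GRP special symbols in data.table, but I can't see how to express the concept of "the next group" in the j entry of the statement. I imagine something like this might be a sensible path, but I can't work out the right syntax: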
data[, .(recurrence_count = length(intersect(name, name[last(.GRP)]))), by = .(category, period)]

Best answer

You can first summarize your data by category and period, collecting the names of each group into a list column.

previous_period_names <- data[, .(names = list(name)), .(category, period)]

previous_period_names[, next_period := period + 1]
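Join the summary back onto your original data, so that each row carries the names seen in its category during the previous period.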
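# Join on category as well as period; matching on period alone would mix names across categories
data[previous_period_names, names := i.names, on = c('category', 'period==next_period')]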
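Now count how many of the names in each group appear among the joined names. Groups in the first period have no earlier period to match, so their count is 0.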
data[, .(recurrence_count = sum(name %in% unlist(names))), by = .(period, category)]
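
As a cross-check, the same count can be sketched without a list column by shifting every observation forward one period and flagging rows that find a match. This is an alternative sketch, not the answer's method: the names dt, previous, recurred and result are illustrative, and the sample data assumes the period-3 categories shown in the question's table (Ben on A, Cal on B).

```r
library(data.table)

# Sample data, with the period-3 categories taken from the question's table
dt <- data.table(
  name     = c("Anne", "Ben", "Cal", "Anne", "Ben", "Cal", "Anne", "Ben", "Ben", "Ben", "Cal"),
  period   = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3),
  category = c("A", "A", "A", "B", "B", "B", "A", "B", "A", "A", "B")
)

# Shift each observation forward by one period so it lines up with
# the rows of the following period
previous <- unique(dt[, .(name, category, period = period + 1)])

# Flag rows whose (name, category) pair also occurred one period earlier
dt[, recurred := FALSE]
dt[previous, recurred := TRUE, on = .(name, category, period)]

# Count distinct recurring names per group, skipping the first period
result <- dt[period > min(period),
             .(recurrence_count = length(unique(name[recurred]))),
             keyby = .(period, category)]
print(result)
```

For this sample data the result should contain the four rows of the expected output in the question. Dropping the first period is a design choice here, since it has no earlier period to compare against.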

Regarding r - Recurrence by period grouping on data.table, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66772154/
