
r - Counting occurrences of a specific event in the past and future, with grouping

Reposted. Author: 行者123. Updated: 2023-12-05 00:53:12

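This question is a modification of one I posted here. I have events of a specific type occurring on different days, but this time they are assigned to multiple users, for example: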

df = data.frame(user_id = c(rep(1:2, each=5)),
cancelled_order = c(rep(c(0,1,1,0,0), 2)),
order_date = as.Date(c('2015-01-28', '2015-01-31', '2015-02-08', '2015-02-23', '2015-03-23',
'2015-01-25', '2015-01-28', '2015-02-06', '2015-02-21', '2015-03-26')))


user_id cancelled_order order_date
1 0 2015-01-28
1 1 2015-01-31
1 1 2015-02-08
1 0 2015-02-23
1 0 2015-03-23
2 0 2015-01-25
2 1 2015-01-28
2 1 2015-02-06
2 0 2015-02-21
2 0 2015-03-26

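I want to compute:

1) the number of cancelled orders each customer will have in the next x days (e.g. 7, 14), excluding the current one, and

2) the number of cancelled orders each customer had in the past x days (e.g. 7, 14), excluding the current one.

The desired output looks like this: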
solution
user_id cancelled_order order_date plus14 minus14
1 0 2015-01-28 2 0
1 1 2015-01-31 1 0
1 1 2015-02-08 0 1
1 0 2015-02-23 0 0
1 0 2015-03-23 0 0
2 0 2015-01-25 2 0
2 1 2015-01-28 1 0
2 1 2015-02-06 0 1
2 0 2015-02-21 0 0
2 0 2015-03-26 0 0

The solution that @joel.wilson proposed using data.table worked very well for the original question:
library(data.table)
vec <- c(14, 30) # Specify desired ranges
setDT(df)[, paste0("x", vec) :=
lapply(vec, function(i) sum(df$cancelled_order[between(df$order_date,
order_date,
order_date + i, # this part can be changed to reflect the past date ranges
incbounds = FALSE)])),
by = order_date]

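However, it does not take the grouping by user_id into account. When I tried to modify the formula by adding this grouping as by = c("user_id", "order_date") or by = list(user_id, order_date), it did not work. This seems like something very basic; any hints on how to get around this detail?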

Also, keep in mind that I am after a solution that works, even if it is not based on the code above or on data.table at all!

Thanks!

Best answer

Here is one way:

library(data.table)
orderDT = with(df, data.table(id = user_id, completed = !cancelled_order, d = order_date))

vec = list(minus = 14L, plus = 14L)
orderDT[, c("dplus", "dminus") := .(
orderDT[!(completed)][orderDT[, .(id, d_plus = d + vec$plus, d_tom = d + 1L)], on=.(id, d <= d_plus, d >= d_tom), .N, by=.EACHI]$N
,
orderDT[!(completed)][orderDT[, .(id, d_minus = d - vec$minus, d_yest = d - 1L)], on=.(id, d >= d_minus, d <= d_yest), .N, by=.EACHI]$N
)]


id completed d dplus dminus
1: 1 TRUE 2015-01-28 2 0
2: 1 FALSE 2015-01-31 1 0
3: 1 FALSE 2015-02-08 0 1
4: 1 TRUE 2015-02-23 0 0
5: 1 TRUE 2015-03-23 0 0
6: 2 TRUE 2015-01-25 2 0
7: 2 FALSE 2015-01-28 1 0
8: 2 FALSE 2015-02-06 0 1
9: 2 TRUE 2015-02-21 0 0
10: 2 TRUE 2015-03-26 0 0

(I found the OP's column names cumbersome, so I shortened them.)
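As a sanity check on the table above, the same counts can be reproduced by brute force in base R. This is only a sketch for small data: `count_window` is a hypothetical helper (not part of the original answer), it compares every row against every other row, and it assumes "excluding the current one" means counting only orders at least one day away, which is what the join conditions in the answer do.

```r
# Brute-force cross-check in base R (quadratic; fine for small data only).
df <- data.frame(
  user_id = rep(1:2, each = 5),
  cancelled_order = rep(c(0, 1, 1, 0, 0), 2),
  order_date = as.Date(c("2015-01-28", "2015-01-31", "2015-02-08", "2015-02-23", "2015-03-23",
                         "2015-01-25", "2015-01-28", "2015-02-06", "2015-02-21", "2015-03-26"))
)

# Hypothetical helper: for each row, count the same user's cancelled orders
# whose date offset in days (relative to that row's date) lies in [lo, hi].
count_window <- function(df, lo, hi) {
  vapply(seq_len(nrow(df)), function(i) {
    same_user <- df$user_id == df$user_id[i]
    offset <- as.integer(df$order_date - df$order_date[i])
    sum(same_user & df$cancelled_order == 1 & offset >= lo & offset <= hi)
  }, integer(1))
}

df$plus14  <- count_window(df, 1L, 14L)    # next 14 days, current day excluded
df$minus14 <- count_window(df, -14L, -1L)  # past 14 days, current day excluded
print(df)
```

The resulting plus14 and minus14 columns should agree with dplus and dminus above; the join-based approach is the one to prefer once the table grows.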

How it works

Each column can be computed on its own, e.g.
orderDT[!(completed)][orderDT[, .(id, d_plus = d + vec$plus, d_tom = d + 1L)], on=.(id, d <= d_plus, d >= d_tom), .N, by=.EACHI]$N

This can be broken down into steps by simplifying it one piece at a time:
orderDT[!(completed)][
orderDT[, .(id, d_plus = d + vec$plus, d_tom = d + 1L)],
on=.(id, d <= d_plus, d >= d_tom),
.N,
by=.EACHI]$N
# original version

orderDT[!(completed)][
orderDT[, .(id, d_plus = d + vec$plus, d_tom = d + 1L)],
on=.(id, d <= d_plus, d >= d_tom),
.N,
by=.EACHI]
# don't extract the N column of counts

orderDT[!(completed)][
orderDT[, .(id, d_plus = d + vec$plus, d_tom = d + 1L)],
on=.(id, d <= d_plus, d >= d_tom)]
# don't create the N column of counts

orderDT[!(completed)]
# don't do the join

orderDT[, .(id, d_plus = d + vec$plus, d_tom = d + 1L)]
# see the second table used in the join

This uses a "non-equi" join, which takes inequalities to define the date ranges. For more details, see the documentation page found by typing ?data.table.
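Since the question asks for several window lengths (e.g. 7 and 14 days), the same non-equi join can be wrapped in a small function and applied per window. The following is a sketch under assumptions, not part of the original answer: `count_between` and the column names `plus7`, `minus7`, `plus14`, `minus14` are hypothetical, and it requires a data.table version that supports non-equi joins.

```r
library(data.table)

# Same data as above, using the shortened column names from the answer.
orderDT <- data.table(
  id = rep(1:2, each = 5),
  completed = rep(c(TRUE, FALSE, FALSE, TRUE, TRUE), 2),
  d = as.Date(c("2015-01-28", "2015-01-31", "2015-02-08", "2015-02-23", "2015-03-23",
                "2015-01-25", "2015-01-28", "2015-02-06", "2015-02-21", "2015-03-26"))
)

# Only cancelled orders can contribute to the counts.
cancelled <- orderDT[!(completed), .(id, d)]

# Hypothetical helper: for each row of orderDT, count cancelled orders of the
# same id whose date lies in [d + lo, d + hi] (both bounds inclusive).
count_between <- function(lo, hi) {
  bounds <- orderDT[, .(id, d_lo = d + lo, d_hi = d + hi)]
  # by = .EACHI returns one count per row of `bounds`, 0 where nothing matches
  cancelled[bounds, on = .(id, d >= d_lo, d <= d_hi), .N, by = .EACHI]$N
}

for (w in c(7L, 14L)) {
  set(orderDT, j = paste0("plus", w),  value = count_between(1L, w))   # future window
  set(orderDT, j = paste0("minus", w), value = count_between(-w, -1L)) # past window
}
print(orderDT)
```

Starting the future window at offset 1 and ending the past window at offset -1 is what excludes the current day, mirroring d_tom and d_yest in the answer.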

Regarding "r - Counting occurrences of a specific event in the past and future, with grouping", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/41615967/
