
r - Algorithm efficiency - time difference loop

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 06:28:31

I have a dataset called visitsPerDay that looks like this, but with 405,890 rows and 10,406 unique CUST_IDs:

> CUST_ID   Date
> 1 2013-09-19
> 1 2013-10-03
> 1 2013-10-08
> 1 2013-10-12
> 1 2013-10-20
> 1 2013-10-25
> 1 2013-11-01
> 1 2013-11-02
> 1 2013-11-08
> 1 2013-11-15
> 1 2013-11-23
> 1 2013-12-02
> 1 2013-12-04
> 1 2013-12-09
> 2 2013-09-16
> 2 2013-09-17
> 2 2013-09-18

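What I want to do is create a new variable holding the lagged difference between visit dates, that is, the number of days since the same customer's previous visit. This is the code I'm currently using: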

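# Sort by customer, then by date, so consecutive rows are consecutive visits
visitsPerDay <- visitsPerDay[order(visitsPerDay$CUST_ID, visitsPerDay$Date), ]
visitsPerDay$MBTV <- NA_real_  # days since the same customer's previous visit
cust_id <- 0
for (i in 1:nrow(visitsPerDay)) {
  if (visitsPerDay$CUST_ID[i] != cust_id) {
    # First visit of a new customer: no previous visit to compare against
    cust_id <- visitsPerDay$CUST_ID[i]
    visitsPerDay$MBTV[i] <- NA
  } else {
    # Assumes Date is of class Date, so the subtraction yields days
    visitsPerDay$MBTV[i] <- as.numeric(visitsPerDay$Date[i] - visitsPerDay$Date[i-1])
  }
}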

I'm sure this is not the most efficient way to do it. Does anyone have a better approach? Thanks!

Best answer

Here is a data.table solution. It is likely to be faster and more readable:

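library(data.table)
dt = data.table(visitsPerDay)

# Per customer: gap to the previous visit date; the first visit has none, so it gets NA
dt[, MBTV := c(NA, diff(as.Date(Date))), by = CUST_ID]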
dt
# CUST_ID Date MBTV
# 1: 1 2013-09-19 NA days
# 2: 1 2013-10-03 14 days
# 3: 1 2013-10-08 5 days
# 4: 1 2013-10-12 4 days
# 5: 1 2013-10-20 8 days
# 6: 1 2013-10-25 5 days
# 7: 1 2013-11-01 7 days
# 8: 1 2013-11-02 1 days
# 9: 1 2013-11-08 6 days
#10: 1 2013-11-15 7 days
#11: 1 2013-11-23 8 days
#12: 1 2013-12-02 9 days
#13: 1 2013-12-04 2 days
#14: 1 2013-12-09 5 days
#15: 2 2013-09-16 NA days
#16: 2 2013-09-17 1 days
#17: 2 2013-09-18 1 days
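
For readers who would rather avoid the data.table dependency, the same per-customer lag can be sketched in base R without an explicit loop. This is not part of the original answer; it is a minimal sketch that assumes the Date column can be converted with as.Date, and the names visits, gap and first_visit are illustrative only. The sample rows are a subset of the data shown in the question.

```r
# Small sample in the same shape as visitsPerDay (rows taken from the question)
visits <- data.frame(
  CUST_ID = c(1, 1, 1, 2, 2, 2),
  Date = as.Date(c("2013-09-19", "2013-10-03", "2013-10-08",
                   "2013-09-16", "2013-09-17", "2013-09-18"))
)

# Sort by customer, then by date, so consecutive rows are consecutive visits
visits <- visits[order(visits$CUST_ID, visits$Date), ]

# Difference in days between each row and the previous row, for the whole column at once
gap <- c(NA, as.numeric(diff(visits$Date)))

# The first row of each customer has no previous visit, so its gap is set to NA
first_visit <- !duplicated(visits$CUST_ID)
gap[first_visit] <- NA

visits$MBTV <- gap
print(visits)
```

The idea is to compute one vectorized diff over the sorted column and then blank out the rows where the customer changes, rather than assigning into the data frame once per row as the loop in the question does.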

Regarding "r - Algorithm efficiency - time difference loop", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/21189073/
