gpt4 book ai didi

R - 计算数据框中有多少行具有相同的值并且日期在 x 天内

转载 作者:行者123 更新时间:2023-12-04 22:31:21 25 4
gpt4 key购买 nike

我有一个数据框“DFrame”,如下所示:

RecordNo |  Cust_ID  |  Record_Date
1 | 023 | 2014-03-01
2 | 056 | 2014-01-18
3 | 041 | 2014-03-04
4 | 023 | 2014-03-21
5 | 056 | 2014-01-25
6 | 003 | 2014-03-01
7 | 023 | 2014-04-01
8 | 023 | 2014-04-02

我想添加一列,显示在当前 record_date 的接下来 14 天内同一客户 ID 的记录出现的次数。
RecordNo |  Cust_ID  |  Record_Date  | 14-day_Repeat_Count
1 | 023 | 2014-03-01 | 0
2 | 056 | 2014-01-18 | 1
3 | 041 | 2014-03-04 | 0
4 | 023 | 2014-03-21 | 2
5 | 056 | 2014-01-25 | 0
6 | 003 | 2014-03-01 | 0
7 | 023 | 2014-04-01 | 1
8 | 023 | 2014-04-02 | 0

我正在尝试用 R 编写快速代码来实现这一点。我找了几篇文章,让满足条件的记录计数看起来很容易,但它们通常只指向静态条件或与当前记录的值无关的条件:
http://one-line-it.blogspot.ca/2013/01/r-number-of-rows-matching-condition.html

我想逻辑可能如下所示:
# Sort DFRAME by RECORD_DATE decreasing=FALSE
......
# Count records below current record where that have matching Cust_ID
# AND the Record_Date is <= 14 days of the current Record_Date

# put result into DFrame$14-day_Repeat_Count
......

我在 DAX 中完成了这种类型的逻辑:
=calculate(counta([Cust_ID],
filter(DFrame,
[Cust_ID]=Earlier([Cust_ID]) &&
[Record_Date] > Earlier([Record_Date]) &&
[Record_Date] <= (Earlier([Record_Date]) + 14)
)
)

(非常快,但 Microsoft 专有),并且在 Excel 中使用“CountIfs”(易于实现,非常非常慢,并且再次需要与 Microsoft 联姻),任何人都可以向我指出一些关于如何根据 R 中的标准进行计数的阅读 Material ?

最佳答案

也许更快、更高效的答案可能如下所示:

##Combine into one data.table
library("data.table")
RecordNo <- 1:36
Record_Date <- c(31,33,38,41,44,59,68,69,75,78,85,88,
32,34,45,46,51,54,60,65,67,70,74,80,
33,35,42,45,50,60,65,70,75,80,82,85)
Cust_ID <- c(rep(1,12),rep(2,12),rep(3,12))
data <- data.table(Cust_ID,Record_Date)[order(Cust_ID,Record_Date)]

##Assign each customer an number that ranks them
data[,Cust_No:=.GRP,by=c("Cust_ID")]

##Create "list" of comparison dates for each customer
Ref <- data[,list(Compare_Date=list(I(Record_Date))), by=c("Cust_ID")]

##Compare two lists and see of the compare date is within N days
system.time(
data$Roll.Cnt <- mapply(FUN = function(RD, NUM) {
d <- as.numeric(Ref$Compare_Date[[NUM]] - RD)
sum((d > 0 & d <= 14))
}, RD = data$Record_Date,NUM=data$Cust_No)
)

结果数据如下所示:
data <- data[,list(Cust_ID,Record_Date,Roll.Cnt)][order(Cust_ID,Record_Date)]
data

Cust_ID Record_Date Roll.Cnt
1: 1 31 4
2: 1 33 3
3: 1 38 2
4: 1 41 1
5: 1 44 0
6: 1 59 2
7: 1 68 3
8: 1 69 2
9: 1 75 3
10: 1 78 2
11: 1 85 1
12: 1 88 0
13: 2 32 3
14: 2 34 2
15: 2 45 3

关于R - 计算数据框中有多少行具有相同的值并且日期在 x 天内,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/23093532/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com