
r - Calculating an average conditional on date with data.table and a rolling join

Reposted. Author: 行者123. Updated: 2023-12-04 11:41:58

I can do this by looping over my dataset multiple times, but I think there must be a more efficient way to do it with data.table.
Here is what the dataset looks like:

CaseID   Won   OwnerID   Time_period   Finished
1        yes   A         1             no
1        yes   A         3             no
1        yes   A         5             yes
2        no    A         4             no
2        no    A         6             yes
3        yes   A         2             yes
4        no    A         3             yes
5        15    B         2             no

For each row, by owner, I want to generate the average of cases won among the cases that finished before that row's time period.

CaseID   Won   OwnerID   Time_period   Finished   AvgWonByOwner
1        yes   A         1             no         NA
1        yes   A         3             no         1
1        yes   A         5             yes        .5
2        no    A         4             no         .5
2        no    A         6             yes        2/3
3        yes   A         2             yes        NA
4        no    A         3             yes        1
5        15    B         2             no         NA

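On closer inspection this seems ridiculously complicated. I think it could be done with some kind of rolling merge, but I don't know how to set the condition so that the average is computed only from Won for cases finished before the row's date, and only for rows with the same OwnerID.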

Edit 1: explanation of the numbers in the last column

AvgWonByOwner   Explanation
NA              t = 1, no cases finished yet (this could be 0 too)
1               t = 3, case 3 finished and is won, so average wins is 1
.5              t = 5, case 3 finished, won; case 4 finished, lost; average = .5
.5              t = 4, case 3 finished, won; case 4 finished, lost; average = .5
2/3             t = 6, case 3 finished, won; case 4 finished, lost; case 1 finished, won; average = 2/3
NA              t = 2, no cases finished yet (this could be 0 too)
1               t = 3, case 3 finished and is won, so average wins is 1
NA              t = 2, no cases finished yet for owner B (this could be 0 too)
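
The rule in the table above can be written out as a direct row-by-row check. The following is only a minimal sketch for validating faster solutions, not an efficient approach: the helper name `avg_won_before` is hypothetical, and the data is re-entered with plain character columns rather than factors.

```r
library(data.table)

# Example data from the question, re-entered with character columns (an assumption)
dt <- data.table(
  CaseID      = c(1, 1, 1, 2, 2, 3, 4, 5),
  Won         = c("yes", "yes", "yes", "no", "no", "yes", "no", "15"),
  OwnerID     = c("A", "A", "A", "A", "A", "A", "A", "B"),
  Time_period = c(1L, 3L, 5L, 4L, 6L, 2L, 3L, 2L),
  Finished    = c("no", "no", "yes", "no", "yes", "yes", "yes", "no")
)

# Hypothetical helper: share of won cases among the owner's cases
# that finished strictly before the given time period
avg_won_before <- function(owner, period, data) {
  prior <- data[OwnerID == owner & Finished == "yes" & Time_period < period]
  if (nrow(prior) == 0L) NA_real_ else mean(prior$Won == "yes")
}

# Apply the helper to every row (one subset of the table per row)
expected <- mapply(avg_won_before, dt$OwnerID, dt$Time_period,
                   MoreArgs = list(data = dt), USE.NAMES = FALSE)

dt[, AvgWonByOwner := expected]
print(dt)
```

Because every row triggers its own subset of the table, the cost grows quadratically with the number of rows, which is the inefficiency the question wants to avoid.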

Best answer

library(data.table)

dt = data.table(structure(list(CaseID = c(1, 1, 1, 2, 2, 3, 4, 5), Won = structure(c(3L, 
3L, 3L, 2L, 2L, 3L, 2L, 1L), .Label = c("15", "no", "yes"), class = "factor"),
OwnerID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("A",
"B"), class = "factor"), Time_period = c(1L, 3L, 5L, 4L,
6L, 2L, 3L, 2L), Finished = structure(c(1L, 1L, 2L, 1L, 2L,
2L, 2L, 1L), .Label = c("no", "yes"), class = "factor")), .Names = c("CaseID",
"Won", "OwnerID", "Time_period", "Finished"), row.names = c(NA,
-8L), class = c("data.table", "data.frame")))

# order
setkey(dt, OwnerID, Time_period)

# calculate the required ratio but including current time
dt[, ratio := cumsum(Finished == "yes" & Won == "yes") /
cumsum(Finished == "yes"),
by = list(OwnerID)]

# shift to satisfy the strict inequality as per OP
dt[, avgWon := c(NaN, ratio[-.N]), by = OwnerID]

# take the first one for each time (that is last one from previous time)
# so that all of the outcomes happening at same time are accounted for
dt[, avgWon := avgWon[1], by = key(dt)]

dt[order(OwnerID, CaseID)]
#   CaseID Won OwnerID Time_period Finished     ratio    avgWon
#1:      1 yes       A           1       no       NaN       NaN
#2:      1 yes       A           3       no 1.0000000 1.0000000
#3:      1 yes       A           5      yes 0.6666667 0.5000000
#4:      2  no       A           4       no 0.5000000 0.5000000
#5:      2  no       A           6      yes 0.5000000 0.6666667
#6:      3 yes       A           2      yes 1.0000000       NaN
#7:      4  no       A           3      yes 0.5000000 1.0000000
#8:      5  15       B           2       no       NaN       NaN
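
The answer above uses cumulative sums rather than a join. Since the question asks about a join-based formulation, here is a hedged sketch of one possible alternative using a non-equi join, which assumes data.table 1.9.8 or later. The names `finished`, `finish_time`, `won`, `avg_won`, and `avgWonJoin` are introduced only for illustration, and the data is re-entered with character columns.

```r
library(data.table)

# Example data from the question, re-entered with character columns (an assumption)
dt <- data.table(
  CaseID      = c(1, 1, 1, 2, 2, 3, 4, 5),
  Won         = c("yes", "yes", "yes", "no", "no", "yes", "no", "15"),
  OwnerID     = c("A", "A", "A", "A", "A", "A", "A", "B"),
  Time_period = c(1L, 3L, 5L, 4L, 6L, 2L, 3L, 2L),
  Finished    = c("no", "no", "yes", "no", "yes", "yes", "yes", "no")
)

# One row per finished case: owner, finish time, and whether it was won
finished <- dt[Finished == "yes",
               .(OwnerID, finish_time = Time_period, won = Won == "yes")]

# For each row of dt, match the same owner's cases that finished strictly
# earlier, and average their outcomes; by = .EACHI gives one result per dt row
res <- finished[dt,
                on = .(OwnerID, finish_time < Time_period),
                .(avg_won = mean(won)),
                by = .EACHI]

# res follows the row order of dt, so the column can be attached directly;
# rows with no earlier finished case get NA
dt[, avgWonJoin := res$avg_won]
print(dt)
```

The strict inequality in `on` takes the place of the shift step in the answer, and rows sharing a time period need no special handling because each row is matched independently.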

Regarding "r - Calculating an average conditional on date with data.table and a rolling join", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/19940355/
