gpt4 book ai didi

R:计算自上次出现值以来的累计和和计数

转载 作者:行者123 更新时间:2023-12-01 12:24:51 25 4
gpt4 key购买 nike

给定简化数据

set.seed(13)

user_id = rep(1:2, each = 10)
order_id = sample(1:20, replace = FALSE)
cost = round(runif(20, 1.5, 75),1)
category = sample( c("apples", "pears", "chicken"), 20, replace = TRUE)
pit = rep(c(0,0,0,0,1), 4)

df = data.frame(cbind(user_id, order_id, cost, category, pit))

user_id order_id cost category pit
1 15 11.6 pears 0
1 5 41.7 apples 0
1 8 51.3 chicken 0
1 2 40.3 pears 0
1 16 7.9 pears 1
1 1 47.1 chicken 0
1 9 3.8 apples 0
1 10 35.4 apples 0
1 11 25.8 chicken 0
1 20 48.1 chicken 1
2 7 32.6 pears 0
2 18 31.3 pears 0
2 14 69 apples 0
2 4 60.9 chicken 0
2 13 41.2 apples 1
2 17 9.4 pears 0
2 19 34.9 apples 0
2 6 5.3 pears 0
2 3 57.3 apples 0
2 12 7.7 apples 1

我想创建包含成本累计总和和不同类别计数的列自上次 pit == 1 以来。所以结果看起来像这样:

user_id order_id cost category pit cum_cost distinct_categories
1 15 11.6 pears 0 11.6 1
1 5 41.7 apples 0 53.3 2
1 8 51.3 chicken 0 104.6 3
1 2 40.3 pears 0 144.9 3
1 16 7.9 pears 1 152.8 3
1 1 47.1 chicken 0 47.1 1
1 9 3.8 apples 0 50.9 2
1 10 35.4 apples 0 86.3 2
1 11 25.8 chicken 0 112.1 3
1 20 48.1 chicken 1 160.2 3
2 7 32.6 pears 0 32.6 1
2 18 31.3 pears 0 63.9 1
2 14 69 apples 0 132.9 2
2 4 60.9 chicken 0 193.8 3
2 13 41.2 apples 1 235.0 3
2 17 9.4 pears 0 9.4 1
2 19 34.9 apples 0 44.3 2
2 6 5.3 pears 0 49.6 2
2 3 57.3 apples 0 106.9 2
2 12 7.7 apples 1 114.6 2

理想情况下,解决方案将在 dplyr 中,但我对其他包/方法持开放态度。非常感谢您的帮助!卡西娅

最佳答案

我们可以使用dplyr。按 'user_id' 和通过获取 'pit' 的累积和并获取其 lag 创建的分组变量进行分组,我们得到 'cost' 的 cumsum 作为 'cum_cost'和 match 索引的 cummax 在 'category' 和 unique 'category' 之间作为 'distinct_categories。

library(dplyr)
df %>%
group_by(user_id, ind= lag(cumsum(pit), default=0)) %>%
mutate(cum_cost = cumsum(cost),
distinct_categories = cummax(match(category, unique(category))))
# user_id order_id cost category pit ind cum_cost distinct_categories
# <int> <int> <dbl> <chr> <int> <dbl> <dbl> <int>
#1 1 3 49.8 apples 0 0 49.8 1
#2 1 13 14.8 chicken 0 0 64.6 2
#3 1 18 11.4 apples 0 0 76.0 2
#4 1 15 52.6 chicken 0 0 128.6 2
#5 1 11 13.6 chicken 1 0 142.2 2
#6 1 19 26.9 chicken 0 1 26.9 1
#7 1 2 54.9 chicken 0 1 81.8 1
#8 1 1 70.6 chicken 0 1 152.4 1
#9 1 10 55.0 chicken 0 1 207.4 1
#10 1 12 19.7 chicken 1 1 227.1 1
#11 2 8 40.0 pears 0 2 40.0 1
#12 2 16 37.4 pears 0 2 77.4 1
#13 2 20 70.5 pears 0 2 147.9 1
#14 2 5 63.8 apples 0 2 211.7 2
#15 2 14 31.9 apples 1 2 243.6 2
#16 2 17 9.1 chicken 0 3 9.1 1
#17 2 4 21.9 pears 0 3 31.0 2
#18 2 7 52.3 apples 0 3 83.3 3
#19 2 9 43.3 chicken 0 3 126.6 3
#20 2 6 9.9 pears 1 3 136.5 3

关于R:计算自上次出现值以来的累计和和计数,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/41020670/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com