
r - dplyr: summarising a categorical variable

Reposted. Author: 行者123. Last updated: 2023-12-04 12:11:41

I want to summarise my data frame small for each distinct Video.ID using dplyr.

small %>%
  group_by(Video.ID) %>%
  summarise(sumr = sum(Partner.Revenue),
            len = mean(Video.Duration..sec.),
            cat = mean(Category))

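mean(Category) is obviously the wrong approach. How can I make it simply use the value that is repeated? (A Video.ID always has the same category, no matter how often it appears in the data frame.)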

My data frame looks like this:
small

# A tibble: 6 x 7
     X1  X1_1 Video.ID    Video.Duration..sec. Category Owned.Views Partner.Revenue
  <int> <int> <chr>                      <int> <chr>          <int>           <dbl>
1     1     1 ---0zh9uzSE                 1184 gadgets            6               0
2     2     2 ---0zh9uzSE                 1184 gadgets            6               0
3     3     3 ---0zh9uzSE                 1184 gadgets            2               0
4     4     4 ---0zh9uzSE                 1184 gadgets            1               0
5     5     5 ---0zh9uzSE                 1184 gadgets            1               0
6     6     6 ---0zh9uzSE                 1184 gadgets            3               0

small <-
structure(list(X1 = 1:6,
X1_1 = 1:6,
Video.ID = c("---0zh9uzSE", "---0zh9uzSE", "---0zh9uzSE", "---0zh9uzSE", "---0zh9uzSE", "---0zh9uzSE"),
Video.Duration..sec. = c(1184L, 1184L, 1184L, 1184L, 1184L, 1184L),
Category = c("gadgets", "gadgets", "gadgets", "gadgets", "gadgets", "gadgets"),
Owned.Views = c(6L, 6L, 2L, 1L, 1L, 3L),
Partner.Revenue = c(0, 0, 0, 0, 0, 0)),
row.names = c(NA, -6L),
class = c("tbl_df", "tbl", "data.frame"))

Best answer

You have at least two options for solving this:

Add the Category column to your group_by:

small %>%
  group_by(Video.ID, cat = Category) %>%
  summarise(sumr = sum(Partner.Revenue),
            len = mean(Video.Duration..sec.))

# A tibble: 1 x 4
# Groups: Video.ID [?]
# Video.ID cat sumr len
# <chr> <chr> <dbl> <dbl>
# 1 ---0zh9uzSE gadgets 0 1184

Or use unique(Category):

small %>%
  group_by(Video.ID) %>%
  summarise(sumr = sum(Partner.Revenue),
            len = mean(Video.Duration..sec.),
            cat = unique(Category))

# A tibble: 1 x 4
# Video.ID sumr len cat
# <chr> <dbl> <dbl> <chr>
# 1 ---0zh9uzSE 0 1184 gadgets
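
A related variant, not part of the original answer, is to take the first value with dplyr's first() and to count the distinct categories with n_distinct(), so the "one category per Video.ID" assumption can be checked rather than trusted. This is only a sketch: the n_cat column name is made up for illustration, and the data frame below rebuilds just the columns of small that the summary needs.

```r
library(dplyr)

# The six rows of `small`, reduced to the columns used in the summary.
small <- data.frame(
  Video.ID = rep("---0zh9uzSE", 6),
  Video.Duration..sec. = rep(1184L, 6),
  Category = rep("gadgets", 6),
  Partner.Revenue = rep(0, 6),
  stringsAsFactors = FALSE
)

result <- small %>%
  group_by(Video.ID) %>%
  summarise(sumr = sum(Partner.Revenue),
            len = mean(Video.Duration..sec.),
            cat = first(Category),         # always exactly one value per group
            n_cat = n_distinct(Category))  # hypothetical check column: 1 if constant

# Any Video.ID returned here has more than one category.
filter(result, n_cat > 1)
```

If the filter returns no rows, every Video.ID really does have a single category and cat can be used as is.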

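The first option is probably preferable, because it still works if an ID has more than one category: in that case it simply produces one row per Video.ID and Category combination.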
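
To illustrate the multi-category case, here is a minimal sketch on made-up data. The data frame mixed and its values are hypothetical and do not come from the question; only the group_by() pattern is taken from the first option.

```r
library(dplyr)

# Hypothetical data: video "a" appears under two categories, "b" under one.
mixed <- data.frame(
  Video.ID = c("a", "a", "a", "b"),
  Category = c("gadgets", "gadgets", "howto", "music"),
  Partner.Revenue = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

by_both <- mixed %>%
  group_by(Video.ID, cat = Category) %>%
  summarise(sumr = sum(Partner.Revenue)) %>%
  ungroup()  # drop the remaining Video.ID grouping

by_both
```

Grouping on both columns gives three rows here, one for each Video.ID and category pair, so nothing is silently collapsed when an ID turns out to have several categories.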

Regarding r - dplyr: summarising a categorical variable, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50311029/
