
r - Counting occurrences of a categorical variable in a data frame (R)

Reposted. Author: 行者123. Updated: 2023-12-05 01:02:06

I have a data frame:

station  date          classification
1        June - 01/16  A
2        June - 03/16  B
1        June - 01/16  A
7        June - 01/16  C
1        June - 03/16  A
2        June - 03/16  B
2        June - 03/16  B

I want the total number of occurrences of A, B, and C, summarized by station number and date.

For example, station 1 on June 1 has 2 As, and station 2 on June 3 has 3 Bs.

I tried:

aggregate(x = list(data_frame$classification), by = list(station=data_frame$station, Date=data_frame$date), function(x) length(unique(x)))

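Best answer

If we need the counts of 'A', 'B', and 'C' as separate columns, reshaping may be the better approach. We convert the 'data.frame' to a 'data.table' (setDT(data_frame)) and use dcast from data.table to reshape from 'long' to 'wide' format, specifying fun.aggregate as length.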

library(data.table)
dcast(setDT(data_frame), station + date ~ classification, length)
#   station         date A B C
#1:       1 June - 01/16 2 0 0
#2:       1 June - 03/16 1 0 0
#3:       2 June - 03/16 0 3 0
#4:       7 June - 01/16 0 0 1
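A similar wide table can be sketched in base R without extra packages. This is not part of the original answer; it assumes the same sample data (rebuilt here so the block runs on its own) and joins station and date into one hypothetical row key called key before cross-tabulating with table().

```r
# Sample data from the question, rebuilt with data.frame()
data_frame <- data.frame(
  station = c(1L, 2L, 1L, 7L, 1L, 2L, 2L),
  date = c("June - 01/16", "June - 03/16", "June - 01/16", "June - 01/16",
           "June - 03/16", "June - 03/16", "June - 03/16"),
  classification = c("A", "B", "A", "C", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Combine station and date into a single row key (illustrative name),
# so that only station/date pairs present in the data become rows
key <- paste(data_frame$station, data_frame$date, sep = " | ")

# Cross-tabulate: rows are station/date pairs, columns are classes,
# cells are the number of occurrences
wide_counts <- table(key = key, classification = data_frame$classification)
print(wide_counts)
```

Combinations that never occur get a count of 0, as in the dcast result. The trade-off is that station and date end up fused into one label instead of staying in separate columns.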

A dplyr option is

library(dplyr)
data_frame %>%
  group_by(station, date, classification) %>%
  tally()
#  station         date classification     n
#    (int)        (chr)          (chr) (int)
#1       1 June - 01/16              A     2
#2       1 June - 03/16              A     1
#3       2 June - 03/16              B     3
#4       7 June - 01/16              C     1
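The aggregate attempt from the question can also be adapted to produce the same long-format counts. In that attempt, length(unique(x)) returns the number of distinct classes in each group instead of the number of rows, and classification is not one of the grouping variables. The sketch below is one possible correction, not the original answer's method; it rebuilds the sample data so it runs on its own, and the column name n is chosen here to match the tally() output.

```r
# Sample data from the question, rebuilt with data.frame()
data_frame <- data.frame(
  station = c(1L, 2L, 1L, 7L, 1L, 2L, 2L),
  date = c("June - 01/16", "June - 03/16", "June - 01/16", "June - 01/16",
           "June - 03/16", "June - 03/16", "June - 03/16"),
  classification = c("A", "B", "A", "C", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Group by station, date AND classification, then count the rows in each
# group with length() (not length(unique()), which counts distinct values)
long_counts <- aggregate(
  x = list(n = data_frame$classification),
  by = list(station = data_frame$station,
            date = data_frame$date,
            classification = data_frame$classification),
  FUN = length
)
print(long_counts)
```

Only combinations that appear in the data are returned, so there are no zero rows here, unlike the wide dcast output.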

Data

data_frame <- structure(list(station = c(1L, 2L, 1L, 7L, 1L, 2L, 2L), 
date = c("June - 01/16",
"June - 03/16", "June - 01/16", "June - 01/16", "June - 03/16",
"June - 03/16", "June - 03/16"), classification = c("A", "B",
"A", "C", "A", "B", "B")), .Names = c("station", "date", "classification"
), class = "data.frame", row.names = c(NA, -7L))

Regarding r - Counting occurrences of a categorical variable in a data frame (R), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36700028/
