
R data.table: counting observations in panel data

Reposted. Author: 行者123. Updated: 2023-12-03 23:42:59

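I have panel data (subject/year), and within each year I want to keep only the subjects that have the maximum number of observations. The dataset is large, so I am using the data.table package. Is there a more elegant solution than what I have tried below?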

library(data.table)

DT <- data.table(SUBJECT=c(rep('John',3), rep('Paul',2),
                           rep('George',3), rep('Ringo',2),
                           rep('John',2), rep('Paul',4),
                           rep('George',2), rep('Ringo',4)),
                 YEAR=c(rep(2011,10), rep(2012,12)),
                 HEIGHT=rnorm(22),
                 WEIGHT=rnorm(22))
DT

DT[, COUNT := .N, by='SUBJECT,YEAR']
DT[, MAXCOUNT := max(COUNT), by='YEAR']

DT <- DT[COUNT==MAXCOUNT]
DT <- DT[, c('COUNT','MAXCOUNT') := NULL]
DT

Best answer

I'm not sure you'll consider this elegant, but how about:

DT[, COUNT := .N, by='SUBJECT,YEAR']
DT[, .SD[COUNT == max(COUNT)], by='YEAR']

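This is basically how to apply by to the i expression, as @SenorO commented. You still need [, COUNT := NULL] afterwards, but for one temporary column rather than two.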

We discourage .SD for speed reasons, though. Hopefully we can get to this feature request soon so that the advice can be dropped: FR#2330 Optimize .SD[i] query to keep the elegance but make it faster unchanged.
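One commonly used way to sidestep `.SD[i]` is to collect row numbers with `.I` and subset the table once. The following is a minimal sketch rather than part of the original answer: it uses a small deterministic stand-in for the question's data (`HEIGHT` is just a row counter here), and the names `idx` and `res` are illustrative.

```r
library(data.table)

# Stand-in for the question's data; HEIGHT values are illustrative only
DT <- data.table(
  SUBJECT = c(rep("John", 3), rep("Paul", 2), rep("George", 3), rep("Ringo", 2),
              rep("John", 2), rep("Paul", 4), rep("George", 2), rep("Ringo", 4)),
  YEAR    = c(rep(2011, 10), rep(2012, 12)),
  HEIGHT  = seq_len(22)
)

# Temporary per-(SUBJECT, YEAR) row count, as in the answer above
DT[, COUNT := .N, by = .(SUBJECT, YEAR)]

# .I holds row numbers of DT; within each year, keep the rows whose
# COUNT equals that year's maximum
idx <- DT[, .I[COUNT == max(COUNT)], by = YEAR]$V1
res <- DT[idx]

# Drop the temporary column from both the original table and the result
DT[, COUNT := NULL]
res[, COUNT := NULL]
```

Because `.I` only returns integer row positions, the grouped step does not have to materialize a subset of the table for each year; the single `DT[idx]` subset does that work once.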

A different approach is as follows. It's faster and more idiomatic, but may be considered less elegant.
# Create a small aggregate table first. No need to use := on the big table.
i = DT[, .N, by='SUBJECT,YEAR']

# Find the even smaller subset. (Do as much as we can on the small aggregate.)
i = i[, .SD[N==max(N)], by=YEAR]

# Finally join the small subset of key values to the big table
setkey(DT, YEAR, SUBJECT)
DT[i]
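The same idea can also be written without keying the big table, using the `on=` argument for an ad hoc join (available in data.table 1.9.6 and later). This is a sketch under assumptions, not the original answer's code: it runs on a small deterministic stand-in for the question's data, replaces `.SD[N==max(N)]` with a plain logical subset on the aggregate, and the names `agg`, `MAXN`, `keep`, and `res` are illustrative.

```r
library(data.table)

# Stand-in for the question's data; HEIGHT values are illustrative only
DT <- data.table(
  SUBJECT = c(rep("John", 3), rep("Paul", 2), rep("George", 3), rep("Ringo", 2),
              rep("John", 2), rep("Paul", 4), rep("George", 2), rep("Ringo", 4)),
  YEAR    = c(rep(2011, 10), rep(2012, 12)),
  HEIGHT  = seq_len(22)
)

# Small aggregate: one row per (SUBJECT, YEAR) with its count N
agg <- DT[, .N, by = .(SUBJECT, YEAR)]

# Work on the small table only: mark each year's maximum count,
# then keep the (SUBJECT, YEAR) pairs that reach it
agg[, MAXN := max(N), by = YEAR]
keep <- agg[N == MAXN, .(SUBJECT, YEAR)]

# Ad hoc join: rows of DT that match the kept pairs; no setkey() needed
res <- DT[keep, on = .(SUBJECT, YEAR)]
```

Since `keep` carries only the join columns, `res` has the same columns as `DT` and no temporary count column to remove afterwards. Ties are preserved: both John and George are kept for 2011, and both Paul and Ringo for 2012.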

Something similar is here.

Regarding "R data.table: counting observations in panel data", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/18645097/
