
r - How to extract the first n rows per group?

Reposted. Author: 行者123. Updated: 2023-12-03 10:12:23

I have a data.table dt. It is sorted first by the column date (my grouping variable) and then by the column age:

library(data.table)
setkeyv(dt, c("date", "age")) # Sorts table first by column "date" then by "age"
> dt
         date age     name
1: 2000-01-01   3   Andrew
2: 2000-01-01   4      Ben
3: 2000-01-01   5  Charlie
4: 2000-01-02   6     Adam
5: 2000-01-02   7      Bob
6: 2000-01-02   8 Campbell
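A minimal sketch of what the setkeyv() call does, assuming the same three columns as in the question: it reorders the rows by reference (without copying the table) and records the key columns, which key() reports. The shuffled input order below is hypothetical, chosen only so the sort has something to do.

```r
library(data.table)

# Same data as in the question, but deliberately entered out of order
# (this shuffled order is hypothetical, for illustration only)
dt <- data.table(
  date = c("2000-01-02", "2000-01-01", "2000-01-02",
           "2000-01-01", "2000-01-02", "2000-01-01"),
  age  = c(8, 4, 6, 3, 7, 5),
  name = c("Campbell", "Ben", "Adam", "Andrew", "Bob", "Charlie")
)

# Sort by reference: first by date, then by age within each date
setkeyv(dt, c("date", "age"))

print(key(dt))  # the key columns, in sort order
print(dt)       # rows now ordered as in the question's printout
```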

My question: is it possible to extract the first 2 rows for each unique date? Or, more generally:

How do I extract the first n rows in each group?

In this example, the result in dt.f would be:
> dt.f = ???????? # function of dt to extract the first 2 rows per unique date
> dt.f
         date age   name
1: 2000-01-01   3 Andrew
2: 2000-01-01   4    Ben
3: 2000-01-02   6   Adam
4: 2000-01-02   7    Bob

P.S. Here is the code that creates the data.table above:
install.packages("data.table")
library(data.table)
date <- c("2000-01-01","2000-01-01","2000-01-01",
"2000-01-02","2000-01-02","2000-01-02")
age <- c(3,4,5,6,7,8)
name <- c("Andrew","Ben","Charlie","Adam","Bob","Campbell")
dt <- data.table(date, age, name)
setkeyv(dt,c("date","age")) # Sorts table first by column "date" then by "age"

Best answer

Yes, just use .SD and index it as needed.

  dt[, .SD[1:2], by=date]

         date age   name
1: 2000-01-01   3 Andrew
2: 2000-01-01   4    Ben
3: 2000-01-02   6   Adam
4: 2000-01-02   7    Bob
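For the general "first n rows" case, one variant worth knowing is head(.SD, n) in place of .SD[1:n]. Indexing a data.table with a row number past its end returns a row of NAs, so .SD[1:n] pads any group that has fewer than n rows, while head() returns only the rows that exist. The extra single-row group ("2000-01-03", "Dave") below is hypothetical, added only to show the difference.

```r
library(data.table)

# Question's data plus one hypothetical single-row group ("2000-01-03")
dt <- data.table(
  date = c(rep("2000-01-01", 3), rep("2000-01-02", 3), "2000-01-03"),
  age  = c(3, 4, 5, 6, 7, 8, 9),
  name = c("Andrew", "Ben", "Charlie", "Adam", "Bob", "Campbell", "Dave")
)
setkeyv(dt, c("date", "age"))

n <- 2

# .SD[1:n] asks for row 2 of the one-row group, which comes back as NAs
padded <- dt[, .SD[1:n], by = date]

# head(.SD, n) returns at most n rows per group, never more than exist
trimmed <- dt[, head(.SD, n), by = date]

print(padded)   # 6 rows; the last one has NA age and name
print(trimmed)  # 5 rows
```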

Edited based on @eddi's suggestion.

@eddi's suggestion: use this instead, for speed:
  dt[dt[, .I[1:2], by = date]$V1]
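A sketch of the .I approach generalized to n rows. The inner call returns, per group, only the row numbers (.I) of the rows to keep, in a column that data.table names V1 by default; the outer call then subsets dt once by those row numbers, which roughly avoids building a .SD subset for every group. Using head(.I, n) instead of .I[1:n] is my substitution, not part of @eddi's suggestion: it keeps groups shorter than n from contributing NA indices.

```r
library(data.table)

dt <- data.table(
  date = c("2000-01-01", "2000-01-01", "2000-01-01",
           "2000-01-02", "2000-01-02", "2000-01-02"),
  age  = c(3, 4, 5, 6, 7, 8),
  name = c("Andrew", "Ben", "Charlie", "Adam", "Bob", "Campbell")
)
setkeyv(dt, c("date", "age"))

n <- 2

# Step 1: per group, keep the row numbers (.I) of the first n rows;
# the unnamed result column is called V1 by default
idx <- dt[, head(.I, n), by = date]

# Step 2: one ordinary subset of dt by those row numbers
dt.f <- dt[idx$V1]
print(dt.f)
```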

# using a slightly larger data set (not shown)
> library(microbenchmark)
> microbenchmark(SDstyle=DT[, .SD[1:2], by=date], IStyle=DT[DT[, .I[1:2], by = date]$V1], times=200L)
Unit: milliseconds
    expr       min        lq    median        uq      max neval
 SDstyle 13.567070 16.224797 22.170302 24.239881 88.26719   200
  IStyle  1.675185  2.018773  2.168818  2.269292 11.31072   200
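The larger data set behind these timings isn't shown, so here is a hedged way to reproduce the comparison: build a synthetic table (its size and columns are assumptions), check that both styles return the same rows, and time each once with base R's system.time() instead of microbenchmark. same_rows() is a hypothetical helper defined here; absolute timings will depend on the machine and the data.table version.

```r
library(data.table)

# Hypothetical synthetic data: 2,000 groups ("dates") of 20 rows each
set.seed(1)
n_groups <- 2000L
DT <- data.table(
  date = rep(sprintf("g%05d", seq_len(n_groups)), each = 20L),
  age  = sample(1:100, n_groups * 20L, replace = TRUE),
  name = sample(letters, n_groups * 20L, replace = TRUE)
)
setkeyv(DT, c("date", "age"))

# The two styles from the answer, wrapped so they can be re-run
sd_style <- function() DT[, .SD[1:2], by = date]
i_style  <- function() DT[DT[, .I[1:2], by = date]$V1]

# Hypothetical helper: same column names and identical column values
# (ignores attributes such as the key, which may differ between styles)
same_rows <- function(a, b) {
  identical(names(a), names(b)) &&
    all(vapply(names(a), function(col) identical(a[[col]], b[[col]]), logical(1)))
}

stopifnot(same_rows(sd_style(), i_style()))

print(system.time(sd_style()))
print(system.time(i_style()))
```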

For r - How to extract the first n rows per group?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/16325641/
