r - How to ensure data.table uses GForce

Reposted by: 行者123. Updated: 2023-12-03 21:33:15

I am running the following code with data.table, and I would like to better understand which conditions trigger GForce.

library(data.table)

DT = data.table(date = rep(seq(Sys.Date(), by = "-1 day", length.out = 1000), 10),
                x    = runif(10000),
                id   = rep(1:10, each = 1000))

In the following case, I can see that GForce kicks in:

DT[, .(max(x), min(x), mean(x)), by = id, verbose = TRUE]

Detected that j uses these columns: x 
Finding groups using forderv ... 0 sec
Finding group sizes from the positions (can be avoided to save RAM) ... 0 sec
lapply optimization is on, j unchanged as 'list(max(x), min(x), mean(x))'
GForce optimized j to 'list(gmax(x), gmin(x), gmean(x))'
Making each group and running j (GForce TRUE) ... 0 secs
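
Instead of reading the log by eye, the same check can be scripted by capturing the verbose output and searching it for the GForce message. This is a minimal sketch, assuming the verbose text is written to standard output (which is what capture.output() collects) and that the message wording matches the log above; it may differ between data.table versions. has_gforce is a hypothetical helper, not part of data.table.

```r
library(data.table)

set.seed(1)
DT <- data.table(x = runif(10000), id = rep(1:10, each = 1000))

# Hypothetical helper: TRUE if a captured verbose log contains the message
# data.table prints when it rewrites j to the g* functions.
has_gforce <- function(log) any(grepl("GForce optimized j", log, fixed = TRUE))

# j calls max() and mean() on a plain column
log_simple <- capture.output(
  DT[, .(max(x), mean(x)), by = id, verbose = TRUE]
)

# j calls max() on a subsetting expression
log_filtered <- capture.output(
  DT[, .(max(x[x > 0.5])), by = id, verbose = TRUE]
)

gforce_simple   <- has_gforce(log_simple)
gforce_filtered <- has_gforce(log_filtered)
```

Comparing gforce_simple with gforce_filtered gives a quick yes/no answer for each form of j.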

But it does not for my use case:

window1 <- Sys.Date() - 50
window2 <- Sys.Date() - 150
window3 <- Sys.Date() - 550

DT[, .(max(x[date > Sys.Date() - 50]), max(x[date > Sys.Date() - 150]), 
       max(x[date > Sys.Date() - 550])), by = id, verbose = TRUE]

Detected that j uses these columns: x,date 
Finding groups using forderv ... 0 sec
Finding group sizes from the positions (can be avoided to save RAM) ... 0 sec
lapply optimization is on, j unchanged as 'list(max(x[date > Sys.Date() - 50]), max(x[date > Sys.Date() - 150]), max(x[date > Sys.Date() - 550]))'
GForce is on, left j unchanged
Old mean optimization is on, left j unchanged.
Making each group and running j (GForce FALSE) ...
  memcpy contiguous groups took 0.000s for 10 groups
  eval(j) took 0.005s for 10 calls
0.005 secs

The only explanation I can think of is that each vector passed to max has a different length.

Best answer

I would do a non-equi join:

# convert to IDate for speed
DT[, date := as.IDate(date)]

mDT = CJ(id = unique(DT$id), days_ago = c(50L, 150L, 550L))
mDT[, date_dn := as.IDate(Sys.Date()) - days_ago]

res = DT[mDT, on=.(id, date > date_dn), .(
  days_ago = first(days_ago), 
  m = mean(x)
), by=.EACHI, verbose=TRUE]

This prints...

Non-equi join operators detected ... 
  forder took ... 0 secs
  Generating group lengths ... done in 0 secs
  Generating non-equi group ids ... done in 0.01 secs
  Found 1 non-equi group(s) ...
Starting bmerge ...done in 0 secs
Detected that j uses these columns: days_ago,x 
lapply optimization is on, j unchanged as 'list(first(days_ago), mean(x))'
Old mean optimization changed j from 'list(first(days_ago), mean(x))' to 'list(first(days_ago), .External(Cfastmean, x, FALSE))'
Making each group and running j (GForce FALSE) ... 
  collecting discontiguous groups took 0.000s for 30 groups
  eval(j) took 0.000s for 30 calls
0 secs

So, for some reason, this uses a different form of optimization rather than GForce.
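
If GForce itself is the goal, one option is to split the work in two: run the non-equi join without by = .EACHI to materialize the matching rows, then aggregate the result with a plain by on a simple column. The following is only a sketch of that idea, not part of the original answer. It trades memory for the chance of optimization, since the joined table holds one row per match, and whether GForce is actually used should be confirmed in the verbose output of the second step.

```r
library(data.table)

set.seed(1)
DT <- data.table(date = rep(seq(Sys.Date(), by = "-1 day", length.out = 1000), 10),
                 x    = runif(10000),
                 id   = rep(1:10, each = 1000))
DT[, date := as.IDate(date)]

mDT <- CJ(id = unique(DT$id), days_ago = c(50L, 150L, 550L))
mDT[, date_dn := as.IDate(Sys.Date()) - days_ago]

# Step 1: non-equi join only, keeping the columns needed for aggregation.
# allow.cartesian = TRUE because each row of mDT matches many rows of DT.
joined <- DT[mDT, on = .(id, date > date_dn), .(id, days_ago, x),
             allow.cartesian = TRUE]

# Step 2: ordinary grouped aggregation where mean() sees a plain column.
res2 <- joined[, .(m = mean(x)), by = .(id, days_ago), verbose = TRUE]
```

The values in res2$m should agree with the m column of the by = .EACHI result, up to floating-point tolerance.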

The result looks like...

    id       date days_ago         m
 1:  1 2017-12-19       50 0.4435722
 2:  1 2017-09-10      150 0.4842963
 3:  1 2016-08-06      550 0.4775890
 4:  2 2017-12-19       50 0.4838715
 5:  2 2017-09-10      150 0.5150688
 6:  2 2016-08-06      550 0.5141174
 7:  3 2017-12-19       50 0.4804182
 8:  3 2017-09-10      150 0.4910027
 9:  3 2016-08-06      550 0.4901343
10:  4 2017-12-19       50 0.4644922
11:  4 2017-09-10      150 0.4902132
12:  4 2016-08-06      550 0.4810129
13:  5 2017-12-19       50 0.4666715
14:  5 2017-09-10      150 0.5193629
15:  5 2016-08-06      550 0.4850173
16:  6 2017-12-19       50 0.5318109
17:  6 2017-09-10      150 0.5481641
18:  6 2016-08-06      550 0.5216787
19:  7 2017-12-19       50 0.4500243
20:  7 2017-09-10      150 0.4915983
21:  7 2016-08-06      550 0.5055563
22:  8 2017-12-19       50 0.4958809
23:  8 2017-09-10      150 0.4915432
24:  8 2016-08-06      550 0.4981277
25:  9 2017-12-19       50 0.5833083
26:  9 2017-09-10      150 0.5160464
27:  9 2016-08-06      550 0.5091702
28: 10 2017-12-19       50 0.4946466
29: 10 2017-09-10      150 0.4798743
30: 10 2016-08-06      550 0.5030687
    id       date days_ago         m

As far as I know, this optimization only applies when the argument of the function (here, mean) is a simple column like x, rather than an expression like x[date > Sys.Date() - 50].
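
Following that reasoning, a single window can be rewritten so that the date filter moves into i and j only sees the plain column. This is a sketch under the assumption that the installed data.table version applies GForce when i is a row subset, which the verbose output should confirm. The names today, slow and fast are illustrative only.

```r
library(data.table)

set.seed(1)
DT <- data.table(date = rep(seq(Sys.Date(), by = "-1 day", length.out = 1000), 10),
                 x    = runif(10000),
                 id   = rep(1:10, each = 1000))
today <- Sys.Date()

# Form from the question: max() receives a subsetting expression.
slow <- DT[, .(m = max(x[date > today - 50])), by = id]

# Sketch: filter rows in i so that j is just max(x) on a plain column.
fast <- DT[date > today - 50, .(m = max(x)), by = id, verbose = TRUE]
```

Two differences from the original form are worth keeping in mind: each window needs its own call, and an id with no rows inside the window is dropped from the result instead of appearing with -Inf.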

This question, "r - How to ensure data.table uses GForce", comes from Stack Overflow: https://stackoverflow.com/questions/48662579/
