
r - Aggregate a multidimensional array by month

Reposted. Author: 行者123. Updated: 2023-12-05 09:27:30

I have a multidimensional array whose 3rd dimension represents time. For the purposes of this question, let's use the ozone dataset from the plyr package:

> str(ozone)
num [1:24, 1:24, 1:72] 260 258 258 254 252 252 250 248 248 248 ...
- attr(*, "dimnames")=List of 3
..$ lat : chr [1:24] "-21.2" "-18.7" "-16.2" "-13.7" ...
..$ long: chr [1:24] "-113.8" "-111.3" "-108.8" "-106.3" ...
..$ time: chr [1:72] "1" "2" "3" "4" ...

From the documentation:

The data are monthly ozone averages on a very coarse 24 by 24 grid covering Central America, from Jan 1995 to Dec 2000. The data is stored in a 3d area with the first two dimensions representing latitude and longitude, and the third representing time.

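What I want to do is compute, for every lat/long cell, the mean for each calendar month across all years. I can do this for a single lat/long combination with tapply, like this: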

> tapply(ozone[1,1,], rep(1:12, 6), mean)
       1        2        3        4        5        6        7        8        9       10       11       12
264.6667 257.6667 255.0000 251.3333 257.6667 265.0000 274.0000 275.3333 277.0000 285.0000 283.0000 273.3333

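But I am struggling to do this for the whole array at once. apply lets me choose the dimensions to operate over (MARGIN), and tapply lets me select slices with a factor (INDEX), but I need both.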

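I am open to suggestions, but because of the size and complexity of the data I would prefer to work with arrays rather than data frames.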


There are two excellent answers below, from GKi and G. Grothendieck; many thanks to both. I put them into a microbenchmark, with the following results:

> microbenchmark(
GKi1 = apply(array(ozone, c(dim(ozone)[1:2], 12, dim(ozone)[3]/12), c(dimnames(ozone)[1:2], list(month=1:12, year=1995:2000))), 1:3, mean),
GKi2 = simplify2array(lapply(split(dimnames(plyr::ozone)[[3]], 1:12), \(x) apply(plyr::ozone[,,x], 1:2, mean))),
Grothendieck = apply(ozone, 1:2, month_mean),
times=100)

Unit: milliseconds
         expr      min       lq     mean   median       uq       max neval
         GKi1 21.67954 22.99889 25.73446 23.89190 27.26137  46.26843   100
         GKi2 20.90931 22.90361 26.64572 23.88572 30.76128  45.16404   100
 Grothendieck 40.98800 43.26854 49.51313 44.73214 52.28114 266.12759   100
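The benchmark above calls month_mean, whose definition is not shown in this post. A plausible stand-in, purely an assumption modelled on the tapply call from the question, might look like the sketch below; the original helper may well differ.

```r
# Hypothetical helper: the original definition is not shown in this post.
# It averages a single time series by calendar month, assuming the series
# starts in January and has one value per month.
month_mean <- function(x) tapply(x, rep_len(1:12, length(x)), mean)

# Small synthetic array (2 x 2 grid, 24 monthly steps) to try it on.
a <- array(seq_len(2 * 2 * 24), c(2, 2, 24))
res <- apply(a, 1:2, month_mean)
dim(res)   # month is the first dimension of the result
```

With a helper of this shape, apply(ozone, 1:2, month_mean) returns the months in the first dimension rather than the last.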

Best answer

You can split the time dimension into a month dimension and a year dimension, and then use apply:

x <- plyr::ozone
x <- array(x, c(dim(x)[1:2], 12, dim(x)[3]/12),
c(dimnames(x)[1:2], list(month=1:12, year=1995:2000)))
#dim(x) <- c(dim(x)[1:2], 12, dim(x)[3]/12) #Alternative without names
. <- apply(x, 1:3, mean)
.[1,1,]
# 1 2 3 4 5 6 7 8
#264.6667 257.6667 255.0000 251.3333 257.6667 265.0000 274.0000 275.3333
# 9 10 11 12
#277.0000 285.0000 283.0000 273.3333
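The reshape works because R stores arrays in column-major order, so consecutive time steps fill the month dimension first and then move on to the next year. A minimal sketch on a synthetic array (not the ozone data) that makes the index mapping visible:

```r
# Small synthetic array: 2 x 2 grid, 24 time steps (2 years of monthly values).
# Each value equals its own time index, which makes the mapping easy to trace.
a <- array(rep(1:24, each = 4), c(2, 2, 24))

# Split the time dimension into month (12) and year (2).
b <- array(a, c(2, 2, 12, 2))

# Time step t lands at month ((t - 1) %% 12) + 1 and year ((t - 1) %/% 12) + 1,
# so time step 15 is March of the second year.
stopifnot(b[1, 1, 3, 2] == a[1, 1, 15])

# Averaging over the year dimension gives one mean per cell and month.
m <- apply(b, 1:3, mean)
stopifnot(m[1, 1, 3] == mean(c(3, 15)))
```

This only holds when the series starts in January and its length is a multiple of 12, as is the case for the ozone data (Jan 1995 to Dec 2000).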

Another option is to split the time indices by month and average each group:

. <- simplify2array(lapply(split(dimnames(plyr::ozone)[[3]], 1:12), \(x)
apply(plyr::ozone[,,x], 1:2, mean)))
.[1,1,]
# 1 2 3 4 5 6 7 8
#264.6667 257.6667 255.0000 251.3333 257.6667 265.0000 274.0000 275.3333
# 9 10 11 12
#277.0000 285.0000 283.0000 273.3333

Or use tapply inside apply (based on the answer from @g-grothendieck):

. <- apply(plyr::ozone, 1:2, tapply, rep(1:12, 6), mean)
.[,1,1]
#aperm(., c(2,3,1))[1,1,] #Alternative
# 1 2 3 4 5 6 7 8
#264.6667 257.6667 255.0000 251.3333 257.6667 265.0000 274.0000 275.3333
# 9 10 11 12
#277.0000 285.0000 283.0000 273.3333
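Note the different indexing (.[,1,1] rather than .[1,1,]): when the function passed to apply returns a vector, that vector becomes the first dimension of the result. A short sketch on a synthetic array, with aperm used to restore the lat x long x month layout:

```r
set.seed(1)
a <- array(runif(3 * 4 * 24), c(3, 4, 24))   # 3 x 4 grid, 24 monthly steps
month <- rep_len(1:12, dim(a)[3])

res <- apply(a, 1:2, tapply, month, mean)
dim(res)                                     # 12 3 4: month comes first

# Move month to the last position to match the lat x long x month layout.
res2 <- aperm(res, c(2, 3, 1))
dim(res2)                                    # 3 4 12

# Each cell agrees with the single-cell tapply call from the question.
stopifnot(all.equal(unname(res2[2, 3, ]),
                    as.vector(tapply(a[2, 3, ], month, mean))))
```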

Using by inside apply:

. <- apply(plyr::ozone, 1:2, by, rep(1:12, 6), mean)
.[,1,1]
# 1 2 3 4 5 6 7 8
#264.6667 257.6667 255.0000 251.3333 257.6667 265.0000 274.0000 275.3333
# 9 10 11 12
#277.0000 285.0000 283.0000 273.3333

If speed matters, rowMeans can be used:

. <- rowMeans(`dim<-`(plyr::ozone, c(dim(plyr::ozone)[1:2], 12,
dim(plyr::ozone)[3]/12)), dims=3)
.[1,1,]
# [1] 264.6667 257.6667 255.0000 251.3333 257.6667 265.0000 274.0000 275.3333
# [9] 277.0000 285.0000 283.0000 273.3333
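With dims = 3, rowMeans keeps the first three dimensions and averages over everything after them, which here is the year dimension. A minimal sketch on a synthetic array that checks it against the apply version:

```r
set.seed(1)
a <- array(runif(3 * 4 * 24), c(3, 4, 24))

# Reshape to lat x long x month x year.
b <- a
dim(b) <- c(dim(a)[1:2], 12, dim(a)[3] / 12)

# Average over the trailing (year) dimension in compiled code.
fast <- rowMeans(b, dims = 3)

# Reference result from the slower apply() version.
slow <- apply(b, 1:3, mean)
stopifnot(all.equal(fast, slow))
```

As with the apply reshape, this assumes the series starts in January and that its length is a multiple of 12.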

Benchmark:

set.seed(42)
a <- array(runif(24 * 24 * 72), c(24, 24, 72))

bench::mark(check = FALSE, # results differ in attributes and in dimension order
applyDim = apply(`dim<-`(a, c(dim(a)[1:2], 12, dim(a)[3]/12)), 1:3, mean),
split = simplify2array(lapply(split(seq_len(dim(a)[3]), 1:12), \(x)
apply(a[,,x], 1:2, mean))),
applyTapply = apply(a, 1:2, tapply, rep_len(1:12, dim(a)[3]), mean),
applyBy = apply(a, 1:2, by, rep_len(1:12, dim(a)[3]), mean),
rowMeans = rowMeans(`dim<-`(a, c(dim(a)[1:2], 12, dim(a)[3]/12)), dims=3)
)
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl>
#1 applyDim 25.5ms 29.2ms 33.4 864.33KB 41.2 17 21
#2 split 30.6ms 33.6ms 30.1 984.13KB 35.7 16 19
#3 applyTapply 36.2ms 38.8ms 22.6 2.52MB 35.9 12 19
#4 applyBy 177.1ms 179.1ms 5.56 2.71MB 35.2 3 19
#5 rowMeans 130.9µs 155.1µs 5149. 378.09KB 38.0 2575 19
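Because the benchmark runs with check = FALSE, it does not itself confirm that the variants agree. A sketch of one way to check this by hand, comparing three of the variants after dropping dimnames and reordering dimensions (the helper name same is made up for this example):

```r
set.seed(42)
a <- array(runif(24 * 24 * 72), c(24, 24, 72))
month <- rep_len(1:12, dim(a)[3])

# Reference: the rowMeans variant, laid out as lat x long x month.
ref <- rowMeans(`dim<-`(a, c(dim(a)[1:2], 12, dim(a)[3] / 12)), dims = 3)

viaSplit <- simplify2array(lapply(split(seq_len(dim(a)[3]), 1:12),
                                  function(x) apply(a[, , x], 1:2, mean)))

# apply + tapply puts month first, so move it to the last position.
viaTapply <- aperm(apply(a, 1:2, tapply, month, mean), c(2, 3, 1))

# Hypothetical helper: compare values only, ignoring dimnames.
same <- function(x, y) isTRUE(all.equal(unname(x), unname(y)))
stopifnot(same(ref, viaSplit), same(ref, viaTapply))
```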

Regarding "r - Aggregate a multidimensional array by month", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/72348326/
