R: How to average data over different time periods defined in a data frame?

Reposted. Author: 行者123. Updated: 2023-12-02 05:40:03

Suppose I have a data frame "data" containing time series measurements of a variable x:

     date            x
2009/10/01 00:00 10
2009/10/01 01:00 11
2009/10/01 02:00 12
2009/10/01 03:00 13
2009/10/01 04:00 14
2009/10/01 05:00 15
2009/10/01 06:00 16
2009/10/01 07:00 17
2009/10/01 08:00 18
2009/10/01 09:00 19
2009/10/01 10:00 20
2009/10/01 11:00 21
2009/10/01 12:00 22
2009/10/01 13:00 23
2009/10/01 14:00 24
2009/10/01 15:00 25
2009/10/01 16:00 26
2009/10/01 17:00 27
2009/10/01 18:00 28
2009/10/01 19:00 29
2009/10/01 20:00 30
2009/10/01 21:00 31
2009/10/01 22:00 32
2009/10/01 23:00 33
2009/10/02 00:00 34
...

and another data frame "events" with different time periods, each defined by a start and a stop date:

id        start              stop
1 2009/10/01 02:00 2009/10/01 04:00
2 2009/10/01 07:00 2009/10/01 10:00
3 2009/10/01 11:00 2009/10/01 20:00
...

Now I would like to get a table with the mean of x during each event, like this:

id  mean.x
1 13
2 18.5
3 25.5

In a database, I would run a simple SQL statement like this:

SELECT a.id, avg(b.x) 
FROM events as a, data as b
WHERE b.date between a.start and a.stop
GROUP BY a.id

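How can I do this kind of averaging in R? If "data" had an id column indicating which event each data point belongs to, I could use "aggregate", but I cannot find a way to create that column...

Any suggestions would be appreciated.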

Edit:

Input (data):

structure(list(date = structure(c(1254348000, 1254351600, 1254355200, 
1254358800, 1254362400, 1254366000, 1254369600, 1254373200, 1254376800,
1254380400, 1254384000, 1254387600, 1254391200, 1254394800, 1254398400,
1254402000, 1254405600, 1254409200, 1254412800, 1254416400, 1254420000,
1254423600, 1254427200, 1254430800, 1254434400), class = c("POSIXct",
"POSIXt"), tzone = "Europe/Berlin"), x = 10:34), .Names = c("date",
"x"), row.names = c(NA, -25L), class = "data.frame")

Input (events):

structure(list(id = 1:3, start = structure(c(1254355200, 1254373200, 
1254387600), class = c("POSIXct", "POSIXt"), tzone = "Europe/Berlin"),
stop = structure(c(1254362400, 1254384000, 1254420000), class = c("POSIXct",
"POSIXt"), tzone = "Europe/Berlin")), .Names = c("id", "start",
"stop"), row.names = c(NA, -3L), class = "data.frame")

Edit 2:

Input (events2):

structure(list(id = structure(1:3, .Label = c("AGH", "TRG", "ZUH"
), class = "factor"), start = structure(c(1254355200, 1254358800,
1254358800), class = c("POSIXct", "POSIXt"), tzone = "Europe/Berlin"),
stop = structure(c(1254362400, 1254384000, 1254420000), class = c("POSIXct",
"POSIXt"), tzone = "Europe/Berlin")), .Names = c("id", "start",
"stop"), row.names = c(NA, -3L), class = "data.frame")

Best answer

Try this:

library(sqldf)
sqldf("
SELECT a.id, avg(b.x)
FROM events as a, data as b
WHERE b.date between a.start and a.stop
GROUP BY a.id
")
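
If installing sqldf is not an option, the same join can be sketched in base R. This is only an illustration, not part of the original answer: the helper name mean_by_event is hypothetical, and the inputs are rebuilt from the epoch values in the question's dput() output. It forms every (event, data point) pair, keeps the pairs whose date lies inside the event (both ends inclusive, like SQL BETWEEN), and then averages x per id.

```r
# Rebuild the question's inputs from the epoch seconds shown in the dput() output.
tz <- "Europe/Berlin"
data <- data.frame(
  date = as.POSIXct(1254348000 + 3600 * 0:24, origin = "1970-01-01", tz = tz),
  x    = 10:34
)
events <- data.frame(
  id    = 1:3,
  start = as.POSIXct(c(1254355200, 1254373200, 1254387600),
                     origin = "1970-01-01", tz = tz),
  stop  = as.POSIXct(c(1254362400, 1254384000, 1254420000),
                     origin = "1970-01-01", tz = tz)
)

# Hypothetical helper (name chosen for this sketch only).
mean_by_event <- function(data, events) {
  # by = NULL makes merge() return the Cartesian product of the two frames,
  # which plays the role of "FROM events, data" in the SQL statement.
  pairs <- merge(events, data, by = NULL)
  # Keep only the data points that fall inside the event's period (inclusive).
  inside <- pairs[pairs$date >= pairs$start & pairs$date <= pairs$stop, ]
  # Average x per event id, like "GROUP BY a.id".
  aggregate(x ~ id, data = inside, FUN = mean)
}

result <- mean_by_event(data, events)
print(result)
```

Because a data point is paired with every event that contains it, overlapping periods such as those in events2 are handled as well, which a single id column in "data" could not express. As with the SQL version, an event that contains no data points does not appear in the result. The cross join creates nrow(events) * nrow(data) rows, so for large inputs this sketch may use a lot of memory.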

Regarding "R: How to average data over different time periods defined in a data frame?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11123008/
