
r - Counting the number of events in progress at a given timestamp

Reposted. Author: 行者123. Updated: 2023-12-04 20:53:46

I have a series of timestamps marking the start and end of certain events.

library(chron)
start <- structure(c(14246.3805439815, 14246.3902662037, 14246.3909606481,
14246.3992939815, 14246.4013773148, 14246.4034606481, 14246.4062384259,
14246.4069328704, 14246.4069328704, 14246.4097106481, 14246.4097106481,
14246.4104050926, 14246.4117939815, 14246.4117939815, 14246.4117939815,
14246.4145717593, 14246.4152546296, 14246.4152662037, 14246.4152662037,
14246.4159606481), format = structure(c("m/d/y", "h:m:s"), .Names = c("dates",
"times")), origin = structure(c(1, 1, 1970), .Names = c("month",
"day", "year")), class = c("chron", "dates", "times"))

finish <- structure(c(14246.436099537, 14246.4666550926, 14246.4083217593,
14246.4374884259, 14246.4847106481, 14246.4867939815, 14246.4305439815,
14246.4659606481, 14246.4520717593, 14246.9097106481, 14246.4930439815,
14246.4763773148, 14246.4326273148, 14246.4291550926, 14246.4187384259,
14246.9145717593, 14246.4395601852, 14246.4395717593, 14246.4395717593,
14246.4367939815), format = structure(c("m/d/y", "h:m:s"), .Names = c("dates",
"times")), origin = structure(c(1, 1, 1970), .Names = c("month",
"day", "year")), class = c("chron", "dates", "times"))

events <- data.frame(start, finish)
head(events, 5)

                start              finish
1 (01/02/09 09:07:59) (01/02/09 10:27:59)
2 (01/02/09 09:21:59) (01/02/09 11:11:59)
3 (01/02/09 09:22:59) (01/02/09 09:47:59)
4 (01/02/09 09:34:59) (01/02/09 10:29:59)
5 (01/02/09 09:37:59) (01/02/09 11:37:59)

I now want to count how many events are in progress at specific timestamps.
intervals <- structure(c(14246.3958333333, 14246.40625, 14246.4166666667, 
14246.4270833333, 14246.4375), format = structure(c("m/d/y",
"h:m:s"), .Names = c("dates", "times")), origin = structure(c(1,
1, 1970), .Names = c("month", "day", "year")), class = c("chron",
"dates", "times"))

intervals

[1] (01/02/09 09:30:00) (01/02/09 09:45:00) (01/02/09 10:00:00) (01/02/09 10:15:00) (01/02/09 10:30:00)

So the output I want is as follows:
            intervals count
1 (01/02/09 09:30:00) 3
2 (01/02/09 09:45:00) 7
3 (01/02/09 10:00:00) 19
4 (01/02/09 10:15:00) 18
5 (01/02/09 10:30:00) 12

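Solving this programmatically is straightforward, but I need to do it for 210,000 intervals and more than 1.2 million events. My current approach uses the data.table package and the & operator to check, for each interval timestamp, whether it falls between each event's start and finish time.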
library(data.table)
events <- data.table(events)
data.frame(intervals, count = sapply(1:5, function(i) sum(events[, start <= intervals[i] & intervals[i] <= finish])))

However, given the size of my data, this approach takes a very long time to run. Any suggestions for a better way to do this in R?

Cheers.

Best answer

The secret to fast code in R is to keep everything in vectors or arrays, which are really just vectors in disguise.
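As a small illustration of that point (a sketch with made-up numbers, not taken from the question's data): giving a plain vector a dim attribute turns it into a matrix, and a comparison is then applied to every element at once, with no explicit loop.

```r
x <- 1:6             # a plain integer vector
dim(x) <- c(2, 3)    # adding a dim attribute makes it a 2 x 3 matrix
is.matrix(x)
# [1] TRUE

hits <- x >= 3       # elementwise comparison over the whole matrix
colSums(hits)        # number of TRUE values in each column
# [1] 0 2 2
```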

Here is a solution that uses base R arrays exclusively. Your data sample is small, so I use replicate combined with system.time to measure performance.

My solution is about 6 times faster than your solution with sapply and data.table. (My solution takes roughly 0.7 seconds to solve your small sample data set 1,000 times.)

Timing your solution

system.time(replicate(1000, 
XX <- data.frame(
intervals,
count = sapply(1:5, function(i) sum(events[, start <= intervals[i] & intervals[i] <= finish])))
))

   user  system elapsed
   4.04    0.05    4.11

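My solution. First create two helper functions that build matrices of equal size, with one row per event and one column per interval timestamp. Then do a simple vectorized comparison, followed by colSums: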
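# Repeat the event times across columns: one row per event, one column per interval
event.array <- function(x, interval){
  len <- length(interval)
  matrix(rep(unclass(x), len), ncol=len)
}

# Repeat the interval times across rows, giving the same shape as event.array()
intervals.array <- function(x, intervals){
  len <- length(x)
  matrix(rep(unclass(intervals), len), nrow=len, byrow=TRUE)
}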


a.start <- event.array(start, intervals)
a.finish <- event.array(finish, intervals)
a.intervals <- intervals.array(start, intervals)

data.frame(intervals,
count=colSums(a.start <= a.intervals & a.finish >= a.intervals))

            intervals count
1 (01/02/09 09:30:00) 3
2 (01/02/09 09:45:00) 7
3 (01/02/09 10:00:00) 19
4 (01/02/09 10:15:00) 18
5 (01/02/09 10:30:00) 12
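
The same comparison can probably be written more compactly with base R's outer(), which builds the event-by-interval matrix directly. This is only a sketch on small made-up numeric vectors rather than the chron data above. count_ongoing_outer is a hypothetical helper name, and chron vectors are assumed to be converted with unclass() first, as the helper functions above do.

```r
# Hypothetical helper: for each timestamp t, count events with start <= t <= finish.
# Assumes plain numeric inputs (apply unclass() to chron vectors first).
count_ongoing_outer <- function(start, finish, times) {
  started  <- outer(start,  times, "<=")  # [i, j]: event i has started by times[j]
  not_done <- outer(finish, times, ">=")  # [i, j]: event i has not ended before times[j]
  colSums(started & not_done)             # one count per timestamp
}

# Made-up example data: four events and five timestamps
start  <- c(1, 2, 4, 6)
finish <- c(3, 5, 4, 9)
times  <- c(0, 2, 4, 5, 10)

count_ongoing_outer(start, finish, times)
# [1] 0 2 2 1 0
```

Like the helper-function version, this still allocates one cell per event/timestamp pair, so it is shorter but not lighter on memory.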

Timing my solution
system.time(replicate(1000, 
YY <- data.frame(
intervals,
count=colSums(a.start <= a.intervals & a.finish >= a.intervals))
))

   user  system elapsed
   0.67    0.02    0.69

all.equal(XX, YY)
[1] TRUE
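
Both the sapply version and the matrix version do work proportional to (number of events) × (number of intervals). For 1.2 million events and 210,000 intervals the matrices would need about 2.5 × 10^11 cells, which is unlikely to fit in memory. One possible alternative, which is not part of the original answer, is to sort the start and finish times and count with base R's findInterval(): the number of events in progress at time t equals the number of starts at or before t minus the number of finishes strictly before t. The sketch below uses made-up numeric data and assumes that every event has start <= finish, that inputs are plain numerics (unclass() for chron vectors), and that the installed R version supports the left.open argument of findInterval(). count_ongoing_sorted is a hypothetical name.

```r
# Hypothetical helper: count events in progress at each timestamp using sorted
# endpoints instead of an events x timestamps matrix.
# Assumes start <= finish for every event and plain numeric inputs.
count_ongoing_sorted <- function(start, finish, times) {
  n_started  <- findInterval(times, sort(start))                     # starts   <= t
  n_finished <- findInterval(times, sort(finish), left.open = TRUE)  # finishes <  t
  n_started - n_finished
}

# Made-up example data: four events and five timestamps
start  <- c(1, 2, 4, 6)
finish <- c(3, 5, 4, 9)
times  <- c(0, 2, 4, 5, 10)

count_ongoing_sorted(start, finish, times)
# [1] 0 2 2 1 0

# Cross-check against the direct definition used in the question
brute <- sapply(times, function(t) sum(start <= t & t <= finish))
all.equal(count_ongoing_sorted(start, finish, times), brute)
# [1] TRUE
```

The cost here is dominated by the two sorts plus one binary search per timestamp, and no intermediate matrix is created, so it should scale much better than the pairwise comparison. It has not been timed on the question's full data.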

Regarding "r - Counting the number of events in progress at a given timestamp", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/7203802/
