
r - How to add a minimum-row-count condition when computing a variable in `data.table`

Reposted. Author: 行者123. Last updated: 2023-12-01 13:11:48

I have two data frames. The first, df, summarizes detections over time (DateTime) of several individuals (ID) of a fish species. For example:

options("digits.secs" = 3)

df<- data.frame(DateTime=c("2017-08-05 14:03:55.300","2017-08-05 16:18:12.100","2017-08-05 20:34:31.540","2017-08-05 16:18:14.355","2017-08-05 20:34:33.605"),
ID= c("A","B","C","B","C"))

df
DateTime ID
1 2017-08-05 14:03:55.300 A
2 2017-08-05 16:18:12.100 B
3 2017-08-05 20:34:31.540 C
4 2017-08-05 16:18:14.355 B
5 2017-08-05 20:34:33.605 C

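The other data frame, Activity, contains event information for these individuals over time. The data has a high temporal resolution: 11 records per second (11 Hz). As a reproducible example: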

set.seed(100)
fmt <- "%Y-%m-%d %H:%M:%OS"

DateTime = seq(from=as.POSIXct("2017-08-05 14:03:55.100", format=fmt, tz="UTC"), by=1/11, length.out=80)
ID = rep("A", each=80)
x= sample(seq(from = -1, to = 1, by = 0.01), size = 80, replace = TRUE)
y= sample(seq(from = -1, to = 1, by = 0.01), size = 80, replace = TRUE)
z= sample(seq(from = -1, to = 1, by = 0.01), size = 80, replace = TRUE)
Activity1<- data.frame(DateTime,ID, x, y, z)

DateTime = seq(from=as.POSIXct("2017-08-05 16:18:11.900", format=fmt, tz="UTC"),by=1/5, length.out=40)
ID = rep("B", each=40)
x= sample(seq(from = -1, to = 1, by = 0.01), size = 40, replace = TRUE)
y= sample(seq(from = -1, to = 1, by = 0.01), size = 40, replace = TRUE)
z= sample(seq(from = -1, to = 1, by = 0.01), size = 40, replace = TRUE)
Activity2<- data.frame(DateTime,ID, x, y, z)

DateTime = seq(from=as.POSIXct("2017-08-05 16:18:19.703", format=fmt, tz="UTC"),by=1/11, length.out=40)
ID = rep("B", each=40)
x= sample(seq(from = -1, to = 1, by = 0.01), size = 40, replace = TRUE)
y= sample(seq(from = -1, to = 1, by = 0.01), size = 40, replace = TRUE)
z= sample(seq(from = -1, to = 1, by = 0.01), size = 40, replace = TRUE)
Activity3<- data.frame(DateTime,ID, x, y, z)

DateTime = seq(from=as.POSIXct("2017-08-05 20:34:31.240", format=fmt, tz="UTC"),by=1/11, length.out=80)
ID = rep("C", each=80)
x= sample(seq(from = -1, to = 1, by = 0.01), size = 80, replace = TRUE)
y= sample(seq(from = -1, to = 1, by = 0.01), size = 80, replace = TRUE)
z= sample(seq(from = -1, to = 1, by = 0.01), size = 80, replace = TRUE)
Activity4<- data.frame(DateTime,ID, x, y, z)
Activity<- rbind(Activity1,Activity2,Activity3,Activity4)

head(Activity)
DateTime ID x y z
1 2017-08-05 14:03:55.099 A 0.01 -0.16 -1.00
2 2017-08-05 14:03:55.190 A 0.11 0.55 -0.69
3 2017-08-05 14:03:55.281 A 0.50 0.79 1.00
4 2017-08-05 14:03:55.372 A 0.97 -0.76 0.24
5 2017-08-05 14:03:55.463 A -0.97 -0.59 0.20
6 2017-08-05 14:03:55.554 A -0.46 0.42 -0.88

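I am using the code below to compute the variables VeDBA and RMS in df. I calculate them from the data in Activity and add them to df as new columns. In summary, for each row of df I use 2 seconds of data from Activity (22 rows, since Activity has 11 records per second), taking df$DateTime as the start time.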

library(data.table)
setDT(df)[, DateTime := as.POSIXct(DateTime, format=fmt, tz="UTC")][,
c("start", "end") := .(DateTime, DateTime+2)]
setDT(Activity)[, DateTime := as.POSIXct(DateTime, format=fmt, tz="UTC")]

df<- Activity[df, on=.(ID, DateTime>=start, DateTime<=end),
by=.EACHI, .(
DateTime=i.DateTime,
ID=i.ID,
VeDBA=sum(sqrt(x^2 + y^2 + z^2)) / .N,
RMS=sqrt((sum(x^2) + sum(y^2) + sum(z^2)) / .N))][,
(1L:3L) := NULL][]
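
To make the windowed join easier to follow, here is a minimal sketch on hypothetical toy data. The names events, windows and sec are invented for illustration, and numeric seconds stand in for the POSIXct timestamps. It only counts the rows that fall into each window, which is the quantity a minimum-row-count condition would be based on.

```r
library(data.table)

# Hypothetical toy data: numeric seconds instead of POSIXct timestamps
events  <- data.table(ID = c("A", "A", "A", "B"), sec = c(0.0, 0.5, 3.0, 1.0))
windows <- data.table(ID = c("A", "B"), start = c(0, 0), end = c(2, 2))

# Non-equi join: for each row of `windows`, match the rows of `events`
# with the same ID and start <= sec <= end.
# by = .EACHI evaluates j once per row of `windows`, so .N is the
# number of matching `events` rows for that window.
counts <- events[windows, on = .(ID, sec >= start, sec <= end),
                 by = .EACHI, .(n = .N)]
print(counts$n)
```

The result has one row per window, with the join columns first, which is why the code above drops the first three columns afterwards.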

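The problem is that sometimes I have 5 records per second instead of 11 in Activity. So I would like to add some code so that, for example, when there are fewer than 14 records in the 2-second window, NA is shown for VeDBA and RMS in df.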

So far, with the code above, I get this:

df
DateTime ID VeDBA RMS
1: 2017-08-05 14:03:55.299 A 0.9919576 1.0264458
2: 2017-08-05 16:18:12.099 B 0.9375138 0.9573975
3: 2017-08-05 20:34:31.539 C 0.9294209 0.9764383
4: 2017-08-05 16:18:14.355 B 0.7542922 0.7886634
5: 2017-08-05 20:34:33.605 C 1.0041628 1.0395891

I would like to get this:

df
DateTime ID VeDBA RMS
1: 2017-08-05 14:03:55.299 A 0.9919576 1.0264458
2: 2017-08-05 16:18:12.099 B NA NA # Between 16:18:12.099 and 16:18:14.099 there are only 10 records instead of 22
3: 2017-08-05 20:34:31.539 C 0.9294209 0.9764383
4: 2017-08-05 16:18:14.355 B NA NA
5: 2017-08-05 20:34:33.605 C 1.0041628 1.0395891

Does anyone know how to modify my data.table code to get those NAs?

Best answer

The modification below returns NA if .N is below a given threshold n_min:

n_min <- 14L
Activity[df, on = .(ID, DateTime >= start, DateTime <= end),
by = .EACHI, .(
DateTime = i.DateTime,
ID = i.ID,
.N, # inserted just to verify the result, to be omitted in production code
VeDBA = if (.N < n_min) NA_real_ else sum(sqrt(x ^ 2 + y ^ 2 + z ^ 2)) / .N,
RMS = if (.N < n_min) NA_real_ else sqrt((sum(x ^ 2) + sum(y ^ 2) + sum(z ^ 2)) / .N)
)][,
(1L:3L) := NULL][]
                  DateTime ID  N     VeDBA       RMS
1: 2017-08-05 14:03:55.299 A 22 0.8777660 0.9181305
2: 2017-08-05 16:18:12.099 B 10 NA NA
3: 2017-08-05 20:34:31.539 C 22 0.8807835 0.9383084
4: 2017-08-05 16:18:14.355 B 10 NA NA
5: 2017-08-05 20:34:33.605 C 22 1.0587765 1.1023549

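Note that the N column was inserted only to verify the result. Also note that the output differs from the OP's expected result, even though set.seed(100) was used to create the data.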
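
One way to avoid writing the threshold check twice would be a small helper that returns both metrics as a list. This is only a sketch: the name vedba_rms and the toy data are hypothetical, and it has not been run against the Activity data from the question.

```r
library(data.table)

# Hypothetical helper: returns both metrics as a named list,
# or NA for both when fewer than n_min values are supplied
vedba_rms <- function(x, y, z, n_min = 14L) {
  n <- length(x)
  if (n < n_min) {
    return(list(VeDBA = NA_real_, RMS = NA_real_))
  }
  list(
    VeDBA = sum(sqrt(x^2 + y^2 + z^2)) / n,
    RMS   = sqrt((sum(x^2) + sum(y^2) + sum(z^2)) / n)
  )
}

# Toy check: group "a" has 2 rows, group "b" has only 1
dt <- data.table(g = c("a", "a", "b"),
                 x = c(3, 3, 3), y = c(4, 4, 4), z = c(0, 0, 0))

# A list returned from j becomes one column per list element
res <- dt[, vedba_rms(x, y, z, n_min = 2L), by = g]
print(res)
```

Because the helper returns a named list, each element becomes its own column, so group "a" gets numeric values and group "b" gets NA in both columns.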

if() can be used here because there is only one .N value for each .EACHI group.
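
The point about if() can be illustrated with a small sketch on hypothetical toy data (the table dt and its columns g, v and flag are invented for illustration). A per-group condition on .N is a single TRUE or FALSE, so a plain if/else works. A per-row condition yields one value per row and needs a vectorised function such as data.table's fifelse().

```r
library(data.table)

dt <- data.table(g = c("a", "a", "a", "b"), v = c(1, 2, 3, 10))
n_min <- 2L

# Per-group condition: .N is a single integer inside each group,
# so a plain if/else is valid in j.
# NA_real_ keeps the column type double in every group.
res <- dt[, .(N = .N, m = if (.N < n_min) NA_real_ else mean(v)), by = g]
print(res)

# Per-row condition: one logical value per row, so a vectorised
# function is used instead of if()
dt[, flag := fifelse(v > 2, "high", "low")]
print(dt)
```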

Regarding "r - How to add a minimum-row-count condition when computing a variable in `data.table`", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59480858/
