
r - while loop inside a for loop: is there a simpler, faster way?

Reposted. Author: 行者123. Updated: 2023-12-02 09:26:15

I have the following dataset, which contains about 64000 rows:

        Trial.time Recording.time X.center Y.center     Area Areachange Elongation Distance.moved Movement.Moving...Center.point.
      2    300.030          0.000 -49.1651  31.9676 0.917085    0.65113   0.851349              -                               -
     22    300.696          0.666 -48.4404  31.9945 0.816206   0.715326   0.831207       0.725139                               1
     24    300.763          0.733  -47.996  32.0696 0.834547   0.412688   0.856234       0.450784                               1
     33    301.063          1.033 -47.6583  32.0598  0.75201   0.137563   0.716028       0.337775                               1
     41    301.330          1.299 -47.3385  32.0139 0.843718   0.302638   0.838526       0.323117                               1
     98    303.230          3.199 -47.3914  31.6981 0.944598    1.26558   0.847969        0.32022                               1
    113    303.730          3.699 -47.3807  31.0614  0.86206    1.24724   0.761099       0.636771                               1
    114    303.763          3.733 -47.1308  30.3858  1.00879     1.1005   0.809162        0.72036                               1
    116    303.830          3.799 -47.1914  30.0551  1.01796   0.440201   0.831924       0.336155                               1

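In general, it describes the movement of an object (Distance.moved) at a given Recording.time. If the Recording.time values of two consecutive rows differ by no more than 0.035, both rows belong to the same movement. If the difference is larger, the two time points belong to two separate movements. My task is to determine the length of each movement, i.e. how many consecutive rows it contains, and the total Distance.moved within that movement. I wrote the following code, which works but is slow, and I would like to ask whether you know how to speed it up.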

    time <- c()
    j.final <- c()
    distance <- c() # Must exist before rbind() is called on it below

    # Go through all rows of the data.frame
    for (i in 1:length(data2[, 1])) {
      i <- 1
      j <- 1
      if (!is.na(data2$Recording.time[i + 1])) {

        # As long as the gap between two consecutive time points is at most 0.035, increase the counter by one
        while (data2$Recording.time[i + 1] - data2$Recording.time[i] <= 0.035) {
          j <- j + 1
          i <- i + 1
        }
        # Save the number of consecutive time points
        j.final <- rbind(j.final, j)
        # Save the time of the last movement frame
        time <- rbind(time, data2$Recording.time[j])
        # Delete the rows that made up this single movement
        data2 <- data2[-(1:j), ]
      }
    }
    final <- cbind(j.final, time)

    # Same as above: continuously remove rows from the data.frame
    data2 <- data1 # Restore the data (data1 is presumably the untouched copy)
    for (i in 1:length(j.final)) {
      Dtotal <- sum(data2$Distance.moved[1:j.final[i]])
      distance <- rbind(distance, Dtotal)
      data2 <- data2[-(1:j.final[i]), ]
    }
    final <- cbind(final, distance)
    dimnames(final) <- list(NULL, c("Frames", "Time", "Distance"))
    epicfinal <- as.data.frame(final)

The final result looks like this (please ignore the velocity column):

      Frames  Time  Distance    velocity
    1      1 0.033 0.0407652 0.001386017
    2     18 0.666 1.4887506 0.911115367
    3      3 0.799 0.0912680 0.009309336
    4      7 1.066 0.3703880 0.088152344
    5      2 1.166 0.0371303 0.002524860
    6      3 1.299 0.1013617 0.010338893

Best answer

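As zx8754 pointed out, this is easy to do with lag (or, better, its fast data.table implementation, shift) and the cumsum function.
I use the data.table package for speed. Note that its syntax differs considerably from classic data.frames: when subsetting a data.table you can put expressions in the j argument, rather than merely selecting columns as in a data.frame.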

    library(data.table)

    ## VARIABLE CREATION:
    # Create a column holding the time gap between two consecutive observations
    data$lag <- data$Recording.time - shift(data$Recording.time)
    data$lag[1] <- 0 # The first value is always NA: fix it
    data$newmovement <- data$lag > 0.035 # TRUE if the row starts a new movement, FALSE otherwise
    data$movement_index <- cumsum(data$newmovement) # Index identifying the movement (starts at 0)

    ## COMPUTATIONS:
    # Use the data.table package for fast computations
    data <- data.table(data)
    data[, .(length_movement = .N,                                # Length (number of rows) of each movement
             total_distance = sum(Distance.moved, na.rm = TRUE)), # Total distance: sum of distances within each movement
         by = movement_index]                                     # Group by movement_index

    # Result for the nine sample rows in the question (with the "-" read as NA):
    #    movement_index length_movement total_distance
    # 1:              0               1       0.000000
    # 2:              1               1       0.725139
    # 3:              2               1       0.450784
    # 4:              3               1       0.337775
    # 5:              4               1       0.323117
    # 6:              5               1       0.320220
    # 7:              6               2       1.357131
    # 8:              7               1       0.336155
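
For comparison, here is a minimal base R sketch of the same lag-and-cumsum idea, applying the question's rule that a gap above 0.035 starts a new movement. The `sample_rows` data frame is a hypothetical stand-in built from the nine rows shown in the question, and reading the `-` in the first row as `NA` is an assumption.

```r
# Hypothetical stand-in for the real data: the nine sample rows from the question.
# The "-" in the first row is treated as NA here (an assumption).
sample_rows <- data.frame(
  Recording.time = c(0.000, 0.666, 0.733, 1.033, 1.299, 3.199, 3.699, 3.733, 3.799),
  Distance.moved = c(NA, 0.725139, 0.450784, 0.337775, 0.323117,
                     0.32022, 0.636771, 0.72036, 0.336155)
)

# Gap to the previous row; the first row has no predecessor, so use 0.
gap <- c(0, diff(sample_rows$Recording.time))

# A gap above 0.035 starts a new movement; cumsum() turns the flags into ids.
movement_id <- cumsum(gap > 0.035) + 1L

# Rows per movement and summed distance per movement.
frames   <- as.vector(table(movement_id))
distance <- as.vector(tapply(sample_rows$Distance.moved, movement_id, sum, na.rm = TRUE))

base_result <- data.frame(movement = seq_along(frames),
                          frames = frames,
                          distance = distance)
print(base_result)
```

On these rows only the pair at 3.699 and 3.733 is close enough to form a two-row movement. Everything here is vectorised, so no row-by-row `rbind()` or repeated deletion from the data frame is needed.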

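Note that the ## VARIABLE CREATION part can also be done with the data.table package.
This may give an additional speed-up. Once data has been converted to a data.table, replace the first part of the code with the following: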

    ## VARIABLE CREATION:
    data[, lag := Recording.time - shift(Recording.time)][1, lag := 0] # First gap is NA: set it to 0
    data[, newmovement := lag > 0.035] # TRUE if the row starts a new movement
    data[, movement_index := cumsum(newmovement)]
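
Putting the pieces together, the sketch below builds the Frames, Time and Distance columns from the question's output in a single grouped data.table call. It is an illustration under stated assumptions: `dt` is a hypothetical stand-in containing the nine sample rows, the `-` in the first row is read as `NA`, and Time is taken as the Recording.time of the last row of each movement, following the comment in the question's code.

```r
library(data.table)

# Hypothetical stand-in for the real data: the nine sample rows from the question.
# The "-" in the first row is treated as NA here (an assumption).
dt <- data.table(
  Recording.time = c(0.000, 0.666, 0.733, 1.033, 1.299, 3.199, 3.699, 3.733, 3.799),
  Distance.moved = c(NA, 0.725139, 0.450784, 0.337775, 0.323117,
                     0.32022, 0.636771, 0.72036, 0.336155)
)

# shift() returns the previous row's value; fill makes the first gap 0 instead of NA.
dt[, gap := Recording.time - shift(Recording.time, fill = Recording.time[1])]

# A gap above 0.035 starts a new movement (the rule stated in the question).
dt[, movement_index := cumsum(gap > 0.035) + 1L]

# One row per movement: number of frames, time of the last frame, total distance.
summary_dt <- dt[, .(Frames   = .N,
                     Time     = Recording.time[.N],
                     Distance = sum(Distance.moved, na.rm = TRUE)),
                 by = movement_index]
print(summary_dt)
```

Using `fill` in `shift()` avoids the separate step that overwrites the first `NA`, and adding `1L` makes the movement index start at 1 rather than 0.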

Regarding "r - while loop inside a for loop: is there a simpler, faster way?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37808176/
