
r - Filter/subset a data frame by a threshold of change

Reposted. Author: 行者123. Updated: 2023-12-04 02:12:19

I have the following data frame, which contains angle-change values across many rows:

'data.frame':   712801 obs. of  4 variables:
$ time_passed: int 1 2 3 4 5 6 7 8 9 10 ...
$ dRoll : num 0.9798 -0.5099 -0.0974 -0.4985 0.1719 ...
$ dPitch : num -0.175 -0.0655 0.0653 0.8907 -1.0893 ...
$ dYaw : num 0.33232 0.06875 -0.00573 0.59588 -0.55577 ...

> myData[1:20,]
time_passed dRoll dPitch dYaw
1 0.97975783 -0.17498131 0.332315521
2 -0.50993244 -0.06548908 0.068754935
3 -0.09740283 0.06531719 -0.005729578
4 -0.49847328 0.89072019 0.595876107
5 0.17188734 -1.08930736 -0.555769061
6 0.68181978 0.36852645 0.492743704
7 1.07143108 0.15206300 -0.635983153
8 -1.43812407 -0.76638835 -0.509932438
9 0.43544792 0.41241502 0.767763445
10 0.25210143 0.61375239 0.509932438
11 0.38961130 0.01203211 -0.360963411
12 0.03437747 -0.29633377 -0.315126787
13 -0.33804510 -0.40639896 -0.177616916
14 0.68181978 0.32446600 0.435447924
15 -1.12872686 -0.37752189 -0.275019742
16 0.75057471 0.33907642 0.464095814
17 -0.25783101 0.11310187 0.309397209
18 -0.01718873 -0.13435860 -0.521391594
19 0.12605071 0.12817066 -0.085943669
20 0.02291831 -0.59856901 -0.120321137

How would I write something like

"If the sum of consecutive negative (or positive) values is smaller than my threshold (say, a 5° change), then throw it out of the data set"

in R code?

I want to be able to apply this criterion to any of the columns, i.e. dRoll, dPitch, and dYaw.


In this case, applied to the dRoll column, the output would be:

time_passed       dRoll       dPitch      dYaw
1 0.97975783 -0.17498131 0.332315521
5 0.17188734 -1.08930736 -0.555769061
6 0.68181978 0.36852645 0.492743704
7 1.07143108 0.15206300 -0.635983153
9 0.43544792 0.41241502 0.767763445
10 0.25210143 0.61375239 0.509932438
11 0.38961130 0.01203211 -0.360963411
12 0.03437747 -0.29633377 -0.315126787
14 0.68181978 0.32446600 0.435447924
16 0.75057471 0.33907642 0.464095814
19 0.12605071 0.12817066 -0.085943669
20 0.02291831 -0.59856901 -0.120321137

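All negative runs in dRoll are discarded, because the sum of each run of consecutive negative values is smaller than 5 degrees in magnitude:

  • First negative run in dRoll: sum(myData[2:4,2]) = -1.105809
  • The second, third, and fourth runs each consist of a single number: -1.43812, -0.33804, -1.12872
  • Last run in dRoll: sum(myData[17:18,2]) = -0.2750197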
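
The run sums listed above can be reproduced with base R's rle(). This is only a minimal sketch: the vector dRoll is typed in from the 20 sample rows shown earlier, and the helper names runs, run_id, run_sum, and neg_sums are made up for illustration.

```r
# dRoll values for the first 20 rows, copied from the sample data above
dRoll <- c(0.97975783, -0.50993244, -0.09740283, -0.49847328, 0.17188734,
           0.68181978, 1.07143108, -1.43812407, 0.43544792, 0.25210143,
           0.38961130, 0.03437747, -0.33804510, 0.68181978, -1.12872686,
           0.75057471, -0.25783101, -0.01718873, 0.12605071, 0.02291831)

# rle() on the sign gives the length of each run of consecutive
# positive or negative values
runs <- rle(dRoll > 0)

# label every element with the index of the run it belongs to
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# sum of the values within each run
run_sum <- tapply(dRoll, run_id, sum)

# keep the sums of the negative runs only (hypothetical helper name)
neg_sums <- unname(run_sum[!runs$values])
print(neg_sums)
```

Every entry of neg_sums is well below 5 in absolute value, which is why all negative runs are dropped in the expected output.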

How can this be done in R?

Best answer

My suggestion is to first melt your data frame into long format. After that, grouped operations become much easier.

Using the data.table package (needed for the melt and rleid functions):

# load the package
library(data.table)

# convert the data frame from the question to a data.table
DT <- as.data.table(myData)

# melt into long format
DT2 <- melt(DT, id.vars = 'time_passed')

# create a cumulative sum for each run
# 'rleid(value > 0)' creates a grouping variable for runs of consecutive positive/negative values
# by adding '[.N]' to 'cumsum(value)' you set all values in 'csum' to the last cumulative value
# of each run (i.e. the run total), which we can use to filter the data
DT2[, csum := cumsum(value)[.N], by = .(variable, rleid(value > 0))]

# filter the data according to a rule
# in this case rows whose run total lies between -1.2 and -0.2 are filtered out
DT2[csum < -1.2 | csum > -0.2]

which gives (a snapshot of the result):

    time_passed variable        value         csum
1: 1 dRoll 0.979757830 0.979757830
2: 5 dRoll 0.171887340 1.925138200
3: 6 dRoll 0.681819780 1.925138200
4: 7 dRoll 1.071431080 1.925138200
5: 8 dRoll -1.438124070 -1.438124070
6: 9 dRoll 0.435447920 1.111538120
....
....
14: 3 dPitch 0.065317190 0.956037380
15: 4 dPitch 0.890720190 0.956037380
16: 6 dPitch 0.368526450 0.520589450
17: 7 dPitch 0.152063000 0.520589450
18: 9 dPitch 0.412415020 1.038199520
19: 10 dPitch 0.613752390 1.038199520
....
....
26: 1 dYaw 0.332315521 0.401070456
27: 2 dYaw 0.068754935 0.401070456
28: 3 dYaw -0.005729578 -0.005729578
29: 4 dYaw 0.595876107 0.595876107
30: 6 dYaw 0.492743704 0.492743704
31: 9 dYaw 0.767763445 1.277695883
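
The filter in the answer uses fixed bounds (-1.2 and -0.2). If the goal is instead a symmetric threshold on the size of the change, as the question describes, one possible variant compares the absolute run total with the threshold. The sketch below is an assumption about what is wanted, not part of the original answer. It works on a single column in wide format, uses only the first 10 sample rows, and the names threshold, run_total, and kept are hypothetical. The threshold is set to 1 rather than 5 purely so that some rows of the small sample survive.

```r
library(data.table)

# first 10 rows of the sample data (dRoll only), copied from the question
DT <- data.table(
  time_passed = 1:10,
  dRoll = c(0.97975783, -0.50993244, -0.09740283, -0.49847328, 0.17188734,
            0.68181978, 1.07143108, -1.43812407, 0.43544792, 0.25210143)
)

# hypothetical threshold; the question mentions 5 degrees, but with
# this small sample that would drop every row
threshold <- 1

# total change within each run of consecutive same-signed values
DT[, run_total := sum(dRoll), by = .(run = rleid(dRoll > 0))]

# keep only rows belonging to a run whose total change reaches the threshold
kept <- DT[abs(run_total) >= threshold]
print(kept)
```

Because sum(dRoll) within a run equals cumsum(value)[.N] from the answer, the grouping logic is the same; only the filter condition differs.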

Regarding "r - Filter/subset a data frame by a threshold of change", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37251595/
