
r - How can I speed up iteration in R when working with a data frame of more than 5 million observations?

Reposted. Author: 行者123. Updated: 2023-12-04 11:29:08

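I'm trying to generate values for more than 7 variables across several million observations, and the for loop I wrote to do it takes forever. Below is an example of what I'm trying to achieve. In this case it's fast, because it only has a couple of thousand observations: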

# Load the tidyverse (dplyr, purrr, tibble)
library(tidyverse)
set.seed(50)

# tibble() replaces the deprecated data_frame()
df <- tibble(SlNo = 1:2000,
             Scenario = rep(c(1, 2, 3, 4), 500),
             A = round(rnorm(2000, 11, 6)),
             B = round(rnorm(2000, 15, 4))) %>%
  arrange(Scenario)

# Split the data frame by scenario and prepend a seed row (Scenario = 0) to each piece
df <- df %>%
  split(f = .$Scenario) %>%
  map_dfr(~ bind_rows(tibble(Scenario = 0), .x))

# The newly added seed rows get fixed starting values for C and E
df <- df %>%
  mutate(C = if_else(Scenario != 0, 0, 4),
         E = if_else(Scenario != 0, 0, 6))

# Each non-seed row depends on the previous row's C and E
for (i in 2:nrow(df)) {
  df$C[i] <- if_else(df$Scenario[i] != 0,
                     (1 - 0.5) * df$C[i - 1] + 3 + 2 + df$B[i] + df$E[i - 1],
                     df$C[i])
  df$E[i] <- if_else(df$Scenario[i] != 0,
                     df$C[i] + df$B[i] - 50,
                     df$E[i])
}

df

# A tibble: 2,004 x 6
   Scenario  SlNo     A     B     C     E
      <dbl> <int> <dbl> <dbl> <dbl> <dbl>
 1        0    NA    NA    NA   4     6
 2        1     1    14    19  32     1
 3        1     5     1    13  35    -2
 4        1     9    17    20  40.5  10.5
 5        1    13     8     7  42.8  -0.25
 6        1    17    10    16  42.1   8.12
 7        1    21     9    12  46.2   8.19
 8        1    25    14    18  54.3  22.3
 9        1    29    14    15  69.4  34.4
10        1    33     4    17  91.1  58.1
# ... with 1,994 more rows

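I'd like to produce similar results quickly when working with a much larger data frame. I'd appreciate any help with this. Thanks in advance!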

Best answer

In the tidyverse, you can use purrr::accumulate like this:

library(tidyverse)
set.seed(50)

df <- data.frame(SlNo = 1:2000,
                 Scenario = rep(c(1, 2, 3, 4), 500),
                 A = round(rnorm(2000, 11, 6)),
                 B = round(rnorm(2000, 15, 4))) %>%
  arrange(Scenario)

df %>%
  nest(data = B) %>%
  group_by(Scenario) %>%
  # .x is the previous row's (C, E) state, .y holds the current row's B;
  # [-1] drops the .init seed so there is one result per row
  mutate(new = accumulate(data,
                          .init = tibble(C = 4, E = 6),
                          ~ tibble(C = (1 - 0.5) * .x$C + 5 + .y$B + .x$E,
                                   E = 0.5 * .x$C + 5 + .x$E + 2 * .y$B - 50)
                          )[-1]
  ) %>%
  ungroup() %>%
  unnest_wider(data) %>%
  unnest_wider(new)

#> # A tibble: 2,000 x 6
#>     SlNo Scenario     A     B     C     E
#>    <int>    <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1     1        1    14    19  32     1
#>  2     5        1     1    13  35    -2
#>  3     9        1    17    20  40.5  10.5
#>  4    13        1     8     7  42.8  -0.25
#>  5    17        1    10    16  42.1   8.12
#>  6    21        1     9    12  46.2   8.19
#>  7    25        1    14    18  54.3  22.3
#>  8    29        1    14    15  69.4  34.4
#>  9    33        1     4    17  91.1  58.1
#> 10    37        1    13    15 124.   88.7
#> # ... with 1,990 more rows
Created on 2021-07-05 by the reprex package (v2.0.0)
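
The E expression in the answer is the question's E[i] = C[i] + B[i] - 50 with the new C[i] substituted in, so both versions compute the same recurrence. That recurrence can also be run on plain numeric vectors in base R, which avoids assigning into the data frame one cell at a time as the original loop does. The sketch below is not part of the original answer: the helper name run_recurrence and its arguments c0 and e0 are made up for illustration, and no timing is claimed here, so it would need benchmarking on the real data.

```r
# Hypothetical helper (not from the original post): runs the question's
# recurrence for one scenario. B is that scenario's B column; c0 and e0 are
# the seed values that the question stores in the extra Scenario = 0 row.
run_recurrence <- function(B, c0 = 4, e0 = 6) {
  n <- length(B)
  C <- numeric(n)
  E <- numeric(n)
  prev_C <- c0
  prev_E <- e0
  for (i in seq_len(n)) {
    # Same formulas as the loop in the question
    C[i] <- (1 - 0.5) * prev_C + 3 + 2 + B[i] + prev_E
    E[i] <- C[i] + B[i] - 50
    prev_C <- C[i]
    prev_E <- E[i]
  }
  data.frame(C = C, E = E)
}

# First four B values of scenario 1, taken from the output above
res <- run_recurrence(c(19, 13, 20, 7))
print(res)

# Applying the helper per scenario with split() on a tiny made-up data frame
toy <- data.frame(Scenario = c(1, 1, 2, 2), B = c(19, 13, 19, 13))
pieces <- lapply(split(toy, toy$Scenario),
                 function(g) cbind(g, run_recurrence(g$B)))
out <- do.call(rbind, pieces)
print(out)
```

For the four B values above, the helper gives C = 32, 35, 40.5, 42.75 and E = 1, -2, 10.5, -0.25, which agrees with the first rows of the printed tibble (where 42.75 is displayed as 42.8). Unlike the question's code, this version needs no seed rows in the data, because the starting values are passed as arguments.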

Regarding "r - How can I speed up iteration in R when working with a data frame of more than 5 million observations?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/54657248/
