gpt4 book ai didi

r - 如何在R中找到累积方差或标准差

转载 作者:行者123 更新时间:2023-12-04 11:55:22 25 4
gpt4 key购买 nike

我在数据框中有一个X列,我需要为其找到累积的标准偏差。

X  Cumulative.SD
1 -
4 2.12
5 2.08
6 2.16
9 2.91

最佳答案

您还可以检查Wiki网站上的Algorithms for calculating variance并使用Rcpp实现Welford's Online algorithm,如下所示

library(Rcpp)
func <- cppFunction(
"arma::vec roll_var(arma::vec &X){
const arma::uword n_max = X.n_elem;
double xbar = 0, M = 0;
arma::vec out(n_max);
double *x = X.begin(), *o = out.begin();

for(arma::uword n = 1; n <= n_max; ++n, ++x, ++o){
double tmp = (*x - xbar);
xbar += (*x - xbar) / n;
M += tmp * (*x - xbar);
if(n > 1L)
*o = M / (n - 1.);
}

if(n_max > 0)
out[0] = std::numeric_limits<double>::quiet_NaN();

return out;
}", depends = "RcppArmadillo")

# it gives the same
x <- c(1, 4, 5, 6, 9)
drop(func(x))
#R [1] NaN 4.50 4.33 4.67 8.50
sapply(seq_along(x), function(i) var(x[1:i]))
#R [1] NA 4.50 4.33 4.67 8.50

# it is fast
x <- rnorm(1e+3)
microbenchmark::microbenchmark(
func = func(x),
sapply = sapply(seq_along(x), function(i) var(x[1:i])))
#R Unit: microseconds
#R expr min lq mean median uq max neval
#R func 9.09 9.88 30.7 20.5 21.9 1189 100
#R sapply 43183.49 45040.29 47043.5 46134.4 47309.7 80345 100

取平方根为您提供标准偏差。

该方法的主要优点是 cancellation没有问题。例如,比较
# there are no issues with cancellation
set.seed(99858398)
x <- rnorm(1e2, mean = 1e8)
cumvar <- function (x, sd = FALSE) {
n <- seq_along(x)
v <- (cumsum(x ^ 2) - cumsum(x) ^ 2 / n) / (n - 1)
if (sd) v <- sqrt(v)
v
}
z1 <- drop(func(x))[-1]
z2 <- cumvar(x)[-1]
plot(z1, ylim = range(z1, z2), type = "l", lwd = 2)
lines(seq_along(z2), z2, col = "DarkBlue")

enter image description here

蓝线是从平方均值中减去平方值的算法。

关于r - 如何在R中找到累积方差或标准差,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/52459711/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com