gpt4 book ai didi

r - 在 R 中编码高斯对数似然

转载 作者:行者123 更新时间:2023-12-01 00:25:52 29 4
gpt4 key购买 nike

我正在尝试通过编码高斯对数似然来学习 R 以解决 optim() ,但经过数小时的汗水,我仍然偏离目标。 (这是自学,不是作业。)

我在许多用户编写的函数中遵循约定,这些函数编写了一个类似 loglik <- function(theta, y, x) 的函数。哪里theta是权重向量 beta和方差 sigma , y是结果和x是数据。

我的带有模拟数据的完整代码如下。运行它,你会发现我的函数与 lm() 相比有点离谱。 .谁能告诉我我哪里出错了?

# random data
set.seed(111)
y = sample(1:100,100)
x1 = sample(1:100,100)*rnorm(1,0)
x2 = sample(x1)*rnorm(1,0)
x3 = sample(x2)*rnorm(1,0)
dat = data.frame(x1,x2,x3)

# define gaussian log-likelihood
logLik <- function(theta, Y, X){
X <- as.matrix(X) # convert data to matrix
k <- ncol(X) # get the number of columns (independent vars)
beta <- theta[1:k] # vector of weights intialized with starting values
expected_y <- X %*% beta # X is dimension (n x k) and beta is dimension (k x 1)
sigma2 <- theta[k+1] # should be pulled from the last of the starting values vector
LL <- sum(dnorm(Y, mean = expected_y, sd = sigma2, log = T)) # sum of the PDF over each observation
return(-LL)
}

这是输出:
> optim(logLik, par=starting_values, method="Nelder-Mead", Y=y, X=dat, hessian = T)$par
[1] 0.4832514 -0.2276684 -0.3860800 32.7168490 -38.9030319
> coefficients(lm(y~x1+x2+x3))
(Intercept) x1 x2 x3
58.17347451 -0.06587320 0.13001865 -0.03624233

最佳答案

你的方法的基础是合理的,但有些细节是错误的。首先,根据高斯线性模型构建数据是有意义的;例如

set.seed(111)
X <- cbind(1, matrix(rnorm(100*3), 100, 3))
y <- X %*% rep(1, 4) + rnorm(100, 0, 2)

starting.values <- c(1, 1, 1, 1, 2) # actual parameters

# define gaussian log-likelihood
logLik <- function(theta, y, X){
k <- ncol(X) # get the number of columns (independent vars)
beta <- theta[1:k] # vector of weights intialized with starting values
expected_y <- X %*% beta # X is dimension (n x k) and beta is dimension (k x 1)
sigma <- theta[k+1] # should be pulled from the last of the starting values vector
LL <- sum(dnorm(y, mean = expected_y, sd = sigma, log = TRUE)) # sum of the PDF over each observation
return(-LL)
}

顺便说一句, *norm()函数根据 SD 参数化,而不是方差。

然后
> optim(logLik, par=starting.values, y=y, X=X, method="BFGS")$par
[1] 1.0471420 1.1411523 0.8167656 0.9840397 1.8910201
Warning message:
In dnorm(y, mean = expected_y, sd = sigma, log = TRUE) : NaNs produced

> summary(lm(y ~ X - 1))

Call:
lm(formula = y ~ X - 1)

Residuals:
Min 1Q Median 3Q Max
-4.5062 -1.3293 0.1371 1.2057 5.8116

Coefficients:
Estimate Std. Error t value Pr(>|t|)
X1 1.0471 0.1952 5.365 5.61e-07 ***
X2 1.1412 0.1818 6.275 1.00e-08 ***
X3 0.8168 0.1907 4.282 4.39e-05 ***
X4 0.9840 0.2122 4.638 1.11e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.93 on 96 degrees of freedom
Multiple R-squared: 0.5333, Adjusted R-squared: 0.5138
F-statistic: 27.42 on 4 and 96 DF, p-value: 3.468e-15

请注意 method="BFGS"发出警告但产生正确答案; method="Nelder-Mead"精度稍差。另请注意,通常对误差 SD 的估计与 ML 估计不同。

我希望这有帮助

关于r - 在 R 中编码高斯对数似然,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/44981549/

29 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com