gpt4 book ai didi

r - 如何在没有模型对象的情况下根据 ns 样条参数进行预测

转载 作者:行者123 更新时间:2023-12-01 22:32:14 25 4
gpt4 key购买 nike

我有 R 中拟合的 glm 的系数,我想预测一组新数据的预期值。如果我有模型对象,这会很简单,使用predict()。但是,我现在不在现场,出于数据保密原因,我不再拥有模型对象。我只有使用summary(model) 生成的摘要对象,其中包含模型系数。

使用系数来预测简单模型的期望值非常容易。但是,我想知道当模型包含三次样条 ns() 时如何执行此操作。当模型还包含分类变量时,任何快捷方式也将受到赞赏。

这是一个简单的例子。

library(splines)
dat <- data.frame(x=1:500, z=runif(500), k=as.factor(sample(c("a","b"), size=500, replace=TRUE)))
kvals <- data.frame(kn=c("a","b"),kv=c(20,30))
dat$y = dat$x + (40*dat$z)^2 + kvals$kv[match(dat$k,kvals$kn)] + rnorm(500,0,30)
# Fit model
library(splines)
mod <- glm(y ~ x + ns(z,df=2) + k,data=dat)
# Create new dataset
dat.new <- expand.grid(x=1:3,z=seq(0.2,0.4,0.1),k="b")
# Predict expected values in the usual way
predict(mod,newdata=dat.new)
summ <- summary(mod)
rm(mod)
# Now, how do I predict using just the summary object and dat.new?

最佳答案

可能有一种更有效的方法来解决这个问题,但这里是一个起点,可以帮助您实现罗兰简要建议的策略。 summ 对象具有定义样条函数所需的信息,但它是隐藏的:

    names(summ)
[1] "call" "terms" "family" "deviance" "aic"
[6] "contrasts" "df.residual" "null.deviance" "df.null" "iter"
[11] "deviance.resid" "coefficients" "aliased" "dispersion" "df"
[16] "cov.unscaled" "cov.scaled"

查看 terms 叶的结构,我们看到样条细节埋藏在 predvars 子叶内更深处:

str(summ$terms)
Classes 'terms', 'formula' language y ~ x + ns(z, df = 2) + k
..- attr(*, "variables")= language list(y, x, ns(z, df = 2), k)
..- attr(*, "factors")= int [1:4, 1:3] 0 1 0 0 0 0 1 0 0 0 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:4] "y" "x" "ns(z, df = 2)" "k"
.. .. ..$ : chr [1:3] "x" "ns(z, df = 2)" "k"
..- attr(*, "term.labels")= chr [1:3] "x" "ns(z, df = 2)" "k"
..- attr(*, "order")= int [1:3] 1 1 1
..- attr(*, "intercept")= int 1
..- attr(*, "response")= int 1
..- attr(*, ".Environment")=<environment: R_GlobalEnv>
..- attr(*, "predvars")= language list(y, x, ns(z, knots = structure(0.514993450604379, .Names = "50%"), Boundary.knots = c(0.00118412892334163, 0.99828373757191), intercept = FALSE), k)
..- attr(*, "dataClasses")= Named chr [1:4] "numeric" "numeric" "nmatrix.2" "factor"
.. ..- attr(*, "names")= chr [1:4] "y" "x" "ns(z, df = 2)" "k"

所以拉出属性:

str(attributes(summ$terms)$predvars)
language list(y, x, ns(z, knots = structure(0.514993450604379, .Names = "50%"),
Boundary.knots = c(0.00118412892334163, 0.99828373757191), intercept = FALSE), k)

您可以看到,如果提供所需的 x、y、z 和 k 值,则可以恢复样条线:

with(dat, ns(z, knots = 0.514993450604379, Boundary.knots = c(0.00118412892334163, 
0.99828373757191), intercept = FALSE) )
#---
1 2
[1,] 5.760419e-01 -1.752762e-01
[2,] 2.467001e-01 -1.598936e-01
[3,] 4.392684e-01 4.799757e-01
snipping ....
[498,] 4.965628e-01 -2.576437e-01
[499,] 5.627389e-01 1.738909e-02
[500,] 2.393920e-02 -1.611872e-02
attr(,"degree")
[1] 3
attr(,"knots")
[1] 0.5149935
attr(,"Boundary.knots")
[1] 0.001184129 0.998283738
attr(,"intercept")
[1] FALSE
attr(,"class")
[1] "ns" "basis" "matrix"

如果您知道数据的极端情况,您可以构建替代的dat。请参阅 ?ns 及其链接到的其他帮助页面。

关于r - 如何在没有模型对象的情况下根据 ns 样条参数进行预测,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/39565143/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com