r - 强制 GAM 模型拟合为单调并通过 R mgcv 的固定点 (x0, y0)-6ren

r - 强制 GAM 模型拟合为单调并通过 R mgcv 的固定点 (x0, y0)

转载作者：行者123 更新时间：2023-12-03 17:08:06

我试图同时将 GAM 模型拟合到两个约束下的数据:(1)拟合是单调的(增加)，(2)拟合经过一个固定点，例如 (x0,y0) .
到目前为止，我设法让这两个约束分开工作:

对于 (1)，基于 mgcv::pcls() documentation examples ，通过使用 mgcv::mono.con()获得足以满足单调性的线性约束，并通过 mgcv::pcls() 估计模型系数，使用约束。

对于 (2)，基于 this post ，通过使用模型公式中的偏移项将节点位置 x0 处的样条值设置为 0 +。

但是，我很难同时结合这两个约束。 我想一个办法是 mgcv::pcls() ，但我既不能解决 (a) 使用偏移将节点位置 x0 处的样条值设置为 0 + 的类似技巧，也不能 (b) 设置等式约束(我认为这可以产生我的(2)约束)设置)。
我还注意到，对于我的约束条件 (2)，将结点位置 x0 处的样条值设置为 0 的方法产生了奇怪的摆动结果(与不受约束的 GAM 拟合相比)——如下所示。
到目前为止的尝试:分别为两个约束下的数据拟合平滑函数
模拟一些数据

library(mgcv)
set.seed(1)
x <- sort(runif(100) * 4 - 1)
f <- exp(4*x)/(1+exp(4*x))
y <- f + rnorm(100) * 0.1
dat <- data.frame(x=x, y=y)

GAM 无约束(用于比较)

k <- 13
fit0 <- gam(y ~ s(x, k = k, bs = "cr"), data = dat)
# predict from unconstrained GAM fit
newdata <- data.frame(x = seq(-1, 3, length.out = 1000))
newdata$y_pred_fit0 <- predict(fit0, newdata = newdata)

GAM 约束:(1) 拟合是单调的(增加)

k <- 13

# Show regular spline fit (and save fitted object)
f.ug <- gam(y~s(x,k=k,bs="cr"))

# explicitly construct smooth term's design matrix
sm <- smoothCon(s(x,k=k,bs="cr"),dat,knots=NULL)[[1]]
# find linear constraints sufficient for monotonicity of a cubic regression spline
# it assumes "cr" is the basis and its knots are provided as input
F <- mono.con(sm$xp)

G <- list(
  X=sm$X,
  C=matrix(0,0,0),  # [0 x 0] matrix (no equality constraints)
  sp=f.ug$sp,       # smoothing parameter estimates (taken from unconstrained model)
  p=sm$xp,          # array of feasible initial parameter estimates
  y=y,           
  w= dat$y * 0 + 1  # weights for data    
)
G$Ain <- F$A        # matrix for the inequality constraints
G$bin <- F$b        # vector for the inequality constraints
G$S <- sm$S         # list of penalty matrices; The first parameter it penalizes is given by off[i]+1
G$off <- 0          # Offset values locating the elements of M$S in the correct location within each penalty coefficient matrix.  (Zero offset implies starting in first location)

p <- pcls(G);       # fit spline (using smoothing parameter estimates from unconstrained fit)

# predict 
newdata$y_pred_fit2 <- Predict.matrix(sm, data.frame(x = newdata$x)) %*% p
# plot
plot(y ~ x, data = dat)
lines(y_pred_fit0 ~ x, data = newdata, col = 2, lwd = 2)
lines(y_pred_fit2 ~ x, data = newdata, col = 4, lwd = 2)

蓝线:受限；红线:无约束

GAM 约束:(2) 装过 (x0,y0)=(-1, -0.1)

k <- 13

## Create a spline basis and penalty
## Make sure there is a knot at the constraint point (here: -1)
knots <- data.frame(x = seq(-1,3,length=k)) 
# explicit construction of a smooth term in a GAM
sm <- smoothCon(s(x,k=k,bs="cr"), dat, knots=knots)[[1]]

## 1st parameter is value of spline at knot location -1, set it to 0 by dropping
knot_which <- which(knots$x == -1)
X <- sm$X[, -knot_which]                  ## spline basis
S <- sm$S[[1]][-knot_which, -knot_which]  ## spline penalty
off <- dat$y * 0 + (-0.1)                   ## offset term to force curve through (x0, y0)

## fit spline constrained through (x0, y0)
gam_1 <- gam(y ~ X - 1 + offset(off), paraPen = list(X = list(S)))

# predict (add offset of -0.1)
newdata_tmp <- Predict.matrix(sm, data.frame(x = newdata$x))
newdata_tmp <- newdata_tmp[, -knot_which]    
newdata$y_pred_fit1 <- (newdata_tmp %*% coef(gam_1))[, 1] + (-0.1)

# plot
plot(y ~ x, data = dat)
lines(y_pred_fit0 ~ x, data = newdata, col = 2, lwd = 2)
lines(y_pred_fit1 ~ x, data = newdata, col = 3, lwd = 2)
# lines at cross of which the plot should go throught
abline(v=-1, col = 3); abline(h=-0.1, col = 3)

绿线:受限；红线:无约束

最佳答案

我认为你可以增加数据向量 x和 y与 (x0, y0)然后在第一个观察值上(即在 G 列表中添加一个权重向量)赋予(真正)高权重。
除了简单的加权策略，我们可以从初步平滑的结果开始编写二次规划问题。这在下面的第二个 R 代码中进行了说明(在这种情况下，我使用了 p 样条平滑器，参见 Eilers 和 Marx 1991)。
希望这会有所帮助( a similar problem is discussed here )。
Rcode 示例 1(权重策略)

set.seed(123)
N = 100
x <- sort(runif(N) * 4 - 1)
f <- exp(4*x)/(1+exp(4*x))
y <- f + rnorm(N) * 0.1
x = c(-1, x)
y = c(-0.1, y)
dat = data.frame(x = x, y= y)

k <- 13
fit0 <- gam(y ~ s(x, k = k, bs = "cr"), data = dat)
# predict from unconstrained GAM fit
newdata <- data.frame(x = seq(-1, 3, length.out = 1000))
newdata$y_pred_fit0 <- predict(fit0, newdata = newdata)

k <- 13

# Show regular spline fit (and save fitted object)
f.ug <- gam(y~s(x,k=k,bs="cr"))

# explicitly construct smooth term's design matrix
sm <- smoothCon(s(x,k=k,bs="cr"),dat,knots=NULL)[[1]]
# find linear constraints sufficient for monotonicity of a cubic regression spline
# it assumes "cr" is the basis and its knots are provided as input
F <- mono.con(sm$xp)

G <- list(
X=sm$X,
C=matrix(0,0,0),  # [0 x 0] matrix (no equality constraints)
sp=f.ug$sp,       # smoothing parameter estimates (taken from unconstrained model)
p=sm$xp,          # array of feasible initial parameter estimates
y=y,           
w= c(1e8, 1:N * 0 + 1)  # weights for data    
)
G$Ain <- F$A        # matrix for the inequality constraints
G$bin <- F$b        # vector for the inequality constraints
G$S <- sm$S         # list of penalty matrices; The first parameter it penalizes is given by off[i]+1
G$off <- 0          # Offset values locating the elements of M$S in the correct location within each penalty coefficient matrix.  (Zero offset implies starting in first location)

p <- pcls(G);       # fit spline (using smoothing parameter estimates from unconstrained fit)

# predict 
newdata$y_pred_fit2 <- Predict.matrix(sm, data.frame(x = newdata$x)) %*% p
# plot
plot(y ~ x, data = dat)
lines(y_pred_fit0 ~ x, data = newdata, col = 2, lwd = 2)
lines(y_pred_fit2 ~ x, data = newdata, col = 4, lwd = 2)
abline(v = -1)
abline(h = -0.1)

rm(list = ls())
library(mgcv)
library(pracma)
library(colorout)

set.seed(123)
N   = 100
x   = sort(runif(N) * 4 - 1)
f   = exp(4*x)/(1+exp(4*x))
y   = f + rnorm(N) * 0.1
x0  = -1
y0  = -0.1 
dat = data.frame(x = x, y= y)

k = 50
# Show regular spline fit (and save fitted object)
f.ug = gam(y~s(x,k=k,bs="ps"))

# explicitly construct smooth term's design matrix
sm = smoothCon(s(x,k=k,bs="ps"), dat,knots=NULL)[[1]]

# Build quadprog to estimate the coefficients 
scf = sapply(f.ug$smooth, '[[', 'S.scale')
lam = f.ug$sp / scf
Xp  = rbind(sm$X, sqrt(lam) * f.ug$smooth[[1]]$D)  
yp  = c(dat$y, rep(0, k - 2))
X0  = Predict.matrix(sm, data.frame(x = x0))
sm$deriv = 1
X1 = Predict.matrix(sm, data.frame(x = dat$x))
coef_mono = pracma::lsqlincon(Xp, yp, Aeq = X0, beq = y0, A = -X1, b = rep(0, N))

# fitted values
fit = sm$X %*% coef_mono
sm$deriv = 0
xf = seq(-1, 3, len = 1000)
Xf = Predict.matrix(sm, data.frame(x = xf))
fine_fit = Xf %*% coef_mono

# plot
par(mfrow = c(2, 1), mar = c(3,3,3,3))
plot(dat$x, dat$y, pch = 1, main= 'Data and fit')
lines(dat$x, f.ug$fitted, lwd = 2, col = 2)
lines(dat$x, fit, col = 4, lty = 1, lwd = 2)
lines(xf, fine_fit, col = 3, lwd = 2, lty = 2)
abline(h = -0.1)
abline(v = -1)

plot(dat$x, X1 %*% coef_mono, type = 'l', main = 'Derivative of the fit', lwd = 2)
abline(h = 0.0)

关于r - 强制 GAM 模型拟合为单调并通过 R mgcv 的固定点 (x0, y0)，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/66648829/

文章推荐： objective-c - 具有 subview 和多个 NIB 的核心数据绑定(bind)

文章推荐： cocoa - 单独线程上的 NSURLDownload 委托(delegate)方法

文章推荐： javascript - 在入口点自动加载 block

gams-math - GAMS 中的调试器工具
我想在 GAMS 模型中找到我的错误。我没有任何错误，但我的模型效果不佳 GAMS 中是否有任何调试工具？(如其他软件中的调试器工具，例如 MATLAB) 最好的事物最佳答案不幸的是，我没有遇到任
r - 如何从 GAM (`mgcv::gam` 中提取拟合样条)
我正在使用 GAM 对逻辑回归中的时间趋势进行建模。然而，我想从中提取拟合样条线以将其添加到另一个无法在 GAM 或 GAMM 中拟合的模型中。因此我有两个问题: 我怎样才能随着时间的推移拟合更平滑
r - stat_smooth gam 与 gam {mgcv} 不同
我在 ggplot2 中使用 stat_smooth 函数，决定我想要“拟合优度”，并为此使用了 mgvc gam。我突然想到我应该检查以确保它们是相同的模型(stat_smooth vs mgvc
r - 保存适合的 gam 对象时节省空间(mgcv::gam 和 scam::scam)
我在大型数据集上估算一个非常简单的模型。公式看起来像 scam::scam(formula = ratio ~ s(rate,bs="mpi")) 然后使用这些模型为新数据生成预测。我不关心模型的任
r - GAM 模型错误
我的数据框看起来像: head(bush_status) distance status count 0 endemic 844 1 exotic 8
r - GAM 中的时间自相关
我正在对过去大约 40 年中零星收集的物种计数数据与一系列环境预测因子进行建模。目前，我的 GAM 是这样的: k = gam(CountIndividuals ~ s(Date, bs = 'cr'
R:适合数据子集的 GAM
我使用 gam 拟合了广义加性模型来自 mgcv包裹。我有一个数据表，其中包含我的因变量 Y , 自变量 X , 其他自变量 Oth和一个两级因子 Fac .我想适合以下型号 Y ~ s(X) + O
integer - GAMS 中超出时间限制
我在 GMAS 中编写我的 MIP 模型，求解器表明我的模型中有 1535272 行、3287490 列和 8425140 个非零(我不确定它对于 GAMS 来说是否太大)。经过 899677 次迭代
r - GAM 中的权重选项
我的数据集有许多冗余观测值(但每个观测值都应该被计数)。因此我考虑在 GAM 中使用“权重”选项，因为它可以显着减少计算时间。 gam函数(在 mgcv 包中)解释说它们是“等价的”(来自参数 ?ga
gam - GAMLSS错误:响应变量超出范围
我无法理解为什么收到此错误。我的两个变量都是数字且长度相同，当变量不相同时，我用NA调整数据。但是我仍然收到一个错误，我的响应变量超出范围 year 0))>=5) # If at least
gams-math - 不属于设置
我有3套，我想知道哪个元素不属于对称差集。 Set1={1*125} 组2={20*450} Set3={45*235} 我用 SymAB 显示了 setA 和 set B 的对称差异。我计算 s
r - gam 包上的命名空间说明符不起作用
我不明白为什么下面的两个 gam 模型会产生不同的结果。唯一的区别是在其中一个模型中，我在函数 gam 和 s 之前添加了命名空间说明符 gam::。我想这样做是因为我正在探索在 gam 包和 mg
r - 具有新因子水平的 predict.gam
我正在对涉及两个拟合步骤的物种分布数据进行障碍类型分析。第一步是使用 family=quasibinomial 的所有数据对 (m1) 存在/不存在数据建模。第二步 (m2) 是使用 family=G
r - 随着时间的推移从 gam 模型计算年利率？
问题移至 CrossValidated 我试图表达 gam 中两个类别之间“增长速度”的差异造型。我的数据表示随着时间的推移 [0-100%] 的累积值，但我希望(为了与其他研究的可比性)以年度值来表
r - 使用方法 gam : 从插入符号包中训练错误
我有一个 gam我所知道的模型在 R 中运行良好，但是当我尝试“train ”使用 caret 相同的模型时package 它返回一个错误，指出输入数据列是列表。有没有人明白这一点？我正在运行的代码
r - 从 GAM 平滑对象确定导数
我有一个非常简单的时间序列数据集，由单个变量(“AVERAGE”)的年平均值组成。我希望研究时间序列的“趋势”分量的变化率(一阶导数)和加速度(二阶导数)以及相关的标准误差。我使用MGCV的GAM和P
r - 如何检查具有负二项分布的 GAM 中的过度分散？
我使用 gam 在负二项式族中拟合广义加性模型来自 mgcv包裹。我有一个包含因变量 y 的数据框, 自变量 x , 一个因素 fac和一个随机变量 ran .我适合以下模型 gam1 sum(r
r - 找不到函数 plot.gam
我正在阅读“R 中的应用程序统计学习简介”(ISLR)，我被困在第 295 页的一部分，即广义加法模型实验室。当我运行以下代码时，我得到一个错误 Error in plot.gam(gam1, se
使用 mgcv gam 运行随机误差模型需要太多内存
我正在研究一个模型，其中包含多个 RE 和一个变量的样条，因此我尝试使用 gam() .但是，我遇到了内存耗尽限制错误(即使我在具有 128GB 的集群上运行它时也是如此)。即使我只用一个 RE
r - 在因子 : GAM 内添加两个随机效应
从这个数据: UQdata MudUQ Estuary Site 7.00 10.9 NoriPau A 6.00 13.9 NoriPau A 5.00

行者123

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

r - 强制 GAM 模型拟合为单调并通过 R mgcv 的固定点 (x0, y0)