r - R 中非线性混合效应的不稳定性(使用 nlme 包)-6ren

r - R 中非线性混合效应的不稳定性(使用 nlme 包)

转载作者：行者123 更新时间：2023-12-04 10:16:06

我正在尝试构建一个 nonlinear mixed effects model COVID-19 数据与来自不同国家的每日病例数拟合成钟形曲线(随机效应在国家层面)。

数据表太大，无法包含在帖子中，但这里是 dataframe 的结构:

> str(dat)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   2642 obs. of  7 variables:
 $ Country.Region: Factor w/ 181 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ date          : Date, format: "2020-03-24" "2020-03-25" "2020-03-26" "2020-03-27" ...
 $ day           : num  0 1 2 3 4 5 6 7 8 9 ...
 $ total_cases   : int  74 84 94 110 110 120 170 174 237 273 ...
 $ new_cases     : int  34 10 10 16 0 10 50 4 63 36 ...
 $ total_deaths  : int  1 2 4 4 4 4 4 4 4 6 ...
 $ new_deaths    : int  0 1 2 0 0 0 0 0 0 2 ...

尝试拟合模型:

library(nlme)
dat <- readRDS("dat.rds")

# Bell curve function defined by three parameters
bellcurve.model <- function(d, mu, var, x) {
  f <- d*exp(-((x - mu)^2) / (2*var))
  return(f)
}

# NLME Model
baseModel <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day), 
                  data = dat, 
                  fixed = list(d ~ 1, mu ~ 1, var ~ 1),
                  random = d + mu + var ~ 1|Country.Region,
                  start = list(fixed = c(1000, 20, 20)),
                  na.action = na.omit)

但是，这是由此产生的错误:

Error in nlme.formula(new_cases ~ bellcurve.model(d, mu, var, x = day), : Singularity in backsolve at level 0, block 1

我已经阅读了其他 StackOverflow 帖子，表明模型中可能会混淆某些协变量，但我在这里使用的唯一协变量来预测 new_cases是 day .任何关于这意味着什么的建议或调试技巧将不胜感激，尤其是关于如何解决这个问题的任何建议。

最佳答案

tl;dr:这种类型的建模对起始值很敏感。我详细介绍了如何成功解决问题。有助于此修复的三个主要事项是:

增加最大迭代次数和函数调用

获取知情的起始值

可能限制/子集数据，因为最初添加的国家/地区的可用信息较少

简要概述:

我开始只是盲目地改变起始值。换 c(1000, 20, 20)至 c(20000, 10, 10)我让模型以 baseModel.naive 的形式免费运行错误/警告与 Log-likelihood: -23451.37 .使用增加的最大值进行迭代和函数调用并获取知情的起始值使模型能够通过 baseModel.bellcurve 大幅改进。举报 Log-likelihood: -20487.86 .

天真地改变起始值将摆脱错误

我调整了起始值。此外，我还展示了如何增加函数调用和迭代的最大数量并使用详细选项。可以使用 ?nlme 找到更多信息和 ?nlmeControl .根据我的经验，这些类型的模型可能对起始值、最大迭代次数和函数调用很敏感。

baseModel.naive <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day), 
                  data = dat, 
                  fixed = list(d ~ 1, mu ~ 1, var ~ 1),
                  random = d + mu + var ~ 1|Country.Region,
                  start = list(fixed = c(20000, 10, 10)),
                  na.action = na.omit,
                  control=list(maxIter=1e2, msMaxIter = 1e6, msVerbose = TRUE))
baseModel.naive

输出:

> baseModel.naive <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day), 
+                   data = dat, 
+                   fixed = list(d ~ 1, mu ~ 1, var ~ 1),
+                   random = d + mu + var ~ 1|Country.Region,
+                   start = list(fixed = c(20000, 10, 10)),
+                   na.action = na.omit,
+                   control=list(maxIter=1e2, msMaxIter = 1e6, msVerbose = TRUE))
  0:     30455.774:  2.23045  10.6880  8.80935 -11177.6  3277.46 -1877.96
  1:     30450.763:  2.80826  11.2726  9.37889 -11177.6  3277.46 -1877.96
  2:     30449.789:  2.94771  11.5283  9.80691 -11177.6  3277.46 -1877.96
  3:     30449.150:  3.30454  11.8277  10.0329 -11177.6  3277.46 -1877.96
  4:     30448.727:  3.64209  12.2237  10.4571 -11177.6  3277.46 -1877.96
  5:     30448.540:  3.97875  12.5637  10.8217 -11177.6  3277.46 -1877.96
  6:     30448.445:  4.31754  12.9055  11.1826 -11177.6  3277.46 -1877.96
  7:     30448.397:  4.65727  13.2480  11.5420 -11177.6  3277.46 -1877.96
  8:     30448.372:  4.99746  13.5908  11.9008 -11177.6  3277.46 -1877.96
  9:     30448.360:  5.33789  13.9337  12.2591 -11177.6  3277.46 -1877.96
 10:     30448.354:  5.67844  14.2767  12.6173 -11177.6  3277.46 -1877.96
 11:     30448.351:  6.01905  14.6197  12.9753 -11177.6  3277.46 -1877.96
 12:     30448.349:  6.35969  14.9627  13.3333 -11177.6  3277.46 -1877.96
 13:     30448.349:  6.70035  15.3058  13.6913 -11177.6  3277.46 -1877.96
 14:     30448.348:  7.04102  15.6489  14.0493 -11177.6  3277.46 -1877.96
 15:     30448.348:  7.38170  15.9919  14.4073 -11177.6  3277.46 -1877.96
 16:     30448.348:  7.72237  16.3350  14.7652 -11177.6  3277.46 -1877.96
 17:     30448.348:  8.06305  16.6781  15.1232 -11177.6  3277.46 -1877.96
 18:     30448.348:  8.40373  17.0211  15.4811 -11177.6  3277.46 -1877.96
 19:     30448.348:  8.74441  17.3642  15.8391 -11177.6  3277.46 -1877.96
 20:     30448.348:  9.08508  17.7073  16.1971 -11177.6  3277.46 -1877.96
 21:     30448.348:  9.42576  18.0503  16.5550 -11177.6  3277.46 -1877.96
  0:     30108.384:  9.42573  18.0503  16.5550 -11183.6  3279.09 -1879.43
  1:     30108.384:  9.42568  18.0503  16.5550 -11183.6  3279.09 -1879.43
  2:     30108.384:  9.42544  18.0502  16.5548 -11183.6  3279.09 -1879.43
  0:     30108.056:  9.42539  18.0502  16.5548 -11168.0  3239.13 -1879.51
  1:     30108.056:  9.42526  18.0502  16.5549 -11168.0  3239.13 -1879.51
  0:     30108.055:  9.42523  18.0502  16.5549 -11150.0  3223.70 -1879.28
  1:     30108.055:  9.42523  18.0502  16.5549 -11150.0  3223.70 -1879.28

> baseModel.naive
Nonlinear mixed-effects model fit by maximum likelihood
  Model: new_cases ~ bellcurve.model(d, mu, var, x = day) 
  Data: dat 
  Log-likelihood: -23451.37
  Fixed: list(d ~ 1, mu ~ 1, var ~ 1) 
         d         mu        var 
4290.47415   35.32178   38.44048 

Random effects:
 Formula: list(d ~ 1, mu ~ 1, var ~ 1)
 Level: Country.Region
 Structure: General positive-definite, Log-Cholesky parametrization
         StdDev       Corr   
d        1.402353e-01 d    mu
mu       2.518250e-05 0      
var      1.123208e-04 0    0 
Residual 1.738523e+03        

Number of Observations: 2641
Number of Groups: 126

但也许这还不够。见 Corr与 0 的？似乎有些不对劲。所以我从 Roland ( https://stackoverflow.com/a/27549369/2727349 ) 那里拿了一页，并决定获得一个类似于你的 bellcurve.model 的自启动功能。 .
我也尝试将数据子集为 Country.Region s 有最少天数以防万一。我在这里分节详细介绍了我的结果/代码，最后(在附录中)将它们放在一起以便快速复制和粘贴。

初步和数据探索

为了探索，我尝试将数据限制在天数最少的国家/地区。我选择了 45 天(5 个国家/地区)，然后慢慢增加到 1 天(完整数据集)。我用过 data.table计算和显示相关的东西。

library(nlme)
library(nlraa)
library(data.table)
dat <- readRDS("dat.rds")
str(dat)
setDT(dat)


# Bell curve function defined by three parameters
bellcurve.model <- function(d, mu, var, x) {
  f <- d*exp(-((x - mu)^2) / (2*var))
  return(f)
}


dim(dat)
head(dat)

## how many countries?
dat[,uniqueN(Country.Region)]

## number of days/rows per country.region:
print(dat[,.N,by=Country.Region][order(N)], nrows=200)
dat[,N:=.N, by=Country.Region]


minimum_days <- 45
## model restricted to following countries/regions:
data.frame(Country.Region=(as.character(unique(dat[N>=minimum_days]$Country.Region))))

从 nls() 和数据中获取自动起始值

这是 Roland's answer in a related post发挥作用。我们需要一种自动化的方式来获得知情的起始值，一种方法是使用 nls()和自启动功能。你可以用你的 bellcurve.model 做一个自启动功能但我发现 SSbell这是相似的，并决定利用它。我跑 nls()与 SSbell并获取 SSbell 的起始值具有 ymax 的配方(对应于 d 的 bellcurve.model )和 xc (对应于 mu 中的 bellcurve.model )。对于 var bellcurve.model 中的起始值我首先计算每个 Country.Region的方差，然后选择较小的一个，在 20% 的瓷砖上解决，因为这适用于 minimum_days<-45和 minimum_days<-1 (完整数据)。

## use this approach https://stackoverflow.com/a/27549369/2727349
## but instead of SSlogis
## use nlraa::SSbell
## we can get starting values for d and mu.
plot(new_cases~day, data=dat[N>=minimum_days])
nls_starting_values <- nls(new_cases ~ SSbell(day, ymax, a, b, xc), data=dat[N>=minimum_days])
summary(nls_starting_values)

## calculate a starting value for var:  the median of Country.Region specific variances
variances <- sort(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1)
range(variances)
quantile(variances, seq(0,1,0.05))

##var.start <- median(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
##var.start <- min(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
var.start <- quantile(variances, 0.20)
var.start

再一次——我不得不稍微玩一下——但是如果你取经验方差的 20% 平铺 nlme调用 bellcurve.model代码将为 minimum_days <- 45 运行以及 minimum_days <- 1 (完整数据集)。计算我们的起始值并可能限制我们的数据集，我们拟合了两个 nlme型号:一台使用 SSbell和另一个 bellcurve.model .两者都将运行 minimum_days<-45并且只有 bellcurve.model将收敛于 minimum_days<-1 (完整数据集)。幸运的!

两个 nlme 调用:一个是 SSbell ，其他为 bellcurve.model

# NLME Model: using ssbell and nls_starting values for ymax, a, b, xc
baseModel.ssbell <- nlme(new_cases ~ SSbell(day, ymax, a, b, xc), 
                         data = dat[N>=minimum_days], 
                         fixed = list(ymax ~ 1, a ~ 1, b~1, xc~1),
                         random = ymax + a + b + xc ~ 1|Country.Region,
                         start = list(fixed = c(    coef(nls_starting_values) )),
                         na.action = na.omit,
                         control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)

# NLME Model: using bellcurve.model and nls_starting values for ymax for d;
# NLME Model: using bellcurve.model and nls_starting values for xc   for mu;
# NLME Model: using bellcurve.model and                    var.start for var
baseModel.bellcurve <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day), 
                            data = dat[N>=minimum_days], 
                            fixed = list(d ~ 1, mu ~ 1, var ~ 1),
                            random = d + mu + var ~ 1|Country.Region,
                            start = list(fixed = c(coef(nls_starting_values)[c(1,4)], var.start )),
                            na.action = na.omit,
                            control=list(maxIter=1e6, 
msMaxIter = 1e6, 
msVerbose = TRUE)
)

比较输出

baseModel.ssbell
baseModel.bellcurve

显示 minimum_days<-45 的输出显示相似的拟合(看看可能性)。

## 5 countries DATA minimum_days <- 45
> baseModel.ssbell
Nonlinear mixed-effects model fit by maximum likelihood
  Model: new_cases ~ SSbell(day, ymax, a, b, xc) 
  Data: dat[N >= minimum_days] 
  Log-likelihood: -2264.706
  Fixed: list(ymax ~ 1, a ~ 1, b ~ 1, xc ~ 1) 
         ymax             a             b            xc 
 1.026599e+03 -1.164625e-02 -1.626079e-04  1.288924e+01 

Random effects:
 Formula: list(ymax ~ 1, a ~ 1, b ~ 1, xc ~ 1)
 Level: Country.Region
 Structure: General positive-definite, Log-Cholesky parametrization
         StdDev       Corr                
ymax     1.731582e+03 ymax   a      b     
a        4.467475e-06 -0.999              
b        1.016493e-04 -0.998  0.999       
xc       3.528238e+00  1.000 -0.999 -0.999
Residual 8.045770e+02                     

Number of Observations: 278
Number of Groups: 5

## 5 countries DATA minimum_days <- 45
> baseModel.bellcurve
Nonlinear mixed-effects model fit by maximum likelihood
  Model: new_cases ~ bellcurve.model(d, mu, var, x = day) 
  Data: dat[N >= minimum_days] 
  Log-likelihood: -2267.406
  Fixed: list(d ~ 1, mu ~ 1, var ~ 1) 
        d        mu       var 
965.81525  12.73549  58.22049 

Random effects:
 Formula: list(d ~ 1, mu ~ 1, var ~ 1)
 Level: Country.Region
 Structure: General positive-definite, Log-Cholesky parametrization
         StdDev       Corr       
d        1.633432e+03 d     mu   
mu       2.815230e+00  1.00      
var      5.582379e-03 -0.01 -0.01
Residual 8.127932e+02            

Number of Observations: 278
Number of Groups: 5

显示 baseModel.bellcurve 的输出为 minimum_days<-1 (完整数据集)显示了从 baseModel.naive 的改进我盲目地随意摆弄起始值，其唯一目的是摆脱错误和警告。

## FULL DATA minimum_days <- 1
> baseModel.bellcurve
Nonlinear mixed-effects model fit by maximum likelihood
  Model: new_cases ~ bellcurve.model(d, mu, var, x = day) 
  Data: dat[N >= minimum_days] 
  Log-likelihood: -20487.86
  Fixed: list(d ~ 1, mu ~ 1, var ~ 1) 
         d         mu        var 
1044.52101   25.00288   81.79004 

Random effects:
 Formula: list(d ~ 1, mu ~ 1, var ~ 1)
 Level: Country.Region
 Structure: General positive-definite, Log-Cholesky parametrization
         StdDev       Corr       
d        4.285702e+03 d     mu   
mu       5.423452e+00 0.545      
var      3.008379e-03 0.028 0.050
Residual 4.985698e+02            

Number of Observations: 2641
Number of Groups: 126

Log-likelihood: -20487.86为 baseModel.bellcurve对比 Log-likelihood: -23451.37为 baseModel.naive baseModel.bellcurve 中的 Corr 矩阵看起来也更好。

附录:一揽子代码。



library(nlme)
library(nlraa)
library(data.table)
dat <- readRDS("dat.rds")
str(dat)
setDT(dat)


# Bell curve function defined by three parameters
bellcurve.model <- function(d, mu, var, x) {
  f <- d*exp(-((x - mu)^2) / (2*var))
  return(f)
}


dim(dat)
head(dat)

## how many countries?
dat[,uniqueN(Country.Region)]

## number of days/rows per country.region:
print(dat[,.N,by=Country.Region][order(N)], nrows=200)
dat[,N:=.N, by=Country.Region]


minimum_days <- 45
## model restricted to following countries/regions:
data.frame(Country.Region=(as.character(unique(dat[N>=minimum_days]$Country.Region))))



## use this approach https://stackoverflow.com/a/27549369/2727349
## but instead of SSlogis
## use nlraa::SSbell
## we can get starting values for d and mu.
plot(new_cases~day, data=dat[N>=minimum_days])
nls_starting_values <- nls(new_cases ~ SSbell(day, ymax, a, b, xc), data=dat[N>=minimum_days])
summary(nls_starting_values)

## calculate a starting value for var:  the median of Country.Region specific variances
variances <- sort(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1)
range(variances)
quantile(variances, seq(0,1,0.05))

##var.start <- median(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
##var.start <- min(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
var.start <- quantile(variances, 0.20)
var.start




# NLME Model: using ssbell and nls_starting values for ymax, a, b, xc
baseModel.ssbell <- nlme(new_cases ~ SSbell(day, ymax, a, b, xc), 
                         data = dat[N>=minimum_days], 
                         fixed = list(ymax ~ 1, a ~ 1, b~1, xc~1),
                         random = ymax + a + b + xc ~ 1|Country.Region,
                         start = list(fixed = c(    coef(nls_starting_values) )),
                         na.action = na.omit,
                         control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)


# NLME Model: using bellcurve.model and nls_starting values for ymax for d;
# NLME Model: using bellcurve.model and nls_starting values for xc   for mu;
# NLME Model: using bellcurve.model and                    var.start for var
baseModel.bellcurve <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day), 
                            data = dat[N>=minimum_days], 
                            fixed = list(d ~ 1, mu ~ 1, var ~ 1),
                            random = d + mu + var ~ 1|Country.Region,
                            start = list(fixed = c(coef(nls_starting_values)[c(1,4)], var.start )),
                            na.action = na.omit,
                            control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)


baseModel.ssbell
baseModel.bellcurve

关于r - R 中非线性混合效应的不稳定性(使用 nlme 包)，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/61048618/

文章推荐： javascript - 创建动态 JSON 表 HTML JS

文章推荐： Python - 循环遍历嵌套字典并获取值

文章推荐： julia - 在 Gadfly 中处理统计信息

文章推荐： html - 如果div溢出如何自动使用滚动条

python - Numpy argsort 不稳定性
最近，我一直在试验 np.argsort，我发现了一些奇怪的东西。如果你运行下面的代码，你会得到结果: In [0]: np.argsort([3]*16) Out[0]: array([ 0,
python - Pandas 的 rolling_mean 不稳定性
我正在从当前环境(Python 2.7.3 64 位，pandas 0.9)升级到新环境(Python 2.7.6，pandas 0.14.1)，我的一些回归测试失败了。我追踪到 pandas.sta

行者123

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

r - R 中非线性混合效应的不稳定性(使用 nlme 包)