gpt4 book ai didi

r - R 中非线性混合效应的不稳定性(使用 nlme 包)

转载 作者:行者123 更新时间:2023-12-04 10:16:06 26 4
gpt4 key购买 nike

我正在尝试构建一个 nonlinear mixed effects model COVID-19 数据与来自不同国家的每日病例数拟合成钟形曲线(随机效应在国家层面)。

数据表太大,无法包含在帖子中,但这里是 的结构:

> str(dat)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2642 obs. of 7 variables:
$ Country.Region: Factor w/ 181 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
$ date : Date, format: "2020-03-24" "2020-03-25" "2020-03-26" "2020-03-27" ...
$ day : num 0 1 2 3 4 5 6 7 8 9 ...
$ total_cases : int 74 84 94 110 110 120 170 174 237 273 ...
$ new_cases : int 34 10 10 16 0 10 50 4 63 36 ...
$ total_deaths : int 1 2 4 4 4 4 4 4 4 6 ...
$ new_deaths : int 0 1 2 0 0 0 0 0 0 2 ...

尝试拟合模型:
library(nlme)
dat <- readRDS("dat.rds")

# Bell curve function defined by three parameters
bellcurve.model <- function(d, mu, var, x) {
f <- d*exp(-((x - mu)^2) / (2*var))
return(f)
}

# NLME Model
baseModel <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
data = dat,
fixed = list(d ~ 1, mu ~ 1, var ~ 1),
random = d + mu + var ~ 1|Country.Region,
start = list(fixed = c(1000, 20, 20)),
na.action = na.omit)

但是,这是由此产生的错误:

Error in nlme.formula(new_cases ~ bellcurve.model(d, mu, var, x = day), : Singularity in backsolve at level 0, block 1



我已经阅读了其他 StackOverflow 帖子,表明模型中可能会混淆某些协变量,但我在这里使用的唯一协变量来预测 new_casesday .任何关于这意味着什么的建议或调试技巧将不胜感激,尤其是关于如何解决这个问题的任何建议。

最佳答案

tl;dr:这种类型的建模对起始值很敏感。我详细介绍了如何成功解决问题。有助于此修复的三个主要事项是:

  • 增加最大迭代次数和函数调用
  • 获取知情的起始值
  • 可能限制/子集数据,因为最初添加的国家/地区的可用信息较少

  • 简要概述:

    我开始只是盲目地改变起始值。换 c(1000, 20, 20)c(20000, 10, 10)我让模型以 baseModel.naive 的形式免费运行错误/警告与 Log-likelihood: -23451.37 .使用增加的最大值进行迭代和函数调用并获取知情的起始值使模型能够通过 baseModel.bellcurve 大幅改进。举报 Log-likelihood: -20487.86 .

    天真地改变起始值将摆脱错误

    我调整了起始值。此外,我还展示了如何增加函数调用和迭代的最大数量并使用详细选项。可以使用 ?nlme 找到更多信息和 ?nlmeControl .根据我的经验,这些类型的模型可能对起始值、最大迭代次数和函数调用很敏感。
    baseModel.naive <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day), 
    data = dat,
    fixed = list(d ~ 1, mu ~ 1, var ~ 1),
    random = d + mu + var ~ 1|Country.Region,
    start = list(fixed = c(20000, 10, 10)),
    na.action = na.omit,
    control=list(maxIter=1e2, msMaxIter = 1e6, msVerbose = TRUE))
    baseModel.naive

    输出:
    > baseModel.naive <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day), 
    + data = dat,
    + fixed = list(d ~ 1, mu ~ 1, var ~ 1),
    + random = d + mu + var ~ 1|Country.Region,
    + start = list(fixed = c(20000, 10, 10)),
    + na.action = na.omit,
    + control=list(maxIter=1e2, msMaxIter = 1e6, msVerbose = TRUE))
    0: 30455.774: 2.23045 10.6880 8.80935 -11177.6 3277.46 -1877.96
    1: 30450.763: 2.80826 11.2726 9.37889 -11177.6 3277.46 -1877.96
    2: 30449.789: 2.94771 11.5283 9.80691 -11177.6 3277.46 -1877.96
    3: 30449.150: 3.30454 11.8277 10.0329 -11177.6 3277.46 -1877.96
    4: 30448.727: 3.64209 12.2237 10.4571 -11177.6 3277.46 -1877.96
    5: 30448.540: 3.97875 12.5637 10.8217 -11177.6 3277.46 -1877.96
    6: 30448.445: 4.31754 12.9055 11.1826 -11177.6 3277.46 -1877.96
    7: 30448.397: 4.65727 13.2480 11.5420 -11177.6 3277.46 -1877.96
    8: 30448.372: 4.99746 13.5908 11.9008 -11177.6 3277.46 -1877.96
    9: 30448.360: 5.33789 13.9337 12.2591 -11177.6 3277.46 -1877.96
    10: 30448.354: 5.67844 14.2767 12.6173 -11177.6 3277.46 -1877.96
    11: 30448.351: 6.01905 14.6197 12.9753 -11177.6 3277.46 -1877.96
    12: 30448.349: 6.35969 14.9627 13.3333 -11177.6 3277.46 -1877.96
    13: 30448.349: 6.70035 15.3058 13.6913 -11177.6 3277.46 -1877.96
    14: 30448.348: 7.04102 15.6489 14.0493 -11177.6 3277.46 -1877.96
    15: 30448.348: 7.38170 15.9919 14.4073 -11177.6 3277.46 -1877.96
    16: 30448.348: 7.72237 16.3350 14.7652 -11177.6 3277.46 -1877.96
    17: 30448.348: 8.06305 16.6781 15.1232 -11177.6 3277.46 -1877.96
    18: 30448.348: 8.40373 17.0211 15.4811 -11177.6 3277.46 -1877.96
    19: 30448.348: 8.74441 17.3642 15.8391 -11177.6 3277.46 -1877.96
    20: 30448.348: 9.08508 17.7073 16.1971 -11177.6 3277.46 -1877.96
    21: 30448.348: 9.42576 18.0503 16.5550 -11177.6 3277.46 -1877.96
    0: 30108.384: 9.42573 18.0503 16.5550 -11183.6 3279.09 -1879.43
    1: 30108.384: 9.42568 18.0503 16.5550 -11183.6 3279.09 -1879.43
    2: 30108.384: 9.42544 18.0502 16.5548 -11183.6 3279.09 -1879.43
    0: 30108.056: 9.42539 18.0502 16.5548 -11168.0 3239.13 -1879.51
    1: 30108.056: 9.42526 18.0502 16.5549 -11168.0 3239.13 -1879.51
    0: 30108.055: 9.42523 18.0502 16.5549 -11150.0 3223.70 -1879.28
    1: 30108.055: 9.42523 18.0502 16.5549 -11150.0 3223.70 -1879.28
    > baseModel.naive
    Nonlinear mixed-effects model fit by maximum likelihood
    Model: new_cases ~ bellcurve.model(d, mu, var, x = day)
    Data: dat
    Log-likelihood: -23451.37
    Fixed: list(d ~ 1, mu ~ 1, var ~ 1)
    d mu var
    4290.47415 35.32178 38.44048

    Random effects:
    Formula: list(d ~ 1, mu ~ 1, var ~ 1)
    Level: Country.Region
    Structure: General positive-definite, Log-Cholesky parametrization
    StdDev Corr
    d 1.402353e-01 d mu
    mu 2.518250e-05 0
    var 1.123208e-04 0 0
    Residual 1.738523e+03

    Number of Observations: 2641
    Number of Groups: 126

    但也许这还不够。见 Corr与 0 的?似乎有些不对劲。所以我从 Roland ( https://stackoverflow.com/a/27549369/2727349 ) 那里拿了一页,并决定获得一个类似于你的 bellcurve.model 的自启动功能。 .
    我也尝试将数据子集为 Country.Region s 有最少天数以防万一。我在这里分节详细介绍了我的结果/代码,最后(在附录中)将它们放在一起以便快速复制和粘贴。

    初步和数据探索

    为了探索,我尝试将数据限制在天数最少的国家/地区。我选择了 45 天(5 个国家/地区),然后慢慢增加到 1 天(完整数据集)。我用过 data.table计算和显示相关的东西。
    library(nlme)
    library(nlraa)
    library(data.table)
    dat <- readRDS("dat.rds")
    str(dat)
    setDT(dat)


    # Bell curve function defined by three parameters
    bellcurve.model <- function(d, mu, var, x) {
    f <- d*exp(-((x - mu)^2) / (2*var))
    return(f)
    }


    dim(dat)
    head(dat)

    ## how many countries?
    dat[,uniqueN(Country.Region)]

    ## number of days/rows per country.region:
    print(dat[,.N,by=Country.Region][order(N)], nrows=200)
    dat[,N:=.N, by=Country.Region]


    minimum_days <- 45
    ## model restricted to following countries/regions:
    data.frame(Country.Region=(as.character(unique(dat[N>=minimum_days]$Country.Region))))

    从 nls() 和数据中获取自动起始值

    这是 Roland's answer in a related post发挥作用。我们需要一种自动化的方式来获得知情的起始值,一种方法是使用 nls()和自启动功能。你可以用你的 bellcurve.model 做一个自启动功能但我发现 SSbell这是相似的,并决定利用它。我跑 nls()SSbell并获取 SSbell 的起始值具有 ymax 的配方(对应于 dbellcurve.model )和 xc (对应于 mu 中的 bellcurve.model )。对于 var bellcurve.model 中的起始值我首先计算每个 Country.Region的方差,然后选择较小的一个,在 20% 的瓷砖上解决,因为这适用于 minimum_days<-45minimum_days<-1 (完整数据)。
    ## use this approach https://stackoverflow.com/a/27549369/2727349
    ## but instead of SSlogis
    ## use nlraa::SSbell
    ## we can get starting values for d and mu.
    plot(new_cases~day, data=dat[N>=minimum_days])
    nls_starting_values <- nls(new_cases ~ SSbell(day, ymax, a, b, xc), data=dat[N>=minimum_days])
    summary(nls_starting_values)

    ## calculate a starting value for var: the median of Country.Region specific variances
    variances <- sort(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1)
    range(variances)
    quantile(variances, seq(0,1,0.05))

    ##var.start <- median(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
    ##var.start <- min(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
    var.start <- quantile(variances, 0.20)
    var.start

    再一次——我不得不稍微玩一下——但是如果你取经验方差的 20% 平铺 nlme调用 bellcurve.model代码将为 minimum_days <- 45 运行以及 minimum_days <- 1 (完整数据集)。计算我们的起始值并可能限制我们的数据集,我们拟合了两个 nlme型号:一台使用 SSbell和另一个 bellcurve.model .两者都将运行 minimum_days<-45并且只有 bellcurve.model将收敛于 minimum_days<-1 (完整数据集)。幸运的!

    两个 nlme 调用:一个是 SSbell ,其他为 bellcurve.model
    # NLME Model: using ssbell and nls_starting values for ymax, a, b, xc
    baseModel.ssbell <- nlme(new_cases ~ SSbell(day, ymax, a, b, xc),
    data = dat[N>=minimum_days],
    fixed = list(ymax ~ 1, a ~ 1, b~1, xc~1),
    random = ymax + a + b + xc ~ 1|Country.Region,
    start = list(fixed = c( coef(nls_starting_values) )),
    na.action = na.omit,
    control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
    )
    # NLME Model: using bellcurve.model and nls_starting values for ymax for d;
    # NLME Model: using bellcurve.model and nls_starting values for xc for mu;
    # NLME Model: using bellcurve.model and var.start for var
    baseModel.bellcurve <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
    data = dat[N>=minimum_days],
    fixed = list(d ~ 1, mu ~ 1, var ~ 1),
    random = d + mu + var ~ 1|Country.Region,
    start = list(fixed = c(coef(nls_starting_values)[c(1,4)], var.start )),
    na.action = na.omit,
    control=list(maxIter=1e6,
    msMaxIter = 1e6,
    msVerbose = TRUE)
    )

    比较输出
    baseModel.ssbell
    baseModel.bellcurve

    显示 minimum_days<-45 的输出显示相似的拟合(看看可能性)。
    ## 5 countries DATA minimum_days <- 45
    > baseModel.ssbell
    Nonlinear mixed-effects model fit by maximum likelihood
    Model: new_cases ~ SSbell(day, ymax, a, b, xc)
    Data: dat[N >= minimum_days]
    Log-likelihood: -2264.706
    Fixed: list(ymax ~ 1, a ~ 1, b ~ 1, xc ~ 1)
    ymax a b xc
    1.026599e+03 -1.164625e-02 -1.626079e-04 1.288924e+01

    Random effects:
    Formula: list(ymax ~ 1, a ~ 1, b ~ 1, xc ~ 1)
    Level: Country.Region
    Structure: General positive-definite, Log-Cholesky parametrization
    StdDev Corr
    ymax 1.731582e+03 ymax a b
    a 4.467475e-06 -0.999
    b 1.016493e-04 -0.998 0.999
    xc 3.528238e+00 1.000 -0.999 -0.999
    Residual 8.045770e+02

    Number of Observations: 278
    Number of Groups: 5
    ## 5 countries DATA minimum_days <- 45
    > baseModel.bellcurve
    Nonlinear mixed-effects model fit by maximum likelihood
    Model: new_cases ~ bellcurve.model(d, mu, var, x = day)
    Data: dat[N >= minimum_days]
    Log-likelihood: -2267.406
    Fixed: list(d ~ 1, mu ~ 1, var ~ 1)
    d mu var
    965.81525 12.73549 58.22049

    Random effects:
    Formula: list(d ~ 1, mu ~ 1, var ~ 1)
    Level: Country.Region
    Structure: General positive-definite, Log-Cholesky parametrization
    StdDev Corr
    d 1.633432e+03 d mu
    mu 2.815230e+00 1.00
    var 5.582379e-03 -0.01 -0.01
    Residual 8.127932e+02

    Number of Observations: 278
    Number of Groups: 5

    显示 baseModel.bellcurve 的输出为 minimum_days<-1 (完整数据集)显示了从 baseModel.naive 的改进我盲目地随意摆弄起始值,其唯一目的是摆脱错误和警告。
    ## FULL DATA minimum_days <- 1
    > baseModel.bellcurve
    Nonlinear mixed-effects model fit by maximum likelihood
    Model: new_cases ~ bellcurve.model(d, mu, var, x = day)
    Data: dat[N >= minimum_days]
    Log-likelihood: -20487.86
    Fixed: list(d ~ 1, mu ~ 1, var ~ 1)
    d mu var
    1044.52101 25.00288 81.79004

    Random effects:
    Formula: list(d ~ 1, mu ~ 1, var ~ 1)
    Level: Country.Region
    Structure: General positive-definite, Log-Cholesky parametrization
    StdDev Corr
    d 4.285702e+03 d mu
    mu 5.423452e+00 0.545
    var 3.008379e-03 0.028 0.050
    Residual 4.985698e+02

    Number of Observations: 2641
    Number of Groups: 126
    Log-likelihood: -20487.86baseModel.bellcurve对比 Log-likelihood: -23451.37baseModel.naive baseModel.bellcurve 中的 Corr 矩阵看起来也更好。

    附录:一揽子代码。


    library(nlme)
    library(nlraa)
    library(data.table)
    dat <- readRDS("dat.rds")
    str(dat)
    setDT(dat)


    # Bell curve function defined by three parameters
    bellcurve.model <- function(d, mu, var, x) {
    f <- d*exp(-((x - mu)^2) / (2*var))
    return(f)
    }


    dim(dat)
    head(dat)

    ## how many countries?
    dat[,uniqueN(Country.Region)]

    ## number of days/rows per country.region:
    print(dat[,.N,by=Country.Region][order(N)], nrows=200)
    dat[,N:=.N, by=Country.Region]


    minimum_days <- 45
    ## model restricted to following countries/regions:
    data.frame(Country.Region=(as.character(unique(dat[N>=minimum_days]$Country.Region))))



    ## use this approach https://stackoverflow.com/a/27549369/2727349
    ## but instead of SSlogis
    ## use nlraa::SSbell
    ## we can get starting values for d and mu.
    plot(new_cases~day, data=dat[N>=minimum_days])
    nls_starting_values <- nls(new_cases ~ SSbell(day, ymax, a, b, xc), data=dat[N>=minimum_days])
    summary(nls_starting_values)

    ## calculate a starting value for var: the median of Country.Region specific variances
    variances <- sort(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1)
    range(variances)
    quantile(variances, seq(0,1,0.05))

    ##var.start <- median(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
    ##var.start <- min(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
    var.start <- quantile(variances, 0.20)
    var.start




    # NLME Model: using ssbell and nls_starting values for ymax, a, b, xc
    baseModel.ssbell <- nlme(new_cases ~ SSbell(day, ymax, a, b, xc),
    data = dat[N>=minimum_days],
    fixed = list(ymax ~ 1, a ~ 1, b~1, xc~1),
    random = ymax + a + b + xc ~ 1|Country.Region,
    start = list(fixed = c( coef(nls_starting_values) )),
    na.action = na.omit,
    control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
    )


    # NLME Model: using bellcurve.model and nls_starting values for ymax for d;
    # NLME Model: using bellcurve.model and nls_starting values for xc for mu;
    # NLME Model: using bellcurve.model and var.start for var
    baseModel.bellcurve <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
    data = dat[N>=minimum_days],
    fixed = list(d ~ 1, mu ~ 1, var ~ 1),
    random = d + mu + var ~ 1|Country.Region,
    start = list(fixed = c(coef(nls_starting_values)[c(1,4)], var.start )),
    na.action = na.omit,
    control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
    )


    baseModel.ssbell
    baseModel.bellcurve

    关于r - R 中非线性混合效应的不稳定性(使用 nlme 包),我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/61048618/

    26 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com