
r - Monte Carlo simulation, Bootstrap and Regression in R

Reposted. Author: 行者123. Updated: 2023-12-01 03:18:14

I have used SAS for a long time, and now I want to translate my code to R. I need help doing the following:

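  • Generate multiple bootstrap samples
  • Run a linear regression model on each sample
  • Store the parameters in a new dataset, one row per replicate sample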

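    I have edited this code to make it clearer.
    I use a lot of for loops, which I know are not always recommended, and the process is slow.

    Is there code or a package (e.g. the apply family of functions, or the "caret" package) that would make this clean and efficient/fast, especially when samplesize*bootsample > 10 million?

    Any help would be greatly appreciated.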
    samplesize <- 200
    bootsize <- 500
    myseed <- 123

    # generate a fake dataset
    id <- 1:samplesize
    set.seed(myseed)
    x <- rnorm(samplesize, 5, 5)
    y <- rnorm(samplesize, 2 + 0.4*x, 0.5)
    data <- data.frame(id, x, y)

    head(data)
      id         x        y
    1  1  2.197622 3.978454
    2  2  3.849113 4.195852
    3  3 12.793542 6.984844
    4  4  5.352542 4.412614
    5  5  5.646439 4.051405
    6  6 13.575325 7.192007

    # generate bootstrap samples

    bootstrap <- function(nbootsamples, data, seed) {
      bootdata <- data.frame() # to initialize it
      set.seed(seed)
      for (i in 1:nbootsamples) {
        replicate <- i
        bootstrapIndex <- sample(1:nrow(data), replace = TRUE)
        datatemp <- data[bootstrapIndex, ]
        tempall <- cbind(replicate, datatemp)
        bootdata <- rbind(bootdata, tempall)
      }
      return(bootdata)
    }
    bootdata <- bootstrap(nbootsamples=bootsize, data=data, seed=myseed)
    bootdata <- dplyr::arrange(bootdata, replicate, id)
    head(bootdata)
    #The data should look like this
      replicate id         x        y
    1         1  1  2.197622 3.978454
    2         1  3 12.793542 6.984844
    3         1  5  5.646439 4.051405
    4         1  9  1.565736 3.451748
    5         1 10  2.771690 3.081662
    6         1 10  2.771690 3.081662

    #Model-fitting and saving coefficient and means

    # Fit y ~ x on one replicate; return intercept, slope and residual SD
    modelFitting <- function(data) {
      modeltemp <- glm(y ~ x,
                       data = data,
                       family = gaussian("identity"))
      Inty <- coef(modeltemp)["(Intercept)"]
      betaX <- coef(modeltemp)["x"]
      sdy <- sd(residuals(modeltemp))
      data.frame(Inty, betaX, sdy, row.names = NULL)
    }

    # Fit the model on every replicate and stack the results, one row each
    saveParameters <- function(data) {
      parameters <- data.frame() # to initialize it
      for (i in unique(data$replicate)) {
        datai <- data[data$replicate == i, ]
        datatemp <- modelFitting(datai)
        meandata <- data.frame(Pr_X = mean(datai$x))
        tempall <- cbind(replicate = i, datatemp, meandata)
        parameters <- rbind(parameters, tempall)
      }
      return(parameters)
    }
    parameters <- saveParameters(data = bootdata)
    head(parameters)

    #Ultimately all I want is my final dataset to look like the following

      replicate     Inty     betaX       sdy     Pr_X
    1         1 2.135529 0.3851757 0.5162728 4.995836
    2         2 1.957152 0.4094682 0.5071635 4.835884
    3         3 2.044257 0.3989742 0.4734178 5.111185
    4         4 2.093452 0.3861861 0.4921470 4.741299
    5         5 2.017825 0.4037699 0.5240363 4.931793
    6         6 2.026952 0.3979731 0.4898346 5.502320

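    Best answer

    Regression with resampling is easy to do with the caret package. Given your example data, the code to run 200 bootstrap samples through a generalized linear model looks like this.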

    library(caret)
    x <- round(rnorm(200, 5, 5))
    y <- rnorm(200, 2 + 0.4*x, 0.5)
    theData <- data.frame(id = 1:200, x, y)
    # configure caret training parameters to 200 bootstrap samples
    fitControl <- trainControl(method = "boot",
                               number = 200)
    fit <- train(y ~ x, method = "glm", data = theData,
                 trControl = fitControl)
    # print output object
    fit
    # print first 10 resamples
    fit$resample[1:10,]

    The output from caret looks like this:
    > fit
    Generalized Linear Model

    200 samples
    1 predictor

    No pre-processing
    Resampling: Bootstrapped (200 reps)
    Summary of sample sizes: 200, 200, 200, 200, 200, 200, ...
    Resampling results:

      RMSE       Rsquared   MAE
      0.4739306  0.9438834  0.3772199

    > fit$resample[1:10,]
            RMSE  Rsquared       MAE    Resample
    1  0.5069606 0.9520896 0.3872257 Resample001
    2  0.4636029 0.9460214 0.3711900 Resample002
    3  0.4446103 0.9549866 0.3435148 Resample003
    4  0.4464119 0.9443726 0.3636947 Resample004
    5  0.5193685 0.9191259 0.4010104 Resample005
    6  0.4995917 0.9451417 0.4044659 Resample006
    7  0.4347831 0.9494606 0.3383224 Resample007
    8  0.4725041 0.9483434 0.3716319 Resample008
    9  0.5295650 0.9458453 0.4241543 Resample009
    10 0.4796985 0.9514595 0.3927207 Resample010
    >
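
The `fit$resample` table holds performance metrics for each resample, not the coefficients the question asks for. A minimal base R sketch of one way to collect an intercept, slope, residual SD and mean of x per replicate follows. It is not part of the original answer; `fit_one` is a hypothetical helper, and `lm()` stands in for the Gaussian `glm()` because the two give the same estimates. It draws all resample indices up front and never builds the stacked `bootdata` data frame, so nothing grows inside a loop.

```r
samplesize <- 200
bootsize   <- 500
set.seed(123)

# Fake data, generated the same way as in the question
x   <- rnorm(samplesize, 5, 5)
y   <- rnorm(samplesize, 2 + 0.4 * x, 0.5)
dat <- data.frame(id = seq_len(samplesize), x, y)

# Hypothetical helper: fit y ~ x on one resample, return one row of results
fit_one <- function(idx, data) {
  d   <- data[idx, ]
  fit <- lm(y ~ x, data = d)  # same estimates as glm(..., family = gaussian)
  c(Inty  = unname(coef(fit)["(Intercept)"]),
    betaX = unname(coef(fit)["x"]),
    sdy   = sd(residuals(fit)),
    Pr_X  = mean(d$x))
}

# Draw every resample's row indices at once: one column per replicate
idx <- replicate(bootsize, sample.int(samplesize, replace = TRUE))

# vapply returns a 4 x bootsize matrix; transpose to one row per replicate
est <- t(vapply(seq_len(bootsize),
                function(i) fit_one(idx[, i], dat),
                numeric(4)))
parameters <- data.frame(replicate = seq_len(bootsize), est)
head(parameters)
```

Because each replicate is fitted independently, the `vapply()` call could be swapped for a parallel equivalent when `samplesize * bootsize` gets large.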

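    For details on how to use caret, including the contents of the resulting model object (e.g., accessing the individual models so that you can generate predictions for a simulation with the predict() function), see the caret GitHub site.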
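
As a rough illustration of working with the fitted object, the sketch below reads the coefficients from `fit$finalModel` (the model caret refits on the full data set) and calls `predict()` on new data. The seed, the reduced number of resamples (25) and the `newData` values are assumptions chosen only to keep the example short.

```r
library(caret)

set.seed(123)
x <- round(rnorm(200, 5, 5))
y <- rnorm(200, 2 + 0.4 * x, 0.5)
theData <- data.frame(id = 1:200, x, y)

fit <- train(y ~ x, method = "glm", data = theData,
             trControl = trainControl(method = "boot", number = 25))

# Coefficients of the model refit on all 200 observations
coef(fit$finalModel)

# Predictions for new x values (newData is a made-up example input)
newData <- data.frame(x = c(0, 5, 10))
preds <- predict(fit, newdata = newData)
preds
```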

    caret also supports parallel processing. For an example of how to use parallel processing with caret, read Improving Performance of Random Forest with caret::train().
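
A minimal sketch of the usual pattern, assuming the doParallel package is installed: register a cluster as the foreach backend before calling `train()`, then shut it down afterwards. The worker count of 2 is an arbitrary choice. For a model as cheap as a single-predictor GLM, the overhead of starting workers may well outweigh any gain, so treat this as a template for heavier models.

```r
library(caret)
library(doParallel)

cl <- makePSOCKcluster(2)  # 2 workers; adjust to the machine
registerDoParallel(cl)     # make the cluster the foreach backend

set.seed(123)
x <- round(rnorm(200, 5, 5))
y <- rnorm(200, 2 + 0.4 * x, 0.5)
theData <- data.frame(id = 1:200, x, y)

# allowParallel = TRUE is the default; it is spelled out here for clarity
fitControl <- trainControl(method = "boot", number = 200,
                           allowParallel = TRUE)
fit <- train(y ~ x, method = "glm", data = theData,
             trControl = fitControl)

stopCluster(cl)   # release the workers
registerDoSEQ()   # return foreach to sequential execution

fit$results
```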

    In addition, Monte Carlo simulation is supported in R through the MonteCarlo package.
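
To show how this differs from the bootstrap, here is a small base R sketch that does not use the MonteCarlo package. A bootstrap resamples the observed rows, while a Monte Carlo simulation draws fresh data from a known model on each replication. The helper `sim_slope` and the replication count are assumptions made for illustration.

```r
set.seed(123)
n_sims     <- 1000  # number of Monte Carlo replications (arbitrary)
samplesize <- 200

# Hypothetical helper: simulate one data set from the known model
# y = 2 + 0.4 * x + noise and return the estimated slope
sim_slope <- function(n) {
  x <- rnorm(n, 5, 5)
  y <- rnorm(n, 2 + 0.4 * x, 0.5)
  unname(coef(lm(y ~ x))["x"])
}

slopes <- replicate(n_sims, sim_slope(samplesize))

mean(slopes)                       # should be close to the true slope 0.4
sd(slopes)                         # Monte Carlo estimate of the slope's standard error
quantile(slopes, c(0.025, 0.975))  # central 95% range of the estimates
```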

    Regarding "r - Monte Carlo simulation, Bootstrap and Regression in R", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47615125/
