
r - How to resolve the "number of items to replace is not a multiple of replacement length" error in a bootstrapped regression?

Reposted. Author: 行者123. Updated: 2023-12-05 01:16:39

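I am trying to bootstrap a regression model using code from Andy Field's textbook Discovering Statistics Using R.

I am struggling to interpret the error message I get when I run the boot() function. From reading other forum posts, I understand that it is telling me the number of items in two objects does not match, but I don't understand what that means in my context or how to fix it.

You can download my data here (a public dataset of Airbnb listings) and find my code and the full error message below. My predictors are a mix of factor (dummy) variables and continuous variables. Thanks in advance for your help!

Code: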

library(boot)

# Statistic for boot(): refit the model on the resampled rows i
# and return the coefficient vector.
bootReg <- function(formula, data, i) {
  d <- data[i, ]
  fit <- lm(formula, data = d)
  coef(fit)
}

bootResults <- boot(
  statistic = bootReg,
  formula = review_scores_rating ~ instant_bookable + cancellation_policy +
    host_since_cat + host_location_cat + host_response_time +
    host_is_superhost + host_listings_cat + property_type + room_type +
    accommodates + bedrooms + beds + price + security_deposit +
    cleaning_fee + extra_people + minimum_nights + amenityBreakfast +
    amenityAC + amenityElevator + amenityKitchen + amenityHostGreeting +
    amenitySmoking + amenityPets + amenityWifi + amenityTV,
  data = listingsRating, R = 2000
)

Error:

Error in t.star[r, ] <- res[[r]] : 
number of items to replace is not a multiple of replacement length
In addition: Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
restarting interrupted promise evaluation

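Best answer

Problem

The problem is your factor variables. When lm() is fitted on a subset of the data (which boot::boot() does over and over), you only get coefficients for the factor levels that are present in that subset. The coefficient vector can therefore have a different length from one draw to the next, and boot() cannot store it in a fixed-width results matrix. You can reproduce this as follows: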

debug(boot)
set.seed(123)
bootResults <- boot(
  statistic = bootReg,
  formula = review_scores_rating ~ instant_bookable + cancellation_policy +
    host_since_cat + host_location_cat + host_response_time +
    host_is_superhost + host_listings_cat + property_type + room_type +
    accommodates + bedrooms + beds + price + security_deposit +
    cleaning_fee + extra_people + minimum_nights + amenityBreakfast +
    amenityAC + amenityElevator + amenityKitchen + amenityHostGreeting +
    amenitySmoking + amenityPets + amenityWifi + amenityTV,
  data = listingsRating, R = 2
)

This lets you step through the function call one line at a time. After running the line

res <- if (ncpus > 1L && (have_mc || have_snow)) {
    if (have_mc) {
        parallel::mclapply(seq_len(RR), fn, mc.cores = ncpus)
    }
    else if (have_snow) {
        list(...)
        if (is.null(cl)) {
            cl <- parallel::makePSOCKcluster(rep("localhost",
                ncpus))
            if (RNGkind()[1L] == "L'Ecuyer-CMRG")
                parallel::clusterSetRNGStream(cl)
            res <- parallel::parLapply(cl, seq_len(RR), fn)
            parallel::stopCluster(cl)
            res
        }
        else parallel::parLapply(cl, seq_len(RR), fn)
    }
} else lapply(seq_len(RR), fn)

then try

setdiff(names(res[[1]]), names(res[[2]]))
# [1] "property_typeBarn" "property_typeNature lodge"

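Two factor levels are present in the first resample but absent from the second, so the two coefficient vectors have different lengths. That is what causes your error.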
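The same mechanism can be seen without the Airbnb data. The following is a minimal sketch on a made-up data frame (`toy`, `y`, and `g` are hypothetical names, not from the question): a subset that lacks one factor level yields a shorter coefficient vector, and writing that shorter vector into a fixed-width matrix row fails in the same way as the `t.star[r, ] <- res[[r]]` line in the error message.

```r
set.seed(1)

# Hypothetical toy data: one numeric response, one factor with three levels
toy <- data.frame(
  y = rnorm(8),
  g = factor(c("a", "a", "a", "b", "b", "b", "c", "c"))
)

# Coefficients from the full data and from a subset with no rows of level "c"
full <- coef(lm(y ~ g, data = toy))
sub  <- coef(lm(y ~ g, data = toy[1:6, ]))

length(full)                      # 3: (Intercept), gb, gc
length(sub)                       # 2: (Intercept), gb
setdiff(names(full), names(sub))  # "gc"

# A results matrix sized from the full fit, as boot() sizes it from the original data
t_star <- matrix(NA_real_, nrow = 2, ncol = length(full))
t_star[1, ] <- full

# Storing the shorter vector in a row raises the error; capture its message
msg <- tryCatch(
  {
    t_star[2, ] <- sub
    NULL
  },
  error = function(e) conditionMessage(e)
)
msg
```

lm() drops unused factor levels when it builds the model frame, which is why the level "c" coefficient disappears even though the subset's factor still lists "c" among its levels.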

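Solution

Expand your factors into dummy columns beforehand with model.matrix() (following this Stack Overflow post), so that every resample is fitted with the same set of columns: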

df2 <- model.matrix(
  ~ review_scores_rating + instant_bookable + cancellation_policy +
    host_since_cat + host_location_cat + host_response_time +
    host_is_superhost + host_listings_cat + property_type + room_type +
    accommodates + bedrooms + beds + price + security_deposit +
    cleaning_fee + extra_people + minimum_nights + amenityBreakfast +
    amenityAC + amenityElevator + amenityKitchen + amenityHostGreeting +
    amenitySmoking + amenityPets + amenityWifi + amenityTV - 1,
  data = listingsRating
)
undebug(boot)

set.seed(123)
bootResults <- boot(
  statistic = bootReg, formula = review_scores_rating ~ .,
  data = as.data.frame(df2), R = 2
)

(Note that throughout I reduced R to 2 only so that things run faster while debugging.)
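To see why the expansion helps, here is a small sketch on made-up data (`toy`, `toy_mm`, and `boot_coef` are hypothetical names). It differs slightly from the code above: it keeps the default treatment contrasts and drops the intercept column rather than using `- 1`, which is an assumption made to keep the toy model free of redundant dummy columns. Because the dummy columns exist in every resample, the coefficient vector always has the same length; a level that happens to be absent from a resample gives an all-zero column whose coefficient is reported as NA instead of being dropped.

```r
library(boot)

set.seed(42)

# Hypothetical toy data with a rare factor level "c" (2 of 30 rows)
toy <- data.frame(
  y = rnorm(30),
  x = rnorm(30),
  g = factor(rep(c("a", "b", "c"), times = c(14, 14, 2)))
)

# Expand the factor once, on the full data; drop the "(Intercept)" column
# because lm() adds its own intercept
toy_mm <- as.data.frame(model.matrix(~ y + x + g, data = toy)[, -1])
names(toy_mm)  # "y" "x" "gb" "gc"

# Statistic: refit on the resampled rows and return all coefficients
boot_coef <- function(data, i) coef(lm(y ~ ., data = data[i, ]))

res <- boot(data = toy_mm, statistic = boot_coef, R = 50)

dim(res$t)             # 50 rows, 4 columns: (Intercept), x, gb, gc
colSums(is.na(res$t))  # how many resamples had no estimate for each coefficient
```

Resamples that miss the rare level leave NA entries in `res$t`, so it may be worth checking `colSums(is.na(res$t))` before summarising the bootstrap distribution.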

Regarding "r - How to resolve the "number of items to replace is not a multiple of replacement length" error in a bootstrapped regression?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/53023472/
