
r - Select the class with the maximum probability for each bootstrap sample

Repost. Author: 行者123. Updated: 2023-11-30 09:14:38

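I am trying to write a for loop that draws bootstrap samples from the weather data in the rattle.data package (with RainTomorrow as the target column). For each bootstrap sample I want to select the class with the maximum probability, and then predict the class that receives the most votes.

With this code I keep getting an error: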

if(!require(rpart)) install.packages("rpart") 
if(!require(rpart.plot)) install.packages("rpart.plot")
if(!require(caret)) install.packages("caret")
if(!require(rattle.data)) install.packages("rattle.data")
if(!require(tidyverse)) install.packages("tidyverse")
if(!require(ipred)) install.packages("ipred")
if(!require(Metrics)) install.packages("Metrics")
library(rpart)
library(rpart.plot)
library(rattle.data)
library(tidyverse)
library(caret)
library(ipred)
library(Metrics)

set.seed(500)

data <- weather

# creating train and test data
index <- createDataPartition(data$RainTomorrow, p = .6, list = FALSE)
train_data <- data[ index, ]
test_data <- data[-index, ]

## task b -> error in the for loop
nBoot = 10 #nr bootstrap samples

#create empty matrix [nr test data x nr bootstrap samples]to store bootstrap predictions
pred = matrix(data = NA, nrow = nrow(test_data), ncol = nBoot)

train_controls = rpart.control(minsplit = 6, maxdepth = 3)

for(b in 1:nBoot){
#create bootstrap sample
index.boot = sample(x=nrow(train_data), replace = T, size = nrow(train_data))
data_boot = train_data[index.boot,]
#fit data for the bootstrap sample
boot.model = rpart(RainTomorrow ~ .,
data =data_boot,
method = "anova",
control = train_controls)
#rpart.plot(boot.model)
#save prediction for bootstrap
pred[,b] = predict(boot.model, newdata= test_data )
}

#calculate prediction as mean of bootstrap predictions

pred.bagged = rowMeans(pred)
print(rmse(actual = test_data$RainTomorrow, predicted = pred.bagged))

But running this code returns a warning message:

In Ops.factor(actual, predicted) : ‘-’ not meaningful for factors

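I cannot for the life of me figure out why (I am new to machine learning).

Edit: still looking for a working answer.

Best answer

The error occurs because you are trying to compute RMSE against a factor: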

pred.bagged = rowMeans(pred)
class(pred.bagged)
[1] "numeric"
class(test_data$RainTomorrow)
[1] "factor"

You can convert the factor to numeric, which is what rpart does when you specify method = "anova", and then compute the RMSE:

rmse(actual = as.numeric(test_data$RainTomorrow), predicted = pred.bagged)
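
To see what this conversion does, here is a minimal sketch on a made-up factor rather than the weather data. as.numeric() on a factor returns the internal level codes (1 for the first level, 2 for the second), not 0/1, so the RMSE is measured on that 1/2 scale. The vectors actual and predicted are illustrative assumptions, and the RMSE is written out by hand so the snippet needs only base R.

```r
# Toy factor standing in for test_data$RainTomorrow (illustrative values)
actual <- factor(c("No", "Yes", "No", "No"), levels = c("No", "Yes"))

# as.numeric() returns the level codes: "No" -> 1, "Yes" -> 2
actual_num <- as.numeric(actual)
print(actual_num)

# Hypothetical averaged predictions on the same 1/2 scale
predicted <- c(1.0, 1.5, 1.0, 2.0)

# Subtracting from the factor itself yields NAs plus the Ops.factor warning
diff_factor <- suppressWarnings(actual - predicted)
print(all(is.na(diff_factor)))

# RMSE by hand: square root of the mean squared difference
rmse_manual <- sqrt(mean((actual_num - predicted)^2))
print(rmse_manual)
```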

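RMSE is normally used for regression and does not make much sense for a classification model. For classification, you can use method = "class" and evaluate the model with accuracy, F1, or Cohen's kappa. You can get these from confusionMatrix in caret; see the example below: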

for(b in 1:nBoot){
#create bootstrap sample
index.boot = sample(x=nrow(train_data), replace = T)
data_boot = train_data[index.boot,]
#fit data for the bootstrap sample
boot.model = rpart(RainTomorrow ~ .,
data =data_boot,
method = "class",
control = train_controls)
#rpart.plot(boot.model)
#save prediction for bootstrap
pred[,b] = as.character(predict(boot.model, newdata= test_data ,type="class"))
}

# very crude way to get majority vote
pred.bagged = apply(pred,1,function(i){
names(sort(table(factor(i,levels=c("No","Yes")))))[2]
})
# convert to a factor, same levels as test_data$RainTomorrow
pred.bagged = factor(pred.bagged,levels=c("No","Yes"))
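
The voting step above can be checked on a small made-up matrix. This is only a sketch: toy_pred, yes_votes, no_votes and toy_bagged are hypothetical names, and it counts votes with rowSums() instead of the sort(table(...)) approach. A tie goes to "No" here, so an odd number of bootstrap samples avoids the question entirely.

```r
# Toy prediction matrix: 3 test rows x 5 bootstrap models (made-up votes)
toy_pred <- matrix(c("No",  "No",  "Yes", "No",  "Yes",
                     "Yes", "Yes", "Yes", "No",  "Yes",
                     "No",  "No",  "No",  "No",  "Yes"),
                   nrow = 3, byrow = TRUE)

# Count the "Yes" votes in each row; the rest are "No" votes
yes_votes <- rowSums(toy_pred == "Yes")
no_votes  <- ncol(toy_pred) - yes_votes

# Majority vote: "Yes" only when it has strictly more votes than "No"
toy_bagged <- factor(ifelse(yes_votes > no_votes, "Yes", "No"),
                     levels = c("No", "Yes"))
print(toy_bagged)
```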

confusionMatrix(pred.bagged, test_data$RainTomorrow)
Confusion Matrix and Statistics

          Reference
Prediction  No Yes
       No  120   0
       Yes   0  26

Accuracy : 1
95% CI : (0.9751, 1)
No Information Rate : 0.8219
P-Value [Acc > NIR] : 3.672e-13

Kappa : 1

Mcnemar's Test P-Value : NA

Sensitivity : 1.0000
Specificity : 1.0000
Pos Pred Value : 1.0000
Neg Pred Value : 1.0000
Prevalence : 0.8219
Detection Rate : 0.8219
Detection Prevalence : 0.8219
Balanced Accuracy : 1.0000

'Positive' Class : No
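
To make the reported statistics less opaque, the following sketch recomputes accuracy, Cohen's kappa and the no-information rate by hand from the four counts in the output above. The variable names (cm, p_observed, p_expected, cohen_kappa, nir) are illustrative, and the formulas are the standard textbook definitions rather than code taken from caret.

```r
# Confusion matrix counts copied from the output above
cm <- matrix(c(120, 0,
               0,   26),
             nrow = 2, byrow = TRUE,
             dimnames = list(Prediction = c("No", "Yes"),
                             Reference  = c("No", "Yes")))

n <- sum(cm)

# Observed agreement: share of cases on the diagonal (this is the accuracy)
p_observed <- sum(diag(cm)) / n

# Agreement expected by chance, from the row and column marginals
p_expected <- sum(rowSums(cm) * colSums(cm)) / n^2

# Cohen's kappa rescales accuracy relative to chance agreement
cohen_kappa <- (p_observed - p_expected) / (1 - p_expected)

# No-information rate: share of the most frequent reference class
nir <- max(colSums(cm)) / n

print(c(accuracy = p_observed, kappa = cohen_kappa, nir = nir))
```

Kappa is useful here because the classes are imbalanced: always predicting "No" would already match the no-information rate, yet would give a kappa of zero.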

Regarding "r - Select the class with the maximum probability for each bootstrap sample", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58953542/
