
r - Accuracy results differ between the console and R Markdown

Reposted. Author: 行者123. Updated: 2023-12-04 11:39:33

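I have several classification machine learning models with differing accuracies. When I run my XGBoost model (using library(caret)) in the console, I get an accuracy of 0.7586. But when I knit my R Markdown document, the same model has an accuracy of 0.8621. I don't know why the two differ.
I followed the suggestion in this link, but it had no effect: https://community.rstudio.com/t/console-and-rmd-output-differ-same-program-used-but-the-calculation-gives-a-different-result/67873/3
I also followed the suggestions in this question, but nothing worked: Statistics Result in R Markdown is different from the Knit Output (All Format: Word, HTML, PDF)
Finally I tried this, which also had no effect: sample function gives different result in console and in knitted document when seed is set
Here is the code I run in both the console and R Markdown, which produces the different accuracies: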

# Data
data <- data[!is.na(data$var1),]

# Change levels of var1
levels(data$var1)=c("No","Yes")

#Data Preparation and Preprocessing
# Create the training and test datasets
set.seed(100)

# Step 1: Get row numbers for the training data
trainRowNumbers <- createDataPartition(data$var1, p=0.8, list=FALSE)

# Step 2: Create the training dataset
trainset <- data[trainRowNumbers,]

# Step 3: Create the test dataset
testset <- data[-trainRowNumbers,]

# Store Y for later use.
y = trainset$var1

# Create the knn imputation model on the training data
preProcess_missingdata_model <- preProcess(as.data.frame(trainset), method= c("knnImpute"))
preProcess_missingdata_model

# Create the knn imputation model on the testset data
preProcess_missingdata_model_test <- preProcess(as.data.frame(testset), method = c("knnImpute"))
preProcess_missingdata_model_test

# Use the imputation model to predict the values of missing data points
library(RANN) # required for knnImpute
trainset <- predict(preProcess_missingdata_model, newdata = trainset)
anyNA(trainset)

# Use the imputation model to predict the values of missing data points
library(RANN) # required for knnImpute
testset <- predict(preProcess_missingdata_model_test, newdata = testset)
anyNA(testset)

# Append the Y variable
trainset$var1 <- y

# Run algorithms using 5-fold cross validation
control <- trainControl(method="cv",
number=5,
repeats = 5,
savePredictions = "final",
search = "grid",
classProbs = TRUE)
metric <- "Accuracy"

# Make Valid Column Names
colnames(trainset) <- make.names(colnames(trainset))
colnames(testset) <- make.names(colnames(testset))

# xgBOOST
set.seed(7)
fit.xgbDART <- train(var1~., data = trainset, method = "xgbTree", metric = metric, trControl = control, verbose = FALSE, tuneLength = 7, nthread = 1)

# estimate skill of xgBOOST on the testset dataset
predictions <- predict(fit.xgbDART, testset)
cm <- caret::confusionMatrix(predictions, testset$var1, mode='everything')
cm

My RNGkind() output is:

RNGkind()
[1] "L'Ecuyer-CMRG" "Inversion" "Rejection"
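
One way to rule out a mismatch in RNG configuration between the two environments is to pin it explicitly in both the console session and the first chunk of the R Markdown file. The sketch below is only an illustration: it assumes R 3.6 or later (where `RNGkind()` accepts `sample.kind`), and it assumes, without confirmation from the question, that the console and the knit session might be using different RNG settings. The variable names `old_kind`, `draw_a`, and `draw_b` are hypothetical.

```r
# Remember the current RNG configuration so it can be restored at the end.
old_kind <- RNGkind()

# Pin the configuration reported in the question, then seed and draw.
RNGkind(kind = "L'Ecuyer-CMRG", normal.kind = "Inversion", sample.kind = "Rejection")
set.seed(100)
draw_a <- sample(1:100, 5)

# Repeat the exact same pin-and-seed sequence, as a second session would.
RNGkind(kind = "L'Ecuyer-CMRG", normal.kind = "Inversion", sample.kind = "Rejection")
set.seed(100)
draw_b <- sample(1:100, 5)

# Same RNG kind plus same seed gives the same draw.
print(identical(draw_a, draw_b))

# Restore whatever configuration was active before this sketch ran.
RNGkind(kind = old_kind[1], normal.kind = old_kind[2], sample.kind = old_kind[3])
```

If `RNGkind()` prints the same three values in the console and in the knitted output, the RNG configuration is probably not the source of the difference, and the cause lies elsewhere (for example in the data or objects present in each session).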

Best answer

Always add a call to:

set.seed(544) 
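This function sets the starting point used to generate a sequence of random numbers. If you start from the same seed every time you run the same process, you get the same results, provided the R version and RNG kind are also the same. For example, if I call sample() immediately after setting the seed, I will always get the same sample.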
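
A minimal sketch of that behavior in base R, using the seed value from the answer. The variable names `first_draw` and `second_draw` are hypothetical and chosen only for illustration.

```r
# Seed the generator, then draw five values from 1..100.
set.seed(544)
first_draw <- sample(1:100, 5)

# Reseed with the same value and draw again.
set.seed(544)
second_draw <- sample(1:100, 5)

# The two draws match because the generator restarted from the same state.
print(identical(first_draw, second_draw))  # TRUE
```

Note that the seed only fixes the state at the moment it is set. Any random call between `set.seed()` and the step of interest advances the generator, so in an R Markdown document it is usually safest to call `set.seed()` in the same chunk, immediately before the random step (here, before `createDataPartition()` and before `train()`).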

Regarding "r - Accuracy results differ between the console and R Markdown", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/67737499/
