
r - Simple question (I think) - Using the F1 score metric in KNN via the caret package

Reposted. Author: 行者123. Updated: 2023-12-04 12:36:41

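I am trying to use the F1 score to determine which value of k works best for the model's purpose. The model is built with the train function from the caret package.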

Sample dataset: https://www.kaggle.com/lachster/churndata

My current code includes the following (a function that computes the F1 score):

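library(caret)  # provides posPredValue() and sensitivity()

# Custom summaryFunction: caret passes a data frame with factor columns
# `obs` (observed class) and `pred` (predicted class) for each held-out fold
f1 <- function(data, lev = NULL, model = NULL) {
  # `positive` must be one of the levels of the outcome (CHURN)
  precision <- posPredValue(data$pred, data$obs, positive = "pass")
  recall <- sensitivity(data$pred, data$obs, positive = "pass")
  f1_val <- (2 * precision * recall) / (precision + recall)
  names(f1_val) <- c("F1")
  f1_val
}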
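caret calls the summaryFunction once per held-out fold, passing a data frame whose `obs` and `pred` columns hold the observed and predicted classes. The base-R sketch below applies the same precision/recall/F1 formula; `f1_base` is a hypothetical helper, not caret's implementation, and the "yes"/"no" labels are assumed. It illustrates one plausible cause of the all-NA result (not confirmed in the post): if the `positive` label never appears in the outcome, there are no positive cases, the recall denominator is zero, and F1 comes out NA.

```r
# Hypothetical base-R helper (not part of caret): F1 for one resample.
# `data` has factor columns `obs` and `pred`, like caret's summaryFunction input.
f1_base <- function(data, positive) {
  tp <- sum(data$pred == positive & data$obs == positive)  # true positives
  pred_pos <- sum(data$pred == positive)                   # predicted positives
  obs_pos <- sum(data$obs == positive)                     # actual positives
  precision <- if (pred_pos > 0) tp / pred_pos else NA_real_
  recall <- if (obs_pos > 0) tp / obs_pos else NA_real_
  c(F1 = (2 * precision * recall) / (precision + recall))
}

# Toy fold with assumed churn labels "yes"/"no"
fold <- data.frame(
  obs  = factor(c("yes", "yes", "no", "no", "yes"), levels = c("yes", "no")),
  pred = factor(c("yes", "no", "no", "yes", "yes"), levels = c("yes", "no"))
)

f1_base(fold, positive = "yes")   # precision = recall = 2/3, so F1 = 2/3
f1_base(fold, positive = "pass")  # "pass" is not a level, so F1 is NA
```

If every fold behaves like the second call, every averaged F1 value is missing, which matches the symptom in the output further down.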

Here is the trainControl setup:
train.control <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                              summaryFunction = f1, search = "grid")

And here is my final call to train:
x <- train(CHURN ~ .,
           data = experiment,
           method = "knn",
           tuneGrid = expand.grid(.k = 1:30),
           metric = "F1",
           trControl = train.control)
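A quick count of what this configuration asks caret to do: each of the 30 values of k in the grid is scored on 30 held-out folds (10 folds × 3 repeats), and the averaged F1 for each k gives the 30 values summarized in the output below, all of which are NA. The variable names are illustrative only.

```r
# Tuning grid and resampling settings taken from the question
k_grid <- expand.grid(.k = 1:30)  # one candidate model per value of k
n_folds <- 10                     # number = 10
n_repeats <- 3                    # repeats = 3

n_candidates <- nrow(k_grid)
n_resamples <- n_folds * n_repeats
evaluations <- n_candidates * n_resamples  # summaryFunction calls across the grid

c(candidates = n_candidates, resamples = n_resamples, evaluations = evaluations)
```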

Note that the model is trying to predict churn for a set of telecom customers.

Running it returns the following:
Something is wrong; all the F1 metric values are missing:
       F1     
 Min.   : NA  
 1st Qu.: NA  
 Median : NA  
 Mean   :NaN  
 3rd Qu.: NA  
 Max.   : NA  
 NA's   :30   
Error in train.default(x, y, weights = w, ...) : Stopping
In addition: Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  :
  There were missing values in resampled performance measures.

Any help would be epic!!

Edit: Thanks to misuse's help, my code now looks like the following, but it returns this error:
levels(exp2$CHURN) <- make.names(levels(factor(exp2$CHURN)))

library(mlbench)

train.control <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                              summaryFunction = prSummary, classProbs = TRUE)

knn_fit <- train(CHURN ~ ., data = exp2, method = "knn",
                 trControl = train.control,
                 preProcess = c("center", "scale"),
                 tuneLength = 15, metric = "F")

Error:
Error in trainControl(method = "repeatedcv", number = 10, repeats = 3,  : 
object 'prSummary' not found
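Since `trainControl()` itself ran, caret appears to be attached, so a likely (unconfirmed) reason for this error is an installed caret version that does not include `prSummary`. As I understand it, `prSummary` also relies on the MLmetrics package for its precision-recall AUC, and with `classProbs = TRUE` the outcome levels must be valid R names, which is what the `make.names()` line above takes care of. The sketch below checks both points; `check_pr_summary` is a hypothetical helper, and the "0"/"1" coding of CHURN is an assumption about the Kaggle data.

```r
# Hypothetical pre-flight check (illustrative helper, not a caret API)
check_pr_summary <- function() {
  has_caret <- requireNamespace("caret", quietly = TRUE)
  c(
    caret_installed = has_caret,
    # TRUE only if the installed caret defines prSummary
    prSummary_found = has_caret &&
      exists("prSummary", envir = asNamespace("caret"), inherits = FALSE),
    # prSummary uses MLmetrics for the precision-recall AUC
    MLmetrics_installed = requireNamespace("MLmetrics", quietly = TRUE)
  )
}
print(check_pr_summary())

# With classProbs = TRUE the levels become probability column names, so they
# must be valid R names. "0"/"1" is an assumed coding for CHURN.
churn <- factor(c("0", "1", "1", "0"))
levels(churn) <- make.names(levels(churn))  # "X0" "X1"

# prSummary treats the first level as the positive class, so put the
# churners ("X1" under this assumed coding) first.
churn <- relevel(churn, ref = "X1")
levels(churn)
```

If `prSummary_found` is FALSE, updating caret (and installing MLmetrics) is the first thing to try.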

Best answer

caret includes a summary function, prSummary, that reports the F1 score. A complete example:

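library(caret)
library(mlbench)  # provides the Sonar data set
data(Sonar)

# prSummary reports AUC (precision-recall), Precision, Recall and F;
# classProbs = TRUE is needed because the PR AUC uses class probabilities
train.control <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                              summaryFunction = prSummary, classProbs = TRUE)

knn_fit <- train(Class ~ ., data = Sonar, method = "knn",
                 trControl = train.control,
                 preProcess = c("center", "scale"),
                 tuneLength = 15,
                 metric = "F")  # "F" is the column name prSummary uses for F1
knn_fit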
# output
k-Nearest Neighbors

208 samples
 60 predictor
  2 classes: 'M', 'R'

Pre-processing: centered (60), scaled (60)
Resampling: Cross-Validated (10 fold, repeated 3 times)
Summary of sample sizes: 187, 188, 187, 188, 187, 187, ...
Resampling results across tuning parameters:

   k  AUC        Precision  Recall     F
   5  0.3582687  0.7936713  0.9065657  0.8414592
   7  0.4985709  0.7758271  0.8883838  0.8239438
   9  0.6632328  0.7484092  0.8853535  0.8089210
  11  0.7426320  0.7151175  0.8676768  0.7814297
  13  0.7388742  0.6883105  0.8646465  0.7641392
  15  0.7594436  0.6787983  0.8467172  0.7520524
  17  0.7583071  0.6909693  0.8527778  0.7616448
  19  0.7702208  0.6913001  0.8585859  0.7644433
  21  0.7642698  0.6962528  0.8707071  0.7719442
  23  0.7652370  0.6945755  0.8707071  0.7696863
  25  0.7606508  0.6929364  0.8707071  0.7683987
  27  0.7454728  0.6916762  0.8676768  0.7669464
  29  0.7551679  0.6900416  0.8707071  0.7676640
  31  0.7603099  0.6935720  0.8828283  0.7749490
  33  0.7614621  0.6938805  0.8770202  0.7728923

F was used to select the optimal model using the largest value.
The final value used for the model was k = 5.
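Each row of that table is a mean over the 30 resamples, and F is computed per resample before averaging. A small sketch with the numbers copied from the output above (AUC column omitted) reproduces the choice of k = 5 and shows that F1 recomputed from the averaged precision and recall (about 0.846) is not the reported mean F (0.8414592).

```r
# Resampling results copied from the output above
res <- data.frame(
  k = seq(5, 33, by = 2),
  Precision = c(0.7936713, 0.7758271, 0.7484092, 0.7151175, 0.6883105,
                0.6787983, 0.6909693, 0.6913001, 0.6962528, 0.6945755,
                0.6929364, 0.6916762, 0.6900416, 0.6935720, 0.6938805),
  Recall = c(0.9065657, 0.8883838, 0.8853535, 0.8676768, 0.8646465,
             0.8467172, 0.8527778, 0.8585859, 0.8707071, 0.8707071,
             0.8707071, 0.8676768, 0.8707071, 0.8828283, 0.8770202),
  F = c(0.8414592, 0.8239438, 0.8089210, 0.7814297, 0.7641392,
        0.7520524, 0.7616448, 0.7644433, 0.7719442, 0.7696863,
        0.7683987, 0.7669464, 0.7676640, 0.7749490, 0.7728923)
)

# metric = "F" with the default maximize = TRUE: pick the largest mean F
best_k <- res$k[which.max(res$F)]
best_k

# F1 of the averaged precision and recall for the chosen k
best <- res[res$k == best_k, ]
f_from_means <- 2 * best$Precision * best$Recall / (best$Precision + best$Recall)
round(c(reported_F = best$F, f1_of_means = f_from_means), 4)
```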

Regarding "r - Simple question (I think) - Using the F1 score metric in KNN via the caret package", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49466590/
