r - Does sbf() use the metric argument to optimize the model?



Our objective is to use the ROC summary metric for model selection while running the Selection By Filtering function sbf() for feature selection.

The BreastCancer dataset from the mlbench package is used as a reproducible example to run train() and sbf() with metric = "Accuracy" and metric = "ROC".

We want to make sure that sbf() applies the metric argument to optimize the model in the same way that train() and rfe() do. To this end, we planned to use the train() function together with sbf(): the caretSBF$fit function makes a call to train(), and caretSBF is passed to sbfControl.
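As a quick check on this assumption, printing caretSBF$fit shows that it is a thin wrapper around train() (body as in caret 6.x; the exact definition may differ slightly between versions):

library(caret)
caretSBF$fit
# function (x, y, ...)
# {
#     if (ncol(x) > 0) {
#         train(x, y, ...)
#     }
#     else nullModel(y = y)
# }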

From the output, it seems the metric argument is applied only to the inner resampling and not to the sbf part, i.e. for the outer resampling of the output, the metric argument is not applied the way train() and rfe() use it.

Since we used caretSBF, which in turn uses train(), it appears that the scope of the metric argument is limited to train() and that it is therefore not passed on to sbf.

We would appreciate clarification on whether sbf() uses the metric argument to optimize the model, i.e. for the outer resampling.

Here is our work on a reproducible example, showing that train() uses the metric argument with both Accuracy and ROC; for sbf we are not sure.

I. DATA SECTION

  ## Loading required packages   
library(mlbench)
library(caret)

## Loading `BreastCancer` Dataset from *mlbench* package
data("BreastCancer")

## Data cleaning for missing values
# Remove rows/observation with NA Values in any of the columns
BrC1 <- BreastCancer[complete.cases(BreastCancer),]

# Removing Class and Id Column and keeping just Numeric Predictors
Num_Pred <- BrC1[,2:10]

II. CUSTOM SUMMARY FUNCTION

Defining the fiveStats summary function

fiveStats <- function(...) c(twoClassSummary(...),
                             defaultSummary(...))
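As an aside, the following toy data frame (entirely made up, for illustration only) shows the metric names that fiveStats() returns and that appear in the resampling outputs below:

## Illustrative only: fake hold-out predictions with class probabilities
toy <- data.frame(
  obs       = factor(c("benign", "benign", "malignant", "malignant"),
                     levels = c("benign", "malignant")),
  pred      = factor(c("benign", "malignant", "malignant", "malignant"),
                     levels = c("benign", "malignant")),
  benign    = c(0.9, 0.4, 0.2, 0.1),
  malignant = c(0.1, 0.6, 0.8, 0.9))
fiveStats(toy, lev = levels(toy$obs))
# returns a named vector with ROC, Sens, Spec, Accuracy and Kappa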

III. TRAIN SECTION

Defining trControl

trCtrl <- trainControl(method="repeatedcv", number=10,
                       repeats=1, classProbs = TRUE, summaryFunction = fiveStats)

TRAIN + METRIC = "Accuracy"

set.seed(1)
TR_acc <- train(Num_Pred, BrC1$Class, method="rf", metric="Accuracy",
                trControl = trCtrl, tuneGrid=expand.grid(.mtry=c(2,3,4,5)))

TR_acc
# Random Forest
#
# 683 samples
# 9 predictor
# 2 classes: 'benign', 'malignant'
#
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 1 times)
# Summary of sample sizes: 615, 615, 614, 614, 614, 615, ...
# Resampling results across tuning parameters:
#
# mtry ROC Sens Spec Accuracy Kappa
# 2 0.9936532 0.9729798 0.9833333 0.9765772 0.9490311
# 3 0.9936544 0.9729293 0.9791667 0.9750853 0.9457534
# 4 0.9929957 0.9684343 0.9750000 0.9706948 0.9361373
# 5 0.9922907 0.9684343 0.9666667 0.9677536 0.9295782
#
# Accuracy was used to select the optimal model using the largest value.
# The final value used for the model was mtry = 2.

TRAIN + METRIC = "ROC"

set.seed(1)
TR_roc <- train(Num_Pred, BrC1$Class, method="rf", metric="ROC",
                trControl = trCtrl, tuneGrid=expand.grid(.mtry=c(2,3,4,5)))
TR_roc
# Random Forest
#
# 683 samples
# 9 predictor
# 2 classes: 'benign', 'malignant'
#
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 1 times)
# Summary of sample sizes: 615, 615, 614, 614, 614, 615, ...
# Resampling results across tuning parameters:
#
# mtry ROC Sens Spec Accuracy Kappa
# 2 0.9936532 0.9729798 0.9833333 0.9765772 0.9490311
# 3 0.9936544 0.9729293 0.9791667 0.9750853 0.9457534
# 4 0.9929957 0.9684343 0.9750000 0.9706948 0.9361373
# 5 0.9922907 0.9684343 0.9666667 0.9677536 0.9295782
#
# ROC was used to select the optimal model using the largest value.
# The final value used for the model was mtry = 3.

IV. EDITED caretSBF

Editing the caretSBF summary function

   caretSBF$summary <- fiveStats
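For reference, caretSBF is simply a list of helper functions, and only its summary element is replaced here (element names as in caret 6.x; check names(caretSBF) on your installed version):

names(caretSBF)
# [1] "summary" "fit"     "pred"    "score"   "filter"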

V. SBF SECTION

Defining sbfControl

sbfCtrl <- sbfControl(functions=caretSBF,
                      method="repeatedcv", number=10, repeats=1,
                      verbose=T, saveDetails = T)

SBF + METRIC = "Accuracy"

set.seed(1)
sbf_acc <- sbf(Num_Pred, BrC1$Class,
               sbfControl = sbfCtrl,
               trControl = trCtrl, method="rf", metric="Accuracy")

## sbf_acc
sbf_acc

# Selection By Filter
#
# Outer resampling method: Cross-Validated (10 fold, repeated 1 times)
#
# Resampling performance:
#
# ROC Sens Spec Accuracy Kappa ROCSD SensSD SpecSD AccuracySD KappaSD
# 0.9931 0.973 0.9833 0.9766 0.949 0.006272 0.0231 0.02913 0.01226 0.02646
#
# Using the training set, 9 variables were selected:
# Cl.thickness, Cell.size, Cell.shape, Marg.adhesion, Epith.c.size...
#
# During resampling, the top 5 selected variables (out of a possible 9):
# Bare.nuclei (100%), Bl.cromatin (100%), Cell.shape (100%), Cell.size (100%), Cl.thickness (100%)
#
# On average, 9 variables were selected (min = 9, max = 9)

## Class of sbf_acc
class(sbf_acc)
# [1] "sbf"

## Names of elements of sbf_acc
names(sbf_acc)
# [1] "pred" "variables" "results" "fit" "optVariables"
# [6] "call" "control" "resample" "metrics" "times"
# [11] "resampledCM" "obsLevels" "dots"

## sbf_acc fit element*
sbf_acc$fit
# Random Forest
#
# 683 samples
# 9 predictor
# 2 classes: 'benign', 'malignant'
#
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 1 times)
# Summary of sample sizes: 615, 614, 614, 615, 615, 615, ...
# Resampling results across tuning parameters:
#
# mtry ROC Sens Spec Accuracy Kappa
# 2 0.9933176 0.9706566 0.9833333 0.9751492 0.9460717
# 5 0.9920034 0.9662121 0.9791667 0.9707801 0.9363708
# 9 0.9914825 0.9684343 0.9708333 0.9693308 0.9327662
#
# Accuracy was used to select the optimal model using the largest value.
# The final value used for the model was mtry = 2.

## Elements of sbf_acc fit
names(sbf_acc$fit)
# [1] "method" "modelInfo" "modelType" "results" "pred"
# [6] "bestTune" "call" "dots" "metric" "control"
# [11] "finalModel" "preProcess" "trainingData" "resample" "resampledCM"
# [16] "perfNames" "maximize" "yLimits" "times" "levels"

## sbf_acc fit final Model
sbf_acc$fit$finalModel

# Call:
# randomForest(x = x, y = y, mtry = param$mtry)
# Type of random forest: classification
# Number of trees: 500
# No. of variables tried at each split: 2
#
# OOB estimate of error rate: 2.34%
# Confusion matrix:
# benign malignant class.error
# benign 431 13 0.02927928
# malignant 3 236 0.01255230

## sbf_acc metric
sbf_acc$fit$metric
# [1] "Accuracy"

## sbf_acc fit best Tune*
sbf_acc$fit$bestTune
# mtry
# 1 2

SBF + METRIC = "ROC"

set.seed(1)
sbf_roc <- sbf(Num_Pred, BrC1$Class,
               sbfControl = sbfCtrl,
               trControl = trCtrl, method="rf", metric="ROC")


## sbf_roc
sbf_roc

# Selection By Filter
#
# Outer resampling method: Cross-Validated (10 fold, repeated 1 times)
#
# Resampling performance:
#
# ROC Sens Spec Accuracy Kappa ROCSD SensSD SpecSD AccuracySD KappaSD
# 0.9931 0.973 0.9833 0.9766 0.949 0.006272 0.0231 0.02913 0.01226 0.02646
#
# Using the training set, 9 variables were selected:
# Cl.thickness, Cell.size, Cell.shape, Marg.adhesion, Epith.c.size...
#
# During resampling, the top 5 selected variables (out of a possible 9):
# Bare.nuclei (100%), Bl.cromatin (100%), Cell.shape (100%), Cell.size (100%), Cl.thickness (100%)
#
# On average, 9 variables were selected (min = 9, max = 9)

## Class of sbf_roc
class(sbf_roc)
# [1] "sbf"

## Names of elements of sbf_roc
names(sbf_roc)
# [1] "pred" "variables" "results" "fit" "optVariables"
# [6] "call" "control" "resample" "metrics" "times"
# [11] "resampledCM" "obsLevels" "dots"

## sbf_roc fit element*
sbf_roc$fit
# Random Forest
#
# 683 samples
# 9 predictor
# 2 classes: 'benign', 'malignant'
#
# No pre-processing
# Resampling: Cross-Validated (10 fold, repeated 1 times)
# Summary of sample sizes: 615, 614, 614, 615, 615, 615, ...
# Resampling results across tuning parameters:
#
# mtry ROC Sens Spec Accuracy Kappa
# 2 0.9933176 0.9706566 0.9833333 0.9751492 0.9460717
# 5 0.9920034 0.9662121 0.9791667 0.9707801 0.9363708
# 9 0.9914825 0.9684343 0.9708333 0.9693308 0.9327662
#
# ROC was used to select the optimal model using the largest value.
# The final value used for the model was mtry = 2.

## Elements of sbf_roc fit
names(sbf_roc$fit)
# [1] "method" "modelInfo" "modelType" "results" "pred"
# [6] "bestTune" "call" "dots" "metric" "control"
# [11] "finalModel" "preProcess" "trainingData" "resample" "resampledCM"
# [16] "perfNames" "maximize" "yLimits" "times" "levels"

## sbf_roc fit final Model
sbf_roc$fit$finalModel

# Call:
# randomForest(x = x, y = y, mtry = param$mtry)
# Type of random forest: classification
# Number of trees: 500
# No. of variables tried at each split: 2
#
# OOB estimate of error rate: 2.34%
# Confusion matrix:
# benign malignant class.error
# benign 431 13 0.02927928
# malignant 3 236 0.01255230

## sbf_roc metric
sbf_roc$fit$metric
# [1] "ROC"

## sbf_roc fit best Tune
sbf_roc$fit$bestTune
# mtry
# 1 2

Does sbf() use the metric argument to optimize the model? If yes, which metric does sbf() use as the default? And if sbf() does use the metric argument, how can it be set to ROC?

Thank you.

Best Answer

sbf doesn't use a metric to optimize anything (unlike rfe); all sbf does is perform a feature selection step before the model is called. You define the filters, of course, but there is no way to tune the filter with sbf, so no metric is needed to guide that step.
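For context (based on the caret 6.x definition of caretSBF, so worth verifying on your installed version): the filtering step is driven by the score/filter pair rather than by any performance metric. score returns a univariate p-value for each predictor, and the stock filter simply keeps predictors whose p-value is at most 0.05:

caretSBF$filter
# function (score, x, y)
# score <= 0.05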

Using sbf(x, y, metric = "ROC") will pass metric = "ROC" down to whatever modeling function you are using (and it is designed to work this way with caretSBF when train is used). This happens because sbf has no metric argument of its own:

> names(formals(caret:::sbf.default))
[1] "x" "y" "sbfControl" "..."

This question about "r - Does sbf() use the metric argument to optimize the model?" was originally asked and answered on Stack Overflow: https://stackoverflow.com/questions/39922359/
