r - The caret package in R: which samples are held out?

Reposted by: 行者123 Last updated: 2023-12-04 06:20:47


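I am using the caret package to try out a number of classification methods. For now I want to use leave-group-out cross-validation (I know there are better methods).

This is the trainControl setup I am using: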

train_control <- trainControl(method = "LGOCV", p = .7, number = 1)

My question is about what happens when I apply it through the train function, for example:

model <- train(Type ~ ., data=training, method = "rpart", trControl = train_control)

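How can I access the samples that were used for training and the samples that were held out in the left-out group?

Thanks

Best answer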

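Let's work through an example.

First, you need to specify one more argument to the trainControl function, returnResamp='all', so that it returns all of the resampling information.

Example data: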

# classification example: y is stored as a factor so that caret fits a classifier
y  <- factor(rep(c(0,1), c(25,25)))
x1 <- runif(50)
x2 <- runif(50)
df <- data.frame(y,x1,x2)

Solution:

Your code should look like this (I use number=2 below so that you can see how it works):

#notice the returnResamp argument set to 'all'
train_control <- trainControl(method = "LGOCV", p = .7, number = 2, returnResamp='all')

model <- train(y ~ ., data=df, method = "rpart", trControl = train_control)

Now, to access the resamples, do the following:

> model$control$index
$Resample1
 [1]  1  3  4  5  6  7  9 10 12 13 15 16 17 18 20 21 22 23 26 27 28 29 30 34 35 36 37 38 39 40 41 42 43 44 45 46

$Resample2
 [1]  2  3  4  5  6  9 11 12 13 14 15 16 17 19 20 21 24 25 26 28 29 30 31 33 34 35 36 37 38 40 41 42 45 47 49 50

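The numbers above are the row numbers kept in the training set for each resample. All remaining rows form the held-out group for that resample.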
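As a minimal sketch of "the remaining rows", the held-out row numbers can be derived with base R's setdiff. The training index here is copied by hand from the Resample1 output above, and the names train_idx, n_rows and holdout_idx are introduced only for illustration.

```r
# Training row numbers for Resample1, copied from the output above
train_idx <- c(1, 3, 4, 5, 6, 7, 9, 10, 12, 13, 15, 16, 17, 18, 20, 21, 22, 23,
               26, 27, 28, 29, 30, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
               45, 46)

n_rows <- 50  # number of rows in df

# Held-out rows are all row numbers that do not appear in the training index
holdout_idx <- setdiff(seq_len(n_rows), train_idx)

print(holdout_idx)
print(length(train_idx))    # size of the training set
print(length(holdout_idx))  # size of the held-out group
```

With a fitted model, the hand-copied vector would be replaced by model$control$index$Resample1.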

To confirm this (for Resample1, for example):

> nrow(df[model$control$index$Resample1,])
[1] 36 #36 observations kept in training set
> 36/50
[1] 0.72 #36/50 = 0.72, close to the p = 0.7 specified
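The same check can be run over every resample at once. The following is a sketch rather than part of the original answer: the seed and the names train_sizes and class_counts are my own additions, and it assumes y is a factor. Counting the classes inside each training index also suggests why the proportion is 0.72 rather than exactly 0.7: the split appears to be stratified, with the per-class size rounded up (18 of 25 rows per class).

```r
library(caret)

set.seed(42)  # assumption: a seed is added here only for reproducibility

# Same toy data as in the answer, with y stored as a factor
df <- data.frame(
  y  = factor(rep(c(0, 1), c(25, 25))),
  x1 = runif(50),
  x2 = runif(50)
)

train_control <- trainControl(method = "LGOCV", p = 0.7, number = 2,
                              returnResamp = "all")
model <- train(y ~ ., data = df, method = "rpart", trControl = train_control)

# Number of training rows in each resample, and the proportion of df they cover
train_sizes <- sapply(model$control$index, length)
print(train_sizes)
print(train_sizes / nrow(df))

# Class counts inside each training set (one column per resample)
class_counts <- sapply(model$control$index, function(idx) table(df$y[idx]))
print(class_counts)
```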

To access the rows themselves (again using Resample1 as the example):

> df[model$control$index$Resample1,]
   y           x1         x2
1  0 0.9706626355 0.90786863
3  0 0.5664755146 0.66014308
4  0 0.5540436453 0.95919639
5  0 0.1941235152 0.60869461
6  0 0.7966452301 0.64245296
7  0 0.1021302647 0.50045568
9  0 0.9963372331 0.86199347
10 0 0.0641849677 0.83714478
12 0 0.0007932109 0.83086593
13 0 0.7914607469 0.98313602
15 0 0.4176381815 0.26584837
16 0 0.8913181033 0.78030297
17 0 0.3896608590 0.40215619
18 0 0.6155101282 0.50859816
20 0 0.4252773556 0.73868264
21 0 0.9494552673 0.96442255
22 0 0.6675511154 0.35240024
23 0 0.6931768688 0.42016284
26 1 0.6049248914 0.85045559
27 1 0.8878736692 0.20937898
28 1 0.0881897225 0.49006904
29 1 0.3561574069 0.87316667
30 1 0.7379366003 0.57722477
34 1 0.0762609572 0.85021965
...
...
...

The same expression with a minus sign gives you the observations that were held out in that resample:

> df[-model$control$index$Resample1,]
   y          x1         x2
2  0 0.495293215 0.16392350
8  0 0.057934150 0.90044716
11 0 0.794459804 0.46207494
14 0 0.268692204 0.80763156
19 0 0.515704584 0.82078298
24 0 0.031054236 0.40846695
25 0 0.218243275 0.40132438
31 1 0.694632679 0.36696466
32 1 0.002055724 0.99023235
33 1 0.584879035 0.37515622
....
....
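If you would rather know the partitions before training, one option is to generate them yourself and pass them in through the index argument of trainControl. This is a hedged sketch, not part of the original answer: the seed and the names my_index, training_rows and heldout_rows are assumptions made for illustration.

```r
library(caret)

set.seed(123)  # assumption: any fixed seed makes the partitions repeatable

df <- data.frame(
  y  = factor(rep(c(0, 1), c(25, 25))),
  x1 = runif(50),
  x2 = runif(50)
)

# Two stratified training partitions of roughly 70%, as a list of row numbers
my_index <- createDataPartition(df$y, times = 2, p = 0.7, list = TRUE)

# Hand the pre-computed training indices to train via trainControl
train_control <- trainControl(method = "LGOCV", index = my_index)
model <- train(y ~ ., data = df, method = "rpart", trControl = train_control)

# Training and held-out rows for the first partition, known before fitting
training_rows <- df[my_index[[1]], ]
heldout_rows  <- df[-my_index[[1]], ]

print(nrow(training_rows))
print(nrow(heldout_rows))
```

Because the indices are created outside train, the same my_index can be reused across several models so that they are all evaluated on identical splits.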

Regarding "r - The caret package in R: which samples are held out?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/28964502/
