Is there a way to create a loop where I provide a function and dataframe and subsample it, and repeat the function with a subsample N times?

Reposted by: bug小助手. Updated: 2023-10-28 11:02:30

I am not sure what the correct term for this is, so apologies if I get the terminology wrong. I have about 1000 data points. I want to randomly subsample 100 of them 999 times, fit the same model (a generalised least squares model) to each subsample, and see how often the correlation is significant.

I am also adding some more context, in case it helps. My data is in a data frame with various columns. I am testing whether there is a relationship between altitude and dichromatism, and whether that relationship differs depending on whether dichromatism is measured using a spectrophotometer or human scoring. I also include the latitude centroid of each species' range in these models, so the PGLS for each looks as follows:

PGLS_VO_Score <- gls(Colour_discriminability_Absolute ~ Altitude_Reported*Centroid.Abs, 
                     correlation = corPagel(1, phy = AvianTreeEdge, form = ~Species), 
                     data = VO_HumanScores_Merged, method = "ML")

PGLS_Human_Score <- gls(Human_Score ~ Altitude_Reported*Centroid.Abs, 
                        correlation = corPagel(1, phy = AvianTreeEdge, form = ~Species), 
                        data = VO_HumanScores_Merged, method = "ML")

The data frame VO_HumanScores_Merged includes columns for species names, human scores, spectrophotometer scores, altitude, and latitude, plus some transformed versions of those (log-transformed, etc.), which I computed at the start in case I needed to transform the data to meet the assumptions of the PGLS.

Recommended answers

Writing the sampling and the computation as a pipeline helps to show what can be done here:

myfun <- function(x) cor(x[[1]], x[[3]])
set.seed(42)
replicate(5, mtcars[sample(nrow(mtcars), 10),], simplify=FALSE) |>
  lapply(myfun)
# [[1]]
# [1] -0.8130999
# [[2]]
# [1] -0.8633841
# [[3]]
# [1] -0.7967049
# [[4]]
# [1] -0.901294
# [[5]]
# [1] -0.8761853

(My 5 is your 999, my 10 is your 100.)

The simplify=FALSE is required because otherwise replicate will reduce the samples to a (nested) matrix, which is not what we want. My myfun is contrived; use whatever function you want.
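To see the difference that simplify=FALSE makes, here is a small sketch comparing the two forms. The object names simplified and kept are made up for illustration, and mtcars stands in for your data.

```r
set.seed(42)

# Default (simplify = TRUE): the sampled data frames are collapsed into
# a list-matrix with one row per column of mtcars and one column per draw.
simplified <- replicate(3, mtcars[sample(nrow(mtcars), 10), ])

# simplify = FALSE: a plain list with one intact data frame per draw.
kept <- replicate(3, mtcars[sample(nrow(mtcars), 10), ], simplify = FALSE)

is.matrix(simplified)      # TRUE: no longer a list of data frames
dim(simplified)            # 11 columns of mtcars by 3 draws
is.data.frame(kept[[1]])   # TRUE: each element can be passed to myfun
nrow(kept[[1]])            # 10 sampled rows
```

Only the second form gives lapply something it can hand to a function that expects a data frame.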

The (perhaps only) advantage of breaking it out into two (or more) steps in a pipeline is that revisiting the random samples later is much simpler if you have saved them. For example,

set.seed(42)
sampdat <- replicate(5, mtcars[sample(nrow(mtcars), 10),], simplify=FALSE)
lapply(sampdat, myfun)
# [[1]]
# [1] -0.8130999
# [[2]]
# [1] -0.8633841
# [[3]]
# [1] -0.7967049
# [[4]]
# [1] -0.901294
# [[5]]
# [1] -0.8761853

If you later realize you need to do something else with the sample data (another metric or whatever) and you don't (for time, memory, or convenience) want to have to rerun all of the other sample-aggregations, you can re-use sampdat.
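As a rough illustration of that re-use, the sketch below computes two different summaries from the same saved samples. The names cors and mean_mpg are invented for the example, and mpg and disp are the first and third columns of mtcars, as used by myfun above.

```r
set.seed(42)
sampdat <- replicate(5, mtcars[sample(nrow(mtcars), 10), ], simplify = FALSE)

# First pass: the correlation on each saved subsample.
cors <- sapply(sampdat, function(x) cor(x$mpg, x$disp))

# Later: a second metric on exactly the same subsamples, without resampling.
mean_mpg <- sapply(sampdat, function(x) mean(x$mpg))

# Both results line up element by element with sampdat.
data.frame(draw = seq_along(sampdat), cor = cors, mean_mpg = mean_mpg)
```

Because both vectors come from the same list, row i of the table always refers to the same 10 sampled rows.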

You can take a random sample from your data points using sample, then run your function n times using replicate.
An example that draws a random sample of 100 points and computes its mean, repeated 10 times:

> set.seed(1)
> datapoints <- runif(1000, max = 10000)
> result <- replicate(10, mean(sample(datapoints, 100)))
> result
5194.298 5063.320 5064.992 4681.281 5008.011 4849.998 5320.206 5012.931 4900.636 4776.135

Comments

Thank you for your comment. I think I did something wrong, and I am not sure what, because every output I got was exactly the same, which I do not believe is meant to happen. This is what I ran:

myfun <- function(PGLS_VO_Scores) cor(VO_HumanScores_Merged$Colour_discriminability_Absolute, VO_HumanScores_Merged$Altitude_Reported)
BirdReplicationAttempt <- replicate(999, VO_HumanScores_Merged[sample(nrow(VO_HumanScores_Merged), 100),], simplify=FALSE) |> lapply(myfun)

I have added more context to the original query in case that helps in understanding where the error occurred

You write a function that accepts as its sole argument PGLS_VO_Scores but never use it, instead choosing to breach scope and grab data from something else entirely. The function is supposed to take sample data and do something with that sample data, not data that might (or might not) be in some calling environment.

Try changing your function to myfun <- function(x) cor(x$Colour_discriminability_Absolute, x$Altitude_Reported) and rerun your replication.
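The effect is easy to reproduce on a built-in data set. In this sketch, ignores_arg and uses_arg are made-up names, and mtcars stands in for VO_HumanScores_Merged.

```r
set.seed(42)
sampdat <- replicate(3, mtcars[sample(nrow(mtcars), 10), ], simplify = FALSE)

# Ignores its argument and reads the full data set from the global
# environment, so every call returns the same number.
ignores_arg <- function(x) cor(mtcars$mpg, mtcars$disp)

# Uses only the subsample it was given.
uses_arg <- function(x) cor(x$mpg, x$disp)

same   <- sapply(sampdat, ignores_arg)
differ <- sapply(sampdat, uses_arg)

length(unique(same))    # 1: identical output for every subsample
length(unique(differ))  # more than 1: the value changes with the subsample
```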

Thanks, that seems to have worked. Just to confirm: is the output the p-values of the correlation, or the correlation itself?
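For reference, cor() returns the correlation coefficient itself, not a p-value. One possible way to get p-values is cor.test() from base R. This is a sketch on mtcars, with pvals as an illustrative name and 0.05 as an assumed significance threshold.

```r
set.seed(42)
sampdat <- replicate(20, mtcars[sample(nrow(mtcars), 10), ], simplify = FALSE)

# cor.test() returns a test object; its p.value element is the p-value
# of the test that the correlation is zero.
pvals <- vapply(sampdat,
                function(x) cor.test(x$mpg, x$disp)$p.value,
                numeric(1))

# Proportion of subsamples in which the correlation is significant
# at the assumed 0.05 level.
mean(pvals < 0.05)
```

The final proportion is one way of expressing how often the correlation is significant across the subsamples.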

Thank you for your comment. I tried to do this with the PGLS model I want to rerun, substituting it for mean and substituting my data set for datapoints, so it reads as follows:

replicate(999, PGLS_VO_Score(sample(VO_HumanScores_Merged, 100)))

But I only got an error:

Error in PGLS_VO_Score(sample(VO_HumanScores_Merged, 100) : could not find function "PGLS_VO_Score"

Is there a way to resolve this so that it recognises the function which I used for the entire dataset as the function I want to apply to each subset?
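The error arises because PGLS_VO_Score is the fitted model object returned by gls, not a function, so it cannot be called. Separately, sample applied directly to a data frame samples columns rather than rows. A minimal sketch of one way around both, assuming the model fit is wrapped in a function that takes a subsample: fit_pvalue is a hypothetical name, mtcars and mpg ~ wt stand in for the real data and formula, and the corPagel term is left out because it needs the phylogenetic tree. Whether the phylogenetic correlation structure accepts a subset of species without further changes is not checked here.

```r
library(nlme)  # provides gls(); nlme ships with standard R installations

# Hypothetical wrapper: fit the model to one subsample and return the
# p-value of the predictor. In real use, the formula and the
# correlation = corPagel(...) argument from the question would go here.
fit_pvalue <- function(d) {
  fit <- gls(mpg ~ wt, data = d, method = "ML")
  summary(fit)$tTable["wt", "p-value"]
}

set.seed(1)
# Sample row indices, not the data frame itself, then refit each time.
pvals <- replicate(50, fit_pvalue(mtcars[sample(nrow(mtcars), 20), ]))

# Proportion of subsamples with a significant predictor (assumed 0.05 level).
mean(pvals < 0.05)
```

With the real data, 50 and 20 would become 999 and 100.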
