Is there a way to create a loop where I provide a function and dataframe and subsample it, and repeat the function with a subsample N times?

Reposted by: bug小助手 | Updated: 2023-10-28 11:06:17

I am not sure what the correct word for this would be, so apologies for getting the terminology horribly wrong. Basically I have about 1000 datapoints, and I want to randomly subsample 100 data points 999 times and perform the same function (a generalised least squares model) on each subsample, and see how often the correlation would be significant.

I am also adding some more context, in case it helps. My data is in a data frame with various columns, and I am testing whether there is a relationship between altitude and dichromatism, and whether that relationship varies depending on whether dichromatism is measured using a spectrophotometer or human scoring. I also include the latitude centroid of each species' range in these models, so the PGLS for each looks as follows:

library(nlme)  # provides gls()
library(ape)   # provides corPagel()

PGLS_VO_Score <- gls(Colour_discriminability_Absolute ~ Altitude_Reported*Centroid.Abs,
                     correlation = corPagel(1, phy = AvianTreeEdge, form = ~Species),
                     data = VO_HumanScores_Merged, method = "ML")

PGLS_Human_Score <- gls(Human_Score ~ Altitude_Reported*Centroid.Abs,
                        correlation = corPagel(1, phy = AvianTreeEdge, form = ~Species),
                        data = VO_HumanScores_Merged, method = "ML")

The data frame VO_HumanScores_Merged includes a column for species names, for human scores, for spectrophotometer scores, altitude, latitude, and then some transformed values of those (log-transformed, etc.), which I computed at the outset in case I needed to transform the data to meet the assumptions of the PGLS.

Recommended answers

Breaking the sampling out into a pipeline helps show what can be done here:

myfun <- function(x) cor(x[[1]], x[[3]])
set.seed(42)
replicate(5, mtcars[sample(nrow(mtcars), 10),], simplify=FALSE) |>
  lapply(myfun)
# [[1]]
# [1] -0.8130999
# [[2]]
# [1] -0.8633841
# [[3]]
# [1] -0.7967049
# [[4]]
# [1] -0.901294
# [[5]]
# [1] -0.8761853

(My 5 is your 999, my 10 is your 100.)

The simplify=FALSE is required because otherwise replicate will reduce the result to a (nested) matrix, which is not what we want. My myfun is contrived; use whatever function you want.
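
To see the difference that simplify=FALSE makes, here is a minimal sketch using mtcars as a stand-in for your data. The names simplified and kept are just illustrative and are not part of the answer above.

```r
set.seed(42)

# Default (simplify = TRUE): the five data frames are collapsed into a
# list-matrix with one row per mtcars column and one column per replicate.
simplified <- replicate(5, mtcars[sample(nrow(mtcars), 10), ])
is.matrix(simplified)
dim(simplified)

# simplify = FALSE: a plain list of five data frames, ready for lapply().
kept <- replicate(5, mtcars[sample(nrow(mtcars), 10), ], simplify = FALSE)
length(kept)
sapply(kept, nrow)
```

With the list-matrix form, lapply would iterate over individual columns rather than over whole subsamples, which is why the list form is the one to keep.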

The (perhaps only) advantage to breaking it out into two (or more) steps in a pipeline is that if you want to go back to revisit the random sampling, it's much simpler if you save that random sampling. For example,

set.seed(42)
sampdat <- replicate(5, mtcars[sample(nrow(mtcars), 10),], simplify=FALSE)
lapply(sampdat, myfun)
# [[1]]
# [1] -0.8130999
# [[2]]
# [1] -0.8633841
# [[3]]
# [1] -0.7967049
# [[4]]
# [1] -0.901294
# [[5]]
# [1] -0.8761853

If you later realize you need to do something else with the sample data (another metric or whatever) and you don't (for time, memory, or convenience) want to have to rerun all of the other sample-aggregations, you can re-use sampdat.
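
As a rough illustration of that re-use, the sketch below computes a second metric on the very same subsamples without drawing them again. The second metric (mean of mpg) and the names cors and mean_mpg are made up for the example.

```r
set.seed(42)
sampdat <- replicate(5, mtcars[sample(nrow(mtcars), 10), ], simplify = FALSE)

# First pass: correlation between columns 1 (mpg) and 3 (disp).
cors <- sapply(sampdat, function(x) cor(x[[1]], x[[3]]))

# Later: a different metric on exactly the same subsamples, no resampling.
mean_mpg <- sapply(sampdat, function(x) mean(x$mpg))

# One row per subsample, so the two metrics can be compared side by side.
summary_table <- data.frame(replicate = seq_along(sampdat),
                            cor = cors,
                            mean_mpg = mean_mpg)
summary_table
```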

You can take a random sample from your datapoints using sample. Then you can run your function n times using replicate.
An example that takes a random sample of n=100 and computes the mean 10 times:

set.seed(1)
datapoints <- runif(1000, max = 10000)
result <- replicate(10, mean(sample(datapoints, 100)))
result
# 5194.298 5063.320 5064.992 4681.281 5008.011 4849.998 5320.206 5012.931 4900.636 4776.135
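
Since the question asks about a loop, here is a sketch of the same computation written as an explicit for loop next to the replicate version. The seed value 2 and the names loop_result and rep_result are arbitrary choices for the comparison.

```r
set.seed(1)
datapoints <- runif(1000, max = 10000)
n_reps <- 10

# Explicit loop: preallocate the output, then fill one element per iteration.
set.seed(2)
loop_result <- numeric(n_reps)
for (i in seq_len(n_reps)) {
  loop_result[i] <- mean(sample(datapoints, 100))
}

# replicate() does the same bookkeeping for you.
set.seed(2)
rep_result <- replicate(n_reps, mean(sample(datapoints, 100)))

# Starting from the same seed, both approaches draw the same subsamples.
all.equal(loop_result, rep_result)
```

Both forms do the same work; replicate just saves the preallocation and indexing.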

More replies

Thank you for your comment. I think I did something wrong, and I am not sure why, because every output I got was exactly the same, which I do not believe is what is meant to happen. This is what I put in:
myfun <- function(PGLS_VO_Scores) cor(VO_HumanScores_Merged$Colour_discriminability_Absolute, VO_HumanScores_Merged$Altitude_Reported)
BirdReplicationAttempt <- replicate(999, VO_HumanScores_Merged[sample(nrow(VO_HumanScores_Merged), 100),], simplify=FALSE) |> lapply(myfun)

I have added more context to the original query in case that helps in understanding where the error occurred

You write a function that accepts as its sole argument PGLS_VO_Scores but never use it, instead choosing to breach scope and grab data from something else entirely. The function is supposed to take sample data and do something with that sample data, not data that might (or might not) be in some calling environment.

Try changing your function to myfun <- function(x) cor(x$Colour_discriminability_Absolute, x$Altitude_Reported) and rerun your replication.
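
The symptom (every output identical) can be reproduced with a small sketch on mtcars, used here as a stand-in for your data. The names bad_fun and good_fun are made up for the illustration.

```r
set.seed(42)
samples <- replicate(5, mtcars[sample(nrow(mtcars), 10), ], simplify = FALSE)

# Ignores its argument and reads the full data set from the global
# environment, so every replicate returns the same number.
bad_fun <- function(x) cor(mtcars$mpg, mtcars$disp)

# Uses only the subsample it is given.
good_fun <- function(x) cor(x$mpg, x$disp)

bad  <- sapply(samples, bad_fun)
good <- sapply(samples, good_fun)

length(unique(bad))   # a single distinct value: the symptom described above
good                  # varies from subsample to subsample
```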

Thanks, that seems to have worked. And just to confirm, the output of that, is that the p values of the correlation? Or the correlation itself?
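
For reference, cor returns the correlation coefficient itself, not a p-value. If p-values are what you are after, one possible approach (a sketch on mtcars, not the poster's PGLS model) is cor.test, whose result carries a p.value element. The 0.05 cutoff and the 200 replicates are arbitrary choices here.

```r
set.seed(42)
samples <- replicate(200, mtcars[sample(nrow(mtcars), 10), ], simplify = FALSE)

# cor() gives the correlation coefficient only.
r_values <- sapply(samples, function(x) cor(x$mpg, x$disp))

# cor.test() also runs a significance test; its result has a p.value element.
p_values <- sapply(samples, function(x) cor.test(x$mpg, x$disp)$p.value)

# Share of subsamples in which the correlation is significant at the 5% level.
prop_significant <- mean(p_values < 0.05)
prop_significant
```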

Thank you for your comment. I tried to do this with the PGLS model I want to rerun, substituting it for mean and substituting my data set for datapoints, so it reads as follows: replicate(999, PGLS_VO_Score(sample(VO_HumanScores_Merged, 100))). But I only got an error: Error in PGLS_VO_Score(sample(VO_HumanScores_Merged, 100)) : could not find function "PGLS_VO_Score". Is there a way to resolve this so that it recognises the function which I used for the entire dataset as the function I want to apply to each subset?
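
Two things seem to be going on here. PGLS_VO_Score in the question is a fitted model object (the result of gls), not a function, so it cannot be called. And sample applied directly to a data frame picks columns, not rows. A minimal sketch of one way around both, with lm on mtcars as a hypothetical stand-in for the gls call (fit_model, one_fit and the wt term are illustrative names only):

```r
# Stand-in model: in the real case the gls(...) call would go in this body,
# with data = d instead of the full data frame.
fit_model <- function(d) lm(mpg ~ wt * hp, data = d)

# Draw `size` random rows (not columns) and refit the model on them.
one_fit <- function(data, size) {
  rows <- sample(nrow(data), size)
  fit_model(data[rows, ])
}

set.seed(1)
fits <- replicate(20, one_fit(mtcars, 15), simplify = FALSE)

# Pull the p-value of the wt term out of each fitted model.
p_wt <- sapply(fits, function(f) summary(f)$coefficients["wt", "Pr(>|t|)"])

# Proportion of subsamples in which the wt term is significant at 5%.
mean(p_wt < 0.05)
```

Keeping the fitted models in a list (simplify = FALSE) means other quantities can be extracted later without refitting.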
