
R: Specifying variable names in function arguments for a general-purpose function

Reposted. Author: 行者123. Last updated: 2023-12-04 22:50:12

Here is my small function and data. Note that I want to design a general-purpose function, not one for my personal use only.

dataf <- data.frame (A= 1:10, B= 21:30, C= 51:60, D = 71:80)

myfun <- function (dataframe, varA, varB) {
  daf2 <- data.frame (A = dataframe$A*dataframe$B,
                      B = dataframe$C*dataframe$D)
  anv1 <- lm(varA ~ varB, daf2)
  print(anova(anv1))
}

myfun (dataframe = dataf, varA = A, varB = B)

Error in eval(expr, envir, enclos) : object 'A' not found

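It works when I specify data$variable, but I don't want to require that, because it forces the user to write both the data and the variable name in the function call.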
 myfun (dataframe = dataf, varA = dataf$A, varB = dataf$B)
Analysis of Variance Table

Response: varA
          Df Sum Sq Mean Sq    F value    Pr(>F)
varB       1   82.5    82.5 1.3568e+33 < 2.2e-16 ***
Residuals  8    0.0     0.0
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Warning message:
In anova.lm(anv1) :
ANOVA F-tests on an essentially perfect fit are unreliable

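What is the best practice in this situation? Can I attach the data frame inside the function? What are the drawbacks or potential conflicts/dangers of doing so? See the masking message in the output below. I believe that once attached, it stays attached for the rest of the session, right? The function shown here is only an example; I need more downstream analysis, where variable names from different data frames can (and should) be the same. I am hoping for a programmer's solution.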
myfun <- function (dataframe, varA, varB) {
  attach(dataframe)
  daf2 <- data.frame (A = A*B, B = C*D)
  anv1 <- lm(varA ~ varB, daf2)
  return(anova(anv1))
}

myfun (dataframe = dataf, varA = A, varB = B)

The following object(s) are masked from 'dataframe (position 3)':

A, B, C, D
Analysis of Variance Table

Response: varA
          Df Sum Sq Mean Sq    F value    Pr(>F)
varB       1   82.5    82.5 1.3568e+33 < 2.2e-16 ***
Residuals  8    0.0     0.0
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Warning message:
In anova.lm(anv1) :
ANOVA F-tests on an essentially perfect fit are unreliable

Best answer

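Let's go through your original function and call (see the comments I have added), assuming you meant to pass the names of the columns you are interested in to the function: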

myfun <- function (dataframe, varA, varB) {
  # On the next line you use A and B, but these should be whatever is
  # passed in as varA and varB, no?
  daf2 <- data.frame (A = dataframe$A*dataframe$B, B = dataframe$C*dataframe$D)
  # So, as a correction, we need:
  colnames(daf2) <- c(varA, varB)
  # The first argument to lm is a formula. Written like this, it refers
  # to columns literally _named_ varA and varB, not to the columns named
  # by the _contents_ of varA and varB!
  anv1 <- lm(varA ~ varB, daf2)
  # What we really want is to build a formula from the contents of varA
  # and varB; we have to do this by building up a character string:
  frm <- paste(varA, varB, sep = "~")
  anv1 <- lm(formula(frm), daf2)
  print(anova(anv1))
}
# Here you pass A and B, because you are used to being able to do that in a
# formula (as in lm). But a great deal of work is done behind the scenes to make
# that happen in a formula, and it doesn't work in most of the rest of R, so you
# need to pass the names as character strings:
myfun (dataframe = dataf, varA = A, varB = B)
# becomes:
myfun (dataframe = dataf, varA = "A", varB = "B")

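Note: in the code above I kept your original lines, so you will need to delete some of them to avoid the errors you originally got. The essence of your problem is that you should always pass column names as character strings and use them as such. This is one of the places where the syntactic sugar of formulas in R leads people into bad habits and misunderstandings...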
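As a small illustration of passing column names as strings, the formula can also be assembled with base R's `reformulate()` rather than `paste()`. This is only a sketch and not part of the original answer: the helper name `fit_anova` and its argument names are made up here, and the built-in `mtcars` data is used instead of `dataf` purely because the columns of `dataf` are exactly collinear and would trigger the perfect-fit warning shown in the question.

```r
# Hypothetical helper (name and arguments are assumptions for illustration):
# the response and predictor columns are given as character strings.
fit_anova <- function(dataframe, response, predictor) {
  # Fail early if either name is not a column of the data frame
  stopifnot(is.character(response), is.character(predictor),
            all(c(response, predictor) %in% names(dataframe)))
  # reformulate() builds the formula response ~ predictor from strings
  frm <- reformulate(predictor, response = response)
  anova(lm(frm, data = dataframe))
}

# mtcars ships with R; mpg and wt are two of its columns
res <- fit_anova(mtcars, "mpg", "wt")
```

The early `stopifnot()` check is a design choice of this sketch: a misspelled column name then fails with a clear message instead of an "object not found" error from inside `lm()`.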

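Now, as for an alternative: the only place where the variable names are actually used is the formula. So, if you don't mind some slight cosmetic differences in the result (which you can clean up later), you can simplify things further: you don't need to pass the column names at all!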
myfun <- function (dataframe) {
  daf2 <- data.frame (A = dataframe$A*dataframe$B, B = dataframe$C*dataframe$D)
  # now we know that columns A and B simply exist in data.frame daf2!
  anv1 <- lm(A ~ B, daf2)
  print(anova(anv1))
}

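As a final piece of advice: I would not call print in your last statement. If you leave it out and call the function directly from the R command line, the result is printed for you anyway. As an additional advantage, you can then do further work with the object returned from your function.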
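To illustrate that last point, here is a minimal sketch of what returning the table (instead of printing it) makes possible. It reuses the simplified no-argument version of `myfun` and only changes the last line. The column names `Pr(>F)` and `F value` are the ones `anova()` uses for an `lm` fit; the variable names `tab`, `p_value` and `f_stat` are made up for the example.

```r
dataf <- data.frame(A = 1:10, B = 21:30, C = 51:60, D = 71:80)

myfun <- function(dataframe) {
  daf2 <- data.frame(A = dataframe$A * dataframe$B,
                     B = dataframe$C * dataframe$D)
  # The last expression is the return value; no print() call
  anova(lm(A ~ B, daf2))
}

# Assigning the result prints nothing; the table is kept for later use
tab <- myfun(dataf)

# Pull individual numbers out of the returned anova table
p_value <- tab[["Pr(>F)"]][1]
f_stat  <- tab[["F value"]][1]
```

Typing `myfun(dataf)` on its own at the prompt still displays the table, because R auto-prints the value of a top-level expression.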

Cleaned-up function with some trial runs:
dataf <- data.frame (A = 1:10, B = 21:30, C = 51:60, D = 71:80)
myfun <- function (dataframe, varA, varB) {
  frm <- paste(varA, varB, sep = "~")
  anv1 <- lm(formula(frm), dataframe)
  anova(anv1)
}
myfun (dataframe = dataf, varA = "A", varB = "B")
myfun (dataframe = dataf, varA = "A", varB = "D")
myfun (dataframe = dataf, varA = "B", varB = "C")
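If unquoted names in the call are a hard requirement, as in the original `myfun(dataframe = dataf, varA = A, varB = B)`, one possible sketch (not from the answer above) is to capture the argument expressions with `substitute()` and turn them into strings with `deparse()`. The name `myfun_bare` is hypothetical, and `mtcars` is used only to avoid the perfect-fit warning that the exactly collinear columns of `dataf` produce.

```r
# Hypothetical variant (name is an assumption): accepts bare column names.
myfun_bare <- function(dataframe, varA, varB) {
  # substitute() captures the unevaluated argument, so R never tries to
  # look up an object called mpg or wt in the calling environment
  nameA <- deparse(substitute(varA))
  nameB <- deparse(substitute(varB))
  # Fail early if the captured names are not columns of the data frame
  stopifnot(all(c(nameA, nameB) %in% names(dataframe)))
  frm <- as.formula(paste(nameA, nameB, sep = "~"))
  anova(lm(frm, data = dataframe))
}

res_bare <- myfun_bare(mtcars, mpg, wt)
```

The trade-off is that this version no longer accepts quoted strings or a variable holding a column name, which makes it harder to call from other functions; the string-based version above stays the simpler choice for programmatic use.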

This post, "R: Specifying variable names in function arguments for a general-purpose function", is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/8121542/
