
r - How to use lapply in a data.table without a by clause

Reposted. Author: 行者123. Updated: 2023-12-01 00:11:43

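I am trying to use data.table, lapply, and a function call to run several regressions against the same response variable. As output I would like a simple table that lists each predictor together with its coefficient of determination.

I am using RStudio 1.2.1335 and data.table 1.12.2.
The data set I am using is http://users.stat.ufl.edu/~rrandles/sta4210/Rclassnotes/data/textdatasets/KutnerData/Appendix%20C%20Data%20Sets/APPENC02.txt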

cnames <- c("ID", "County", "State", "Area", "Pop", "Young", "Old", "Phys", "Beds", "Crime", "HighSchool", "BA", "Poverty", "Unemploy", "PerCapitaIncome", "TotalIncome", "Region")
df62 <- fread("APPENC02.txt", col.names = cnames)
df62[, c("ID", "County", "State", "Region") := NULL]
variability <- function(y) {
  model <- eval(substitute(lm(Phys ~ y, data = df62)))
  anova <- anova(model)
  SSR <- anova$`Sum Sq`[1]
  SSE <- anova$`Sum Sq`[2]
  SSTO <- SSR + SSE
  R2 <- SSR / SSTO
  return(R2)
}
df62[, lapply(.SD, variability)]

If the last line is instead:
df62[, lapply(.SD, variability), by = Phys]

Error Message when I omit the 'by' clause: "Error in (function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x, : object 'i' not found"



If I group by the variable "Phys", I get the correct results, but each result is repeated unnecessarily.

Best answer

We can build the formula with reformulate. The function takes two arguments, 'data' and 'y', where y is the name of a column passed as a string.

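variability <- function(data, y) {
  # reformulate(y, "Phys") builds the formula Phys ~ <column named by y>
  model <- lm(reformulate(y, "Phys"), data = data)
  anova <- anova(model)
  SSR <- anova$`Sum Sq`[1]   # regression sum of squares
  SSE <- anova$`Sum Sq`[2]   # residual sum of squares
  SSTO <- SSR + SSE
  R2 <- SSR / SSTO
  return(R2)
}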
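
To illustrate the difference between the two ways of building the formula, here is a small sketch using the built-in mtcars data rather than the question's data set (the columns mpg and wt are stand-ins chosen only for illustration). It suggests that substitute() inside lapply() captures lapply's internal placeholder expression instead of a column name, which is probably why the original version complains about a missing object 'i', whereas reformulate() works from a plain string.

```r
# Inside lapply(), substitute() captures the placeholder expression that
# lapply() passes to the function, not the name of the list element.
captured <- lapply(list(a = 1), function(y) deparse(substitute(y)))
print(captured$a)

# reformulate() builds a formula from a term label given as a string.
f <- reformulate("wt", response = "mpg")
print(f)

# The resulting formula can be passed to lm() as usual.
fit <- lm(f, data = mtcars)
print(summary(fit)$r.squared)
```

Because the column name travels as a string, the loop can run over names(...) instead of over the columns themselves.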

Select the column names of interest
nm1 <- setdiff(names(df62), "Phys")

Loop over them and apply the function, passing .SD as the data argument
setnames(df62[, lapply(nm1, variability, data = .SD)], nm1)[]
#           Area       Pop      Young          Old      Beds     Crime   HighSchool         BA     Poverty    Unemploy PerCapitaIncome TotalIncome
#1:  0.006095652 0.8840674 0.01432791 9.788323e-06 0.9033826 0.6731538 1.804622e-05 0.05605789 0.004113459 0.002551878       0.0999411   0.8989137
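
Since the question asks for a table with one row per variable, a long-format variant may be easier to read than the single wide row. The following is a minimal sketch of that idea under stated assumptions: it uses the built-in mtcars data with mpg as the response, and the helper r_squared and its response argument are hypothetical names introduced here, not part of the original answer. It reads R² from summary() instead of the ANOVA table; for a single-predictor model both give the same value.

```r
library(data.table)

# Stand-in data: mpg is the response, the remaining columns are predictors.
dt <- as.data.table(mtcars[, c("mpg", "wt", "hp", "disp")])

# Hypothetical helper: R^2 of a one-predictor regression of `response` on `y`.
r_squared <- function(data, y, response = "mpg") {
  summary(lm(reformulate(y, response), data = data))$r.squared
}

predictors <- setdiff(names(dt), "mpg")

# One row per predictor, with its coefficient of determination.
result <- data.table(
  variable = predictors,
  R2 = unname(vapply(predictors, r_squared, numeric(1), data = dt))
)
print(result)
```

For a simple linear regression, R² equals the squared correlation between the predictor and the response, which gives a quick way to sanity-check the values.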

Data
cnames<-c("ID","County","State","Area","Pop","Young","Old","Phys","Beds","Crime","HighSchool","BA","Poverty","Unemploy","PerCapitaIncome","TotalIncome","Region")

df62 <- fread("http://users.stat.ufl.edu/~rrandles/sta4210/Rclassnotes/data/textdatasets/KutnerData/Appendix%20C%20Data%20Sets/APPENC02.txt", col.names = cnames)
df62[,c("ID", "County","State","Region"):=NULL]

This question, "r - How to use lapply in a data.table without a by clause", comes from Stack Overflow: https://stackoverflow.com/questions/58035981/
