r - How to get the true residual deviance and degrees of freedom in R for a glm model when a group of parameters is paste()'d as a vector

Reposted. Author: 行者123. Updated: 2023-12-04 11:22:04
I am writing a script (in Python, with the R parts run through pypeR), so I need a function in R that compares two models with an F-ratio test.

The models look like this:

Model 1: Response ~ Predictor A + Predictor B + Predictor C.... + Predictor n
Model 2: Response ~ Predictor 1

Together, predictors A + B + ... n make up Predictor 1, so there is no problem with nesting here (trust me).

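When I pass Predictor A + Predictor B + Predictor C.... + Predictor n to the function I wrote, I think it treats them as a single variable (because the degrees of freedom are the same as for Model 2). Maybe this is because I am using paste()? In any case, the actual number of predictors in Model 1 will change from run to run (which is why I need this as a function), so I am not sure how to accommodate that other than by using paste().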

Keep in mind that paste() may not actually be the problem here; I just wanted to let people know what I think the problem might be.

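Any suggestions for how to get the true residual deviance and degrees of freedom for Model 1? It can be a hack. For the degrees of freedom, for example, I would just subtract length(vector of predictors) - 1. I don't know what the analogous hack for the residual deviance would be.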

Here is the function and an example call:

make_and_compare_models <- function(fitness_trait_name, data_frame_name, vector_for_multiple_regression, predictor_for_single_regression, fam){
  fit1<-glm(formula=as.formula(paste(fitness_trait_name,"~", paste(vector_for_multiple_regression, sep="+"))), family=fam, data=data_frame_name)
  #print ('length of vector of predictors')
  additional.degrees.of.freedom.fit1<-length(vector_for_multiple_regression)-1 ##the paste above prevents R from recognizing all of the vectors as separate predictors. This -1 gives you the difference in parameter number between the two models.
  print ("summary fit 1")
  print(summary(fit1))
  dev1<-(fit1$deviance)
  print ('residual deviance of fit1')
  print (dev1)
  print(fit1$df.residual)

  ##this is how I'd correct for degrees of freedom
  #df1=fit1$df.residual-additional.degrees.of.freedom.fit1
  #fit1$df.residual=df1

  ##if the old way
  df1=fit1$df.residual
  print(fit1$df.residual)
  print ('df1')
  print (df1)

  fit2<- glm(data=data_frame_name, formula=as.formula(paste(fitness_trait_name,"~",predictor_for_single_regression)), family=fam)

  print("summary fit 2")
  print(summary(fit2))
  print ("deviance of fit2")
  dev2<-(fit2$deviance)
  print(dev2)
  df2=fit2$df.residual
  print ('df2')
  print (df2)
  F.ratio<-((dev2-dev1)/(df2-df1))/(dev1/df1)
  print('F.ratio')
  print(F.ratio)
  new.p<-1-pf(F.ratio,abs(df1-df2),max(df2,df1))
  print('new.p')
  print(new.p)
}

data <- structure(list(ID = c(1L, 2L, 4L, 7L, 9L, 10L, 12L, 13L, 14L,
15L, 16L, 17L, 18L, 20L, 21L, 22L, 23L, 24L, 25L, 27L, 28L, 29L,
31L, 34L, 37L, 38L, 39L, 40L, 41L, 43L, 44L, 45L, 46L, 47L, 48L,
49L, 52L, 55L, 56L, 59L, 60L, 61L, 62L, 63L, 65L, 66L, 67L, 68L,
69L, 71L), QnWeight_initial = c(158L, 165L, 137L, 150L, 153L,
137L, 158L, 163L, 159L, 151L, 145L, 144L, 157L, 144L, 133L, 148L,
151L, 151L, 147L, 158L, 178L, 164L, 134L, 151L, 148L, 142L, 127L,
179L, 162L, 150L, 151L, 153L, 163L, 155L, 163L, 170L, 149L, 165L,
128L, 134L, 145L, 147L, 148L, 160L, 131L, 155L, 169L, 143L, 123L,
151L), Survived_eclosion = c(0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L,
1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Days_wrkr_eclosion_minus20 = c(NA,
1L, NA, 3L, 0L, 2L, 0L, 1L, 0L, 0L, 0L, 1L, NA, 0L, 7L, 1L, 0L,
1L, 0L, 1L, 2L, 2L, NA, 2L, 3L, 2L, 2L, NA, 0L, 1L, NA, NA, 0L,
0L, 0L, 0L, 3L, 3L, 3L, 1L, 0L, 2L, NA, 1L, 0L, 1L, 1L, 3L, 1L,
2L), MLH = c(0.5, 0.666666667, 0.555555556, 0.25, 1, 0.5, 0.333333333,
0.7, 0.5, 0.7, 0.5, 0.666666667, 0.375, 0.4, 0.5, 0.333333333,
0.4, 0.375, 0.3, 0.5, 0.3, 0.2, 0.4, 0.875, 0.6, 0.4, 0.222222222,
0.222222222, 0.6, 0.6, 0.3, 0.4, 0.714285714, 0.4, 0.3, 0.6,
0.4, 0.7, 0.625, 0.555555556, 0.25, 0.5, 0.5, 0.6, 0.25, 0.428571429,
0.3, 0.25, 0.375, 0.555555556), Acon5 = c(0.35387674, 0.35387674,
0.35387674, 0.35387674, 0.35387674, 0.35387674, 0.35387674, 0,
0, 1, 0, 1, 0.35387674, 0, 0, 0.35387674, 1, 1, 0, 0, 0, 1, 0,
0.35387674, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0,
0, 0, 1, 0, 0, 0, 1, 0, 0.35387674), Baez = c(1, 1, 1, 0.467836257,
1, 1, 0, 0, 1, 1, 0, 0.467836257, 1, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0.467836257, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1,
1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1), C294 = c(0, 1, 0, 0, 1,
0.582542694, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0,
0, 1, 1, 0, 0, 0.582542694, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1), C316 = c(1, 1, 0, 0, 0.519685039,
0.519685039, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0.519685039, 0,
1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0.519685039, 1, 0, 1,
1, 0, 0.519685039, 1, 0.519685039, 1, 1, 1, 0.519685039, 0.519685039,
0, 0.519685039, 0.519685039, 0), i_120_PigTail = c(1, 1, 0, 1,
0.631236443, 0.631236443, 1, 1, 1, 1, 1, 0, 0.631236443, 1, 1,
1, 0, 0.631236443, 1, 1, 1, 0, 0, 1, 1, 1, 0.631236443, 0, 1,
1, 0, 1, 0.631236443, 1, 0, 1, 0, 0, 1, 0.631236443, 0.631236443,
0, 1, 0, 0.631236443, 0.631236443, 1, 0.631236443, 0.631236443,
1), i129 = c(0L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L), Jackstraw_PigTail = c(0L, 1L, 1L, 0L,
1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L,
0L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Neil_Young = c(0.529636711,
0, 1, 0, 0.529636711, 0.529636711, 1, 1, 0, 1, 1, 1, 0, 0, 1,
1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0,
1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1), Ramble = c(0, 0, 0,
0, 0.215163934, 0.215163934, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0,
0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0.215163934, 0,
0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0.215163934, 0, 0, 0, 0), Sol_18 = c(1,
0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0,
0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0.404669261,
1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1)), .Names = c("ID", "QnWeight_initial",
"Survived_eclosion", "Days_wrkr_eclosion_minus20", "MLH", "Acon5",
"Baez", "C294", "C316", "i_120_PigTail", "i129", "Jackstraw_PigTail",
"Neil_Young", "Ramble", "Sol_18"), class = "data.frame", row.names = c(NA,
-50L))


make_and_compare_models("QnWeight_initial", data, c("Acon5","Baez","C294","C316","i_120_PigTail","i129","Jackstraw_PigTail","Neil_Young","Ramble","Sol_18"), "MLH", "gaussian")
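To check the suspicion about paste(), the formula string can be inspected on its own, outside glm(). The sketch below uses made-up names (y, A, B, C) that are not part of the data above. It shows that `sep` only separates the arguments of a single paste() call, while `collapse` is what joins the elements of one vector. As far as I can tell, as.formula() then works from the first of the resulting strings only (newer R versions may warn about this), which would explain a Model 1 with a single predictor.

```r
# Hypothetical predictor names, used only for illustration.
preds <- c("A", "B", "C")

# `sep` separates the *arguments* of paste(); with one vector argument
# the result is still three separate strings.
with_sep <- paste(preds, sep = "+")
print(with_sep)

# Pasting a response in front is vectorised: three one-predictor strings.
rhs_strings <- paste("y", "~", with_sep)
print(rhs_strings)

# `collapse` joins the elements of the vector into a single string.
with_collapse <- paste(preds, collapse = "+")
full_string <- paste("y", "~", with_collapse)
print(full_string)

# all.vars() lists the variables that actually end up in each formula.
vars_first <- all.vars(as.formula(rhs_strings[1]))
vars_full  <- all.vars(as.formula(full_string))
print(vars_first)
print(vars_full)
```

If the collapsed string lists every predictor, the fitted model's df.residual should drop accordingly, with no manual correction to the degrees of freedom.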

Best answer

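Perhaps I am misunderstanding the question, but anova() will compare the models, and you can have it perform a test. I am not sure about your statement about nesting (and will leave it up to you to do something sensible here).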

comparemodels <- function(data, response, terms1, terms2, test, family = 'gaussian', ...) {
  f1 <- reformulate(terms1, response)
  f2 <- reformulate(terms2, response)
  m1 <- glm(f1, data = data, family = family)
  m2 <- glm(f2, data = data, family = family)
  compare <- anova(m1, m2, test = test)
  print(compare)
}

response <- 'QnWeight_initial'
t1 <- c("Acon5","Baez","C294","C316","i_120_PigTail","i129","Jackstraw_PigTail","Neil_Young","Ramble","Sol_18")
t2 <- 'MLH'
comparemodels(data, response, t1, t2, test = 'F')


Analysis of Deviance Table

Model 1: QnWeight_initial ~ Acon5 + Baez + C294 + C316 + i_120_PigTail +
    i129 + Jackstraw_PigTail + Neil_Young + Ramble + Sol_18
Model 2: QnWeight_initial ~ MLH
  Resid. Df Resid. Dev Df Deviance      F Pr(>F)
1        39     7197.1
2        48     7614.1 -9  -417.08 0.2511 0.9837
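The F and p-value in this table can be reproduced by hand from the residual deviances and degrees of freedom, which is a useful cross-check on the hand-rolled formula in the question. The sketch below uses only the rounded numbers printed above, and assumes the usual Gaussian-family convention that the dispersion is estimated from the larger model. Note that the denominator degrees of freedom are those of the larger model (39), whereas the question's max(df2, df1) would give 48.

```r
# Values copied from the Analysis of Deviance Table above (rounded as printed).
dev1 <- 7197.1; df1 <- 39   # Model 1: ten predictors
dev2 <- 7614.1; df2 <- 48   # Model 2: MLH only

# Change in deviance per extra parameter, divided by the dispersion
# estimate from the larger model (residual deviance / residual df).
F.ratio <- ((dev2 - dev1) / (df2 - df1)) / (dev1 / df1)

# Upper-tail probability; numerator df is the difference in residual df,
# denominator df is the residual df of the larger model.
p.value <- pf(F.ratio, df2 - df1, df1, lower.tail = FALSE)

print(round(F.ratio, 4))
print(round(p.value, 4))
```

Because the inputs are rounded, the result should agree with the table's F and Pr(>F) columns only to about three decimal places.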

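Regarding "r - How to get the true residual deviance and degrees of freedom in R for a glm model when a group of parameters is paste()'d as a vector", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14509495/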
