gpt4 book ai didi

r - R中的循环回归分析

转载 作者:行者123 更新时间:2023-12-04 08:35:23 25 4
gpt4 key购买 nike

我有一个鱼类丰度数据的数据集,我想对其进行回归分析。
但是,我想对数据的不同子集执行大量回归,而无需手动执行此操作,并将系数和 P 值保存在新数据框中。
数据结构如下(示例):

site year species abund
a 2011 a 3
a 2016 b 5
b 2011 a 4
b 2015 a 9
a 2018 b 1
c 2010 a 2
b 2016 c 3
c 2012 a 1
我总共有 883 行、21 个独特的地点、41 个独特的物种和 8 个不同的年份。
我想要每个物种-站点组合的回归模型。 (每个组合至少有 5 个观察值)模型如下所示:
lm(abund ~ year)  
但是每个地点的每个物种都有一个模型。因此,a 站点中的物种 a 的一个模型,a 站点中物种 b 的一个模型,b 站点中物种 a 的一个模型,等等。
堆栈上有几个关于此的主题,但似乎不适合我的需求。
我的想法是使用 for 循环,但我无法让它正常工作。一整天都在 tweeking,但无法让它发挥作用。
slopes <- numeric(nrow(df))

for (i in 1:nrow(df)) {
y <- as.numeric(df[i,4]) # row 4 is the abundancy data
x <- df([i, 1]) # row 1 is the year data
slopes[i] <- coef(lm(y ~ x))[2]
}
我的问题:我如何对所有独特的地点-物种组合进行线性回归模型并将系数和 P 值存储在新的数据框中?最好使用我失败尝试的改进版本。
提前致谢
50 个随机行样本的 dput:
df <- structure(list(year = c(2015, 2012, 2017, 2014, 2018, 2018, 2012, 
2018, 2013, 2013, 2012, 2018, 2012, 2018, 2016, 2013, 2018, 2019,
2012, 2019, 2017, 2014, 2013, 2014, 2018, 2016, 2013, 2019, 2019,
2018, 2019, 2014, 2012, 2018, 2017, 2016, 2017, 2015, 2017, 2019,
2012, 2016, 2019, 2019, 2018, 2014, 2012, 2015, 2012, 2012),
species = c("Aal", "Brasem", "Kolblei", "Dunlipharder", "Snoekbaars",
"Snoekbaars", "Paling", "Baars", "Tong", "Sprot", "Paling",
"Kolblei", "Baars", "Sprot", "Tong", "Baars", "Baars", "Zwartbekgrondel",
"Dikkopje", "Snoekbaars", "Blankvoorn", "Kolblei", "Kolblei",
"Baars", "Aal", "Kolblei", "Bot", "Snoekbaars", "Baars",
"Blankvoorn", "Zeebaars", "Snoekbaars", "Zeebaars", "Bot",
"Snoekbaars", "Bot", "Baars", "Baars", "Aal", "Snoekbaars",
"Baars", "Baars", "Bot", "Bot", "Bot", "Kleine koornaarvis",
"Snoekbaars", "Bot", "Blankvoorn", "Kleine koornaarvis"),
site = c("Amerikahaven (kop Aziëhaven)", "Het IJ (thv EyE)",
"Westhaven en ADM-haven (kop)", "Jan van Riebeeckhaven (thv Nuon)",
"Coenhaven", "Mercuriushaven", "Amerikahaven (kop Aziëhaven)",
"Mercuriushaven", "Amerikahaven kop Australiëhaven", "Het IJ (thv het Keerkringpark)",
"Amerikahaven kop Australiëhaven", "Coenhaven", "Jan van Riebeeckhaven (thv Nuon)",
"Het IJ (thv EyE)", "Westhaven en ADM-haven (kop)", "Het IJ (thv het Keerkringpark)",
"Jan van Riebeeckhaven (kop NZK)", "Het IJ (thv EyE)", "Westhaven en ADM-haven (kop)",
"Het IJ (thv EyE)", "Het IJ (thv EyE)", "Het IJ (thv het Keerkringpark)",
"Westhaven en ADM-haven (kop)", "Het IJ (kop Noordhollandsch kanaal)",
"Amerikahaven kop Australiëhaven", "Jan van Riebeeckhaven (thv Nuon)",
"Amerikahaven (kop Aziëhaven)", "Jan van Riebeeckhaven (kop NZK)",
"Westhaven en ADM-haven (kop)", "Jan van Riebeeckhaven (thv Nuon)",
"Amerikahaven kop Australiëhaven", "Het IJ (thv EyE)", "Amerikahaven (kop Aziëhaven)",
"Mercuriushaven", "Westhaven en ADM-haven (kop)", "Amerikahaven kop Australiëhaven",
"Minervahaven", "Westhaven en ADM-haven (kop)", "Westhaven en ADM-haven (kop)",
"Amerikahaven kop Australiëhaven", "Amerikahaven (kop Aziëhaven)",
"Amerikahaven kop Australiëhaven", "Jan van Riebeeckhaven (thv Nuon)",
"Westhaven en ADM-haven (kop)", "Petroleumhaven", "Westhaven en ADM-haven (kop)",
"Het IJ (thv EyE)", "Het IJ (kop Noordhollandsch kanaal)",
"Het IJ (thv EyE)", "Westhaven en ADM-haven (kop)"), abund = c(5,
25, 2, 15, 3, 4, 1, 176, 4, 1, 1, 4, 55, 1, 1, 37, 75, 11,
1, 121, 4, 2, 2, 412, 38, 1, 5, 2, 443, 2, 6, 12, 1, 10,
33, 14, 120, 377, 67, 29, 43, 524, 4, 31, 18, 5, 18, 1, 9,
31), n = c(6L, 4L, 4L, 7L, 3L, 3L, 3L, 3L, 7L, 4L, 5L, 3L,
8L, 4L, 8L, 7L, 8L, 6L, 4L, 7L, 7L, 4L, 4L, 7L, 8L, 6L, 5L,
8L, 8L, 5L, 6L, 7L, 3L, 3L, 8L, 8L, 3L, 8L, 8L, 8L, 6L, 8L,
7L, 8L, 3L, 4L, 7L, 3L, 7L, 4L)), row.names = c(NA, -50L), class = c("tbl_df",
"tbl", "data.frame"), na.action = structure(c(`47` = 47L, `52` = 52L,
`60` = 60L, `88` = 88L, `128` = 128L, `401` = 401L, `488` = 488L,
`593` = 593L, `633` = 633L), class = "omit"))

最佳答案

all_sites <- unique(df$site)
all_species <- unique(df$species)

slopes <- expand.grid(all_sites, all_species)
names(slopes) <- c('site','species')
slopes$coef <- NA_real_

for (site in all_sites) {
for (species in all_species){
this_subset <- df[(df$site==site & df$species==species),]
if (nrow(this_subset)<2) next;
y <- this_subset$abund
x <- this_subset$year
cat('\n',site,species,nrow(this_subset),sum(is.na(x)),sum(is.na(y)))
slopes[slopes$site==site & slopes$species==species, ]$coef <- coef(lm(y ~ x))[2]
}
}

关于r - R中的循环回归分析,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/64823367/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com