
r - Subtract rows based on multiple columns

Reposted. Author: 行者123. Updated: 2023-12-04 02:08:53

I have this data frame:

Data

df <- data.frame(id=c(rep("site1", 3), rep("site2", 8), rep("site3", 9), rep("site4", 15)),
major_rock = c("greywacke", "mudstone", "gravel", "greywacke", "gravel", "mudstone", "gravel", "mudstone", "mudstone",
"conglomerate", "gravel", "mudstone", "greywacke","conglomerate", "gravel", "gravel", "greywacke","gravel",
"greywacke", "gravel", "mudstone", "greywacke", "gravel", "gravel", "gravel", "conglomerate", "greywacke",
"coquina", "gravel", "gravel", "greywacke", "gravel", "mudstone","mudstone", "gravel"),
minor_rock = c("sandstone mudstone basalt chert limestone", "limestone", "sand silt clay", "sandstone mudstone basalt chert limestone",
"sand silt clay", "sandstone conglomerate coquina tephra", NA, "limestone", "mudstone sandstone coquina limestone",
"sandstone mudstone limestone", "sand loess silt", "sandstone conglomerate coquina tephra", "sandstone mudstone basalt chert limestone",
"sandstone mudstone limestone", "sand loess silt", "loess silt sand", "sandstone mudstone conglomerate chert limestone basalt",
"sand silt clay", "sandstone mudstone conglomerate", "loess sand silt", "sandstone conglomerate coquina tephra", "sandstone mudstone basalt chert limestone",
"sand loess silt", "sand silt clay", "loess silt sand", "sandstone mudstone limestone", "sandstone mudstone conglomerate chert limestone basalt",
"limestone", "loess sand silt", NA, "sandstone mudstone conglomerate", "sandstone siltstone mudstone limestone silt lignite", "limestone",
"mudstone sandstone coquina limestone", "mudstone tephra loess"),
area_ha = c(1066.68, 7.59, 3.41, 4434.76, 393.16, 361.69, 306.75, 124.93, 95.84, 9.3, 8.45, 4565.89, 2600.44, 2198.52,
2131.71, 2050.09, 1640.47, 657.09, 296.73, 178.12, 10403.53, 8389.2, 8304.08, 3853.36, 2476.36, 2451.25,
1640.47, 1023.02, 532.94, 385.68, 296.73, 132.45, 124.93, 109.12, 4.87))

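It contains 4 sites: 2 are independent (site1 and site3; they do not include any upstream sites) and 2 are dependent (site2 and site4; they include upstream site(s)).

I want to create a new data.frame, call it df_indep, in which all sites are independent. That means subtracting any upstream sites from the dependent sites, as follows: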

site1 and site3 will remain the same as they are independent

site2 (independent) = site2 - site1

site4 (independent) = site4 - (site2 + site3)

The df below shows only the combinations of major_rock and minor_rock whose area_percent is greater than 5%, before subtracting the upstream sites (site2 and site3):

library(dplyr)
head(df %>%
       group_by(id) %>%
       mutate(area_percent = area_ha / sum(area_ha) * 100) %>%
       filter(area_percent > 5), 15)


#   id     major_rock   minor_rock                                              area_ha area_percent
#   <fctr> <fctr>       <fctr>                                                    <dbl> <dbl>
#1  site1  greywacke    sandstone mudstone basalt chert limestone               1066.68 98.979289
#2  site2  greywacke    sandstone mudstone basalt chert limestone               4434.76 77.329604
#3  site2  gravel       sand silt clay                                           393.16 6.855592
#4  site2  mudstone     sandstone conglomerate coquina tephra                    361.69 6.306845
#5  site2  gravel       NA                                                       306.75 5.348848
#6  site3  mudstone     sandstone conglomerate coquina tephra                   4565.89 27.978879
#7  site3  greywacke    sandstone mudstone basalt chert limestone               2600.44 15.934986
#8  site3  conglomerate sandstone mudstone limestone                            2198.52 13.472099
#9  site3  gravel       sand loess silt                                         2131.71 13.062701
#10 site3  gravel       loess silt sand                                         2050.09 12.562550
#11 site3  greywacke    sandstone mudstone conglomerate chert limestone basalt  1640.47 10.052479
#12 site4  mudstone     sandstone conglomerate coquina tephra                  10403.53 25.925869
#13 site4  greywacke    sandstone mudstone basalt chert limestone               8389.20 20.906106
#14 site4  gravel       sand loess silt                                         8304.08 20.693984
#15 site4  gravel       sand silt clay                                          3853.36 9.602674

This is the final result I want after subtracting the upstream sites:

#   id     major_rock   minor_rock                                              area_ha area_percent
#1  site1  greywacke    sandstone mudstone basalt chert limestone               1066.68 98.979289
#2  site2  greywacke    sandstone mudstone basalt chert limestone               3368.08 72.319849
#3  site2  gravel       sand silt clay                                           389.75 8.368762
#4  site2  mudstone     sandstone conglomerate coquina tephra                    361.69 7.766254
#5  site2  gravel       NA                                                       306.75 6.586576
#6  site3  mudstone     sandstone conglomerate coquina tephra                   4565.89 27.978879
#7  site3  greywacke    sandstone mudstone basalt chert limestone               2600.44 15.934986
#8  site3  conglomerate sandstone mudstone limestone                            2198.52 13.472099
#9  site3  gravel       sand loess silt                                         2131.71 13.062701
#10 site3  gravel       loess silt sand                                         2050.09 12.562550
#11 site3  greywacke    sandstone mudstone conglomerate chert limestone basalt  1640.47 10.052479
#12 site4  mudstone     sandstone conglomerate coquina tephra                   5475.95 30.297305
#13 site4  greywacke    sandstone mudstone basalt chert limestone               1354.00 7.491403
#14 site4  gravel       sand loess silt                                         6163.92 34.103701
#15 site4  gravel       sand silt clay                                          2803.11 15.509031
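As a sanity check on the table above, one site4 row can be reproduced by hand from the rule site4 - (site2 + site3). The snippet below is only an illustrative sketch: the three areas are copied from the data for the gravel / "sand silt clay" combination, and the variable names are made up for this example.

```r
# Areas (ha) for major_rock = "gravel", minor_rock = "sand silt clay",
# copied from the data frame in the question
site2_cum <- 393.16   # site2 (cumulative, still includes site1)
site3     <- 657.09   # site3 (independent)
site4_cum <- 3853.36  # site4 (cumulative)

# site4 (independent) = site4 - (site2 + site3)
site4_ind <- site4_cum - (site2_cum + site3)
round(site4_ind, 2)
```

The rounded value agrees with the area_ha shown for row 15 of the desired output (2803.11).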

I would appreciate any suggestions on how to do this in R.

UPDATE

Here is a map showing all 4 sites:

[image: map of the 4 sites]

The figure below shows the result after subtracting site2 and site3:

[image]

The figure below shows site2 both as cumulative and as independent:

[image]

Regarding @rbierman's question about how the site dependencies are coded, please see below.

#      id dependent dep_site1 dep_site2 dep_site3
#1 site1 no no no no
#2 site1 no no no no
#3 site1 no no no no
#4 site2 yes yes no no
#5 site2 yes yes no no
#6 site2 yes yes no no
#7 site2 yes yes no no
#8 site2 yes yes no no
#9 site2 yes yes no no
#10 site2 yes yes no no
#11 site2 yes yes no no
#12 site3 no no no no
#13 site3 no no no no
#14 site3 no no no no
#15 site3 no no no no
#16 site3 no no no no
#17 site3 no no no no
#18 site3 no no no no
#19 site3 no no no no
#20 site3 no no no no
#21 site4 yes yes yes yes
#22 site4 yes yes yes yes
#23 site4 yes yes yes yes
#24 site4 yes yes yes yes
#25 site4 yes yes yes yes
#26 site4 yes yes yes yes
#27 site4 yes yes yes yes
#28 site4 yes yes yes yes
#29 site4 yes yes yes yes
#30 site4 yes yes yes yes
#31 site4 yes yes yes yes
#32 site4 yes yes yes yes
#33 site4 yes yes yes yes
#34 site4 yes yes yes yes
#35 site4 yes yes yes yes

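Best answer

This isn't too bad, just a little renaming and joining.

First, we need the dependencies in a nice two-column format. You can use reshape2::melt or tidyr::gather on the wide dependencies you posted to make them long: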

deps = data.frame(
id = c("site2", "site4", "site4"),
dependency = c("site1", "site2", "site3"),
stringsAsFactors = FALSE
)
# id dependency
# 1 site2 site1
# 2 site4 site2
# 3 site4 site3
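
The two-column table above is typed by hand. As a rough sketch of how it could be derived from the wide yes/no coding in the question, the code below uses tidyr::pivot_longer (the newer counterpart of gather). The names deps_wide, deps_long, upstream and indirect are made up for this example, and it assumes the wide coding lists every upstream site, direct or indirect. Because cumulative site2 already contains site1, the site4/site1 pair is dropped so that site1 is not subtracted twice.

```r
library(dplyr)
library(tidyr)

# One row per site, mirroring the wide yes/no coding posted in the question
deps_wide <- data.frame(
  id        = c("site1", "site2", "site3", "site4"),
  dep_site1 = c("no", "yes", "no", "yes"),
  dep_site2 = c("no", "no",  "no", "yes"),
  dep_site3 = c("no", "no",  "no", "yes"),
  stringsAsFactors = FALSE
)

# Wide -> long: one row per (site, upstream site) pair flagged "yes"
deps_long <- deps_wide %>%
  pivot_longer(starts_with("dep_"),
               names_to = "dependency",
               names_prefix = "dep_",
               values_to = "flag") %>%
  filter(flag == "yes") %>%
  select(id, dependency)

# Upstream sites of each dependency (same pairs, columns relabelled)
upstream <- data.frame(
  dependency = deps_long$id,
  indirect   = deps_long$dependency,
  stringsAsFactors = FALSE
)

# Pairs where the upstream site is already contained in another dependency
indirect <- inner_join(deps_long, upstream, by = "dependency") %>%
  transmute(id, dependency = indirect)

# Keep only the direct dependencies
deps <- anti_join(deps_long, indirect, by = c("id", "dependency"))
deps
```

With this coding, deps ends up with the same three pairs as the hand-written table: site2/site1, site4/site2 and site4/site3.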

To do the joins with dplyr, we also want character rather than factor columns, in case the levels are not exactly the same.

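library(dplyr)
# Convert the join keys from factor to character.
# across() replaces the older mutate_at(.cols = ...) / funs() idiom,
# which is deprecated or superseded in recent dplyr versions.
df = mutate(df, across(c(id, major_rock, minor_rock), as.character))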

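First we create a "dependency with measure" data frame that has explicit names for the dependency's id and area. (Edit:) Then we aggregate it to the id level, summing the dependent areas: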

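dep_w_measure = df %>%
  # relabel each row as a potential upstream ("dependency") record
  select(dependency = id, major_rock, minor_rock, dep_area = area_ha) %>%
  # attach the downstream site id that depends on it
  inner_join(deps, by = "dependency") %>%
  # total upstream area per downstream site and rock combination
  group_by(id, major_rock, minor_rock) %>%
  summarize(dep_area = sum(dep_area), .groups = "drop")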
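
To see what the join and aggregation produce when a site has two upstream sites, here is a minimal sketch on a toy subset. The toy, toy_deps and toy_dep_area objects are made up for this illustration. The areas are the question's values for the mudstone / "sandstone conglomerate coquina tephra" combination.

```r
library(dplyr)

# Toy subset of the question's data: one rock combination at three sites
toy <- data.frame(
  id         = c("site2", "site3", "site4"),
  major_rock = "mudstone",
  minor_rock = "sandstone conglomerate coquina tephra",
  area_ha    = c(361.69, 4565.89, 10403.53),
  stringsAsFactors = FALSE
)

# site4 has two direct upstream sites
toy_deps <- data.frame(
  id         = c("site4", "site4"),
  dependency = c("site2", "site3"),
  stringsAsFactors = FALSE
)

toy_dep_area <- toy %>%
  select(dependency = id, major_rock, minor_rock, dep_area = area_ha) %>%
  inner_join(toy_deps, by = "dependency") %>%   # one row per (site, upstream site)
  group_by(id, major_rock, minor_rock) %>%
  summarize(dep_area = sum(dep_area), .groups = "drop")

toy_dep_area
```

The single resulting row belongs to site4, with dep_area equal to the site2 and site3 areas added together (361.69 + 4565.89). Subtracting that from 10403.53 gives the 5475.95 shown for row 12 of the desired output.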

Then we join this back to the original data and subtract the dependent area, if there is one:

result = left_join(df, dep_w_measure, by = c("major_rock", "minor_rock", "id")) %>%
  mutate(area_ind = area_ha - coalesce(dep_area, 0))
head(result)
#   id    major_rock minor_rock                                area_ha dep_area area_ind
# 1 site1 greywacke  sandstone mudstone basalt chert limestone 1066.68       NA  1066.68
# 2 site1 mudstone   limestone                                    7.59       NA     7.59
# 3 site1 gravel     sand silt clay                               3.41       NA     3.41
# 4 site2 greywacke  sandstone mudstone basalt chert limestone 4434.76  1066.68  3368.08
# 5 site2 gravel     sand silt clay                             393.16     3.41   389.75
# 6 site2 mudstone   sandstone conglomerate coquina tephra      361.69       NA   361.69

I left the dep_area and area_ha columns in to "show my work"; you can clean that up as you like. The independent area column, area_ind, matches area_ha in your desired output.
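
One possible clean-up, sketched here under a few assumptions: the question's df_indep is taken to mean area_ind renamed back to area_ha, with area_percent recomputed per site from the independent areas and the same 5% filter as in the question's code. To keep the block self-contained it uses only the site1 and site2 rows (result_sub is a made-up name, and minor_rock is omitted for brevity). The area_ind values are area_ha minus the matching site1 area, for example 124.93 - 7.59 = 117.34 for mudstone / limestone.

```r
library(dplyr)

# site1 and site2 rows of `result`, reduced to the columns needed here
result_sub <- data.frame(
  id         = c(rep("site1", 3), rep("site2", 8)),
  major_rock = c("greywacke", "mudstone", "gravel",
                 "greywacke", "gravel", "mudstone", "gravel",
                 "mudstone", "mudstone", "conglomerate", "gravel"),
  area_ind   = c(1066.68, 7.59, 3.41,
                 3368.08, 389.75, 361.69, 306.75,
                 117.34, 95.84, 9.30, 8.45),
  stringsAsFactors = FALSE
)

df_indep_sub <- result_sub %>%
  group_by(id) %>%
  # percentages are now relative to each site's independent total
  mutate(area_percent = area_ind / sum(area_ind) * 100) %>%
  ungroup() %>%
  filter(area_percent > 5) %>%
  select(id, major_rock, area_ha = area_ind, area_percent)

df_indep_sub
```

On the full result the same pipeline would apply unchanged, with minor_rock added to the select() call.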

Regarding "r - Subtract rows based on multiple columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42522361/
