gpt4 book ai didi

r - 如果存在于同一组中,则折叠行

转载 作者:行者123 更新时间:2023-12-03 16:58:02 26 4
gpt4 key购买 nike

大家好,我有一个数据框,例如:

  Group       family
1 A Canidae
2 B Canidae
3 A Felidae
4 B Canidae
5 C Elephantidae
6 C Galinacae
7 D Galinacae
8 D Siuridae
9 E Apidae
我想崩溃 Group团体所在 family存在(例如:
犬科 存在于 一个 , 和
所以我折叠并添加 family2每个组的所有唯一值
Group family2
A,B Canidae,Felidae
然后我继续,我看到 象科和加林科都在 C和他 加利纳科 也在 D所以我崩溃了:
Group family2
A,B Canidae,Felidae
C,D Elephantidae,Galinacae,Siuridae
最后我们应该得到:
Group family2
A,B Canidae,Felidae
C,D Elephantidae,Galinacae,Siuridae
E Apidae
有人有想法吗?
这里的数据是为了诸如此类的事情吗?非常感谢您的帮助和时间。
如果有帮助,这里是数据:
structure(list(Group = structure(c(1L, 2L, 1L, 2L, 3L, 3L, 4L, 
4L, 5L), .Label = c("A", "B", "C", "D", "E"), class = "factor"),
family = structure(c(2L, 2L, 4L, 2L, 3L, 5L, 5L, 6L, 1L), .Label = c("Apidae",
"Canidae", "Elephantidae", "Felidae", "Galinacae", "Siuridae"
), class = "factor")), class = "data.frame", row.names = c(NA,
-9L))

最佳答案

这是我的一些查找功能的解决方案

# A lookup function that look for intersect between group 
# if there are at least one intersect - those group will be combined
look_up_group <- function(one_group, lookup_list) {
matched_list <- map(lookup_list, function(x) {
tryCatch(
{
intersect(x, one_group)
}, error = function(e) {
stop(paste0("Error in lookup function: one_group=", one_group, "; x=", x))
})
})

index <- which(unlist(map(matched_list, function(x) { length(x) > 0 })))
sort(unique(unlist(lookup_list[index])))
}


df %>%
# First remove all duplicated rows - exactly the same for both Group, Family
filter(!duplicated(.)) %>%
# arrange in alphabetical order
arrange(Group, family) %>%
# create a Group_2 which is combination of all Group for each family
group_by(family) %>%
mutate(Group_2 = list(Group)) %>%
ungroup() %>%
# Create Group_3 which is the full combined Group for all intersect Group
mutate(Group_3 = map(.[["Group_2"]], function(x) { look_up_group(one_group = x, lookup_list = .[["Group_2"]]) })) %>%
# Combine all Group_3 into a Group_final
mutate(Group_final = unlist(map(Group_3, function(x) { paste (x, collapse = ",")} ))) %>%
# Finally put them all together.
select(Group_final, family) %>%
group_by(Group_final) %>%
summarize(family = paste(family, collapse = ","), .groups = "drop")
这是最终的输出
  Group_final family                                   
* <chr> <chr>
1 A,B Canidae,Felidae,Canidae
2 C,D Elephantidae,Galinacae,Galinacae,Siuridae
3 E Apidae
为了更容易理解这里是每个步骤的细节
第一步
# remove duplicate & create variable Group_2
tmp <- df %>%
filter(!duplicated(.)) %>%
arrange(Group, family) %>%
group_by(family) %>%
mutate(Group_2 = list(Group)) %>%
ungroup()
我们有这个数据——
  Group family       Group_2  
<fct> <fct> <list>
1 A Canidae <fct [2]>
2 A Felidae <fct [1]>
3 B Canidae <fct [2]>
4 C Elephantidae <fct [1]>
5 C Galinacae <fct [2]>
6 D Galinacae <fct [2]>
7 D Siuridae <fct [1]>
8 E Apidae <fct [1]>
Group_2 看起来像这样
> tmp$Group_2
[[1]]
[1] A B
Levels: A B C D E

[[2]]
[1] A
Levels: A B C D E

[[3]]
[1] A B
Levels: A B C D E

[[4]]
[1] C
Levels: A B C D E

[[5]]
[1] C D
Levels: A B C D E

[[6]]
[1] C D
Levels: A B C D E

[[7]]
[1] D
Levels: A B C D E

[[8]]
[1] E
Levels: A B C D E
然后下一步创建 Group_3 并将它们组合成 Group_final
# Create Group_3
tmp <- tmp %>%
mutate(Group_3 = map(.[["Group_2"]],
function(x) { look_up_group(one_group = x, lookup_list = .[["Group_2"]]) })) %>%
mutate(Group_final = unlist(map(Group_3, function(x) { paste (x, collapse = ",")} )))
这是新的 tmp
# A tibble: 8 x 5
Group family Group_2 Group_3 Group_final
<fct> <fct> <list> <list> <chr>
1 A Canidae <fct [2]> <fct [2]> A,B
2 A Felidae <fct [1]> <fct [2]> A,B
3 B Canidae <fct [2]> <fct [2]> A,B
4 C Elephantidae <fct [1]> <fct [2]> C,D
5 C Galinacae <fct [2]> <fct [2]> C,D
6 D Galinacae <fct [2]> <fct [2]> C,D
7 D Siuridae <fct [1]> <fct [2]> C,D
8 E Apidae <fct [1]> <fct [1]> E
然后最后一步结合将Group_final中的家庭放在一起
tmp %>%
select(Group_final, family) %>%
group_by(Group_final) %>%
summarize(family = paste(family, collapse = ","), .groups = "drop")
最终结果
# A tibble: 3 x 2
Group_final family
* <chr> <chr>
1 A,B Canidae,Felidae,Canidae
2 C,D Elephantidae,Galinacae,Galinacae,Siuridae
3 E Apidae
[更新:添加了用于调试的 tryCatch]

关于r - 如果存在于同一组中,则折叠行,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/66232681/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com