
r - How to arrange a data frame by category, then mutate a new covariate that lists all names belonging to each category

Reposted. Author: 行者123. Updated: 2023-12-04 07:20:51

This question already has answers here:

Collapse / concatenate / aggregate a column to a single comma separated string within each group (4 answers)

Concatenate unique strings after groupby in R (1 answer)

Closed last month.

I am working with gene ontology data and have this data frame:

> head(BT_Ctrl_go_term, 13)
# A tibble: 13 x 4
   go_term        n gene     go_name
   <chr>      <int> <chr>    <chr>
 1 GO:0001525    15 NRP1     angiogenesis
 2 GO:0001525    15 ANG      angiogenesis
 3 GO:0001525    15 THY1     angiogenesis
 4 GO:0001525    15 ATP5F1B  angiogenesis
 5 GO:0001525    15 ECM1     angiogenesis
 6 GO:0001666     6 ANG      response to hypoxia
 7 GO:0001666     6 CAT      response to hypoxia
 8 GO:0001666     6 HSP90B1  response to hypoxia
 9 GO:0002250     8 IGKV1-27 adaptive immune response
10 GO:0002250     8 IGHV3-21 adaptive immune response
11 GO:0002250     8 TNFRSF21 adaptive immune response
12 GO:0002250     8 IGLV2-11 adaptive immune response
13 GO:0002250     8 IGHV4-34 adaptive immune response
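I need to arrange the data so that each go_name appears exactly once, on a single row. Then I need a new covariate, genes, that lists every BT_Ctrl_go_term$gene belonging to the corresponding BT_Ctrl_go_term$go_name. The gene names must be separated by a comma and a space (", ").
Expected output: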
     go_term  n                  go_name                                            genes
1 GO:0001525 15             angiogenesis                   NRP1, ANG, THY1, ATP5F1B, ECM1
2 GO:0001666  6      response to hypoxia                                ANG, CAT, HSP90B1
3 GO:0002250  8 adaptive immune response IGKV1-27, IGHV3-21, TNFRSF21, IGLV2-11, IGHV4-34
A dplyr solution is preferred.
Data
BT_Ctrl_go_term <- structure(list(go_term = c("GO:0001525", "GO:0001525", "GO:0001525", 
"GO:0001525", "GO:0001525", "GO:0001666", "GO:0001666", "GO:0001666",
"GO:0002250", "GO:0002250", "GO:0002250", "GO:0002250", "GO:0002250"
), n = c(15L, 15L, 15L, 15L, 15L, 6L, 6L, 6L, 8L, 8L, 8L, 8L,
8L), gene = c("NRP1", "ANG", "THY1", "ATP5F1B", "ECM1", "ANG",
"CAT", "HSP90B1", "IGKV1-27", "IGHV3-21", "TNFRSF21", "IGLV2-11",
"IGHV4-34"), go_name = c("angiogenesis", "angiogenesis", "angiogenesis",
"angiogenesis", "angiogenesis", "response to hypoxia", "response to hypoxia",
"response to hypoxia", "adaptive immune response", "adaptive immune response",
"adaptive immune response", "adaptive immune response", "adaptive immune response"
)), row.names = c(NA, -13L), class = c("tbl_df", "tbl", "data.frame"
))

Best answer

We can group the rows and paste the gene names of each group into a single string:

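library(dplyr)

BT_Ctrl_go_term %>%
  # one group per GO term (in this data, n and go_name are constant within a term)
  group_by(go_term, n, go_name) %>%
  # collapse the distinct genes of each group into one ", "-separated string
  summarise(gene = toString(unique(gene)), .groups = 'drop')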
Output:
# A tibble: 3 x 4
  go_term        n go_name                  gene
  <chr>      <int> <chr>                    <chr>
1 GO:0001525    15 angiogenesis             NRP1, ANG, THY1, ATP5F1B, ECM1
2 GO:0001666     6 response to hypoxia      ANG, CAT, HSP90B1
3 GO:0002250     8 adaptive immune response IGKV1-27, IGHV3-21, TNFRSF21, IGLV2-11, IGHV4-34
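
For readers without dplyr, the same group-and-collapse idea can be sketched in base R with aggregate(). This is not part of the original answer. The data frame go_df below is a small hypothetical subset of the question's data, with one duplicated gene added on purpose to show what unique() does, and the result column is renamed to genes to match the name used in the question's expected output. Row order of the aggregate() result may differ from the dplyr output.

```r
# Hypothetical subset of the question's data, rebuilt so the block runs on its own.
# The last row repeats ANG for GO:0001666 to illustrate the effect of unique().
go_df <- data.frame(
  go_term = c("GO:0001525", "GO:0001525",
              "GO:0001666", "GO:0001666", "GO:0001666", "GO:0001666"),
  n       = c(15L, 15L, 6L, 6L, 6L, 6L),
  gene    = c("NRP1", "ANG", "ANG", "CAT", "HSP90B1", "ANG"),
  go_name = c("angiogenesis", "angiogenesis",
              "response to hypoxia", "response to hypoxia",
              "response to hypoxia", "response to hypoxia"),
  stringsAsFactors = FALSE
)

# toString(x) is shorthand for paste(x, collapse = ", ")
same_result <- identical(toString(c("a", "b")),
                         paste(c("a", "b"), collapse = ", "))

# Group by the three identifying columns and collapse the distinct genes
# of each group into a single comma-separated string.
collapsed <- aggregate(
  x   = go_df["gene"],
  by  = go_df[c("go_term", "n", "go_name")],
  FUN = function(x) toString(unique(x))
)

# Rename the collapsed column to match the name asked for in the question.
names(collapsed)[names(collapsed) == "gene"] <- "genes"

print(collapsed)
```

Dropping unique() would keep the repeated ANG in the GO:0001666 row, so whether to keep it depends on whether duplicate gene entries are meaningful in the data at hand.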

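Regarding "r - How to arrange a data frame by category, then mutate a new covariate that lists all names belonging to each category", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/68513068/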
