
r - Compare two data sets and find common names

Reposted. Author: 行者123. Updated: 2023-12-05 01:00:15

How can I compare two data sets and find the gene names they have in common, for rows where both CNA and chr are the same?

dt1

CNA   chr  Genes
gain  5    Sall3,Kcng2,Atp9b,Nfatc1,Ctdp1
loss  5    RNU6-866P,TRIM5,TRIM34,TRIM22,TRIM5
gain  2    PDIA5,SEMA5B

dt2
CNA   chr  Genes
gain  5    Sall3,Nfatc1,SNORA5,SNORA5
gain  5    RNU6-866P,OR8J1,OR8K3,OR8K3
gain  2    PDIA5,DCC

Expected output

df3
CNA   chr  Genes
gain  5    Sall3,Nfatc1
gain  2    PDIA5

I'm sure this is a trivial problem, but I would appreciate some suggestions.

Best answer

Here is one approach:

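library(data.table)

# Collapse df2 to one row per (CNA, chr). The collapsed column is named
# Genes.1 explicitly, because recent data.table versions would otherwise
# call the clashing column i.Genes after the join.
df2 = setDT(df2)[, list(Genes.1 = paste0(Genes, collapse = ',')), by = list(CNA, chr)]
# Keyed join: keep the (CNA, chr) pairs of df2 and attach the df1 genes
res = setkey(setDT(df1), CNA, chr)[df2]

#     CNA chr Genes                          Genes.1
# 1: gain   5 Sall3,Kcng2,Atp9b,Nfatc1,Ctdp1 Sall3,Nfatc1,SNORA5,SNORA5,RNU6-866P,OR8J1,OR8K3,OR8K3
# 2: gain   2 PDIA5,SEMA5B                   PDIA5,DCC

# Split both gene strings on ',' and keep the names present in both
res[, paste0(intersect(strsplit(Genes, ',')[[1]], strsplit(Genes.1, ',')[[1]]), collapse = ','),
    by = list(CNA, chr)]

#     CNA chr V1
# 1: gain   5 Sall3,Nfatc1
# 2: gain   2 PDIA5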
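
For comparison, the same idea can be sketched in base R without data.table. This is not part of the original answer: the helper name common_genes is hypothetical, and the sketch assumes both inputs are plain data frames with character columns CNA and Genes and a chr column, as in the sample data. Unlike the join above, it drops (CNA, chr) groups that share no gene, and it returns rows sorted by key rather than in the order of df2.

```r
# Sample data, same values as in the question
df1 <- data.frame(CNA = c("gain", "gain", "loss"), chr = c(2L, 5L, 5L),
                  Genes = c("PDIA5,SEMA5B", "Sall3,Kcng2,Atp9b,Nfatc1,Ctdp1",
                            "RNU6-866P,TRIM5,TRIM34,TRIM22,TRIM5"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(CNA = c("gain", "gain", "gain"), chr = c(5L, 5L, 2L),
                  Genes = c("Sall3,Nfatc1,SNORA5,SNORA5",
                            "RNU6-866P,OR8J1,OR8K3,OR8K3", "PDIA5,DCC"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: common gene names per (CNA, chr) group
common_genes <- function(a, b) {
  # Collapse to one comma-separated gene string per (CNA, chr)
  agg <- function(d) aggregate(Genes ~ CNA + chr, data = d,
                               FUN = paste, collapse = ",")
  # Inner join on the key columns; gene columns become Genes.a / Genes.b
  m <- merge(agg(a), agg(b), by = c("CNA", "chr"), suffixes = c(".a", ".b"))
  # Row by row: split both strings and keep the names found in both
  m$Genes <- vapply(seq_len(nrow(m)), function(i) {
    paste(intersect(strsplit(m$Genes.a[i], ",")[[1]],
                    strsplit(m$Genes.b[i], ",")[[1]]), collapse = ",")
  }, character(1))
  # Drop groups without any shared gene
  m[m$Genes != "", c("CNA", "chr", "Genes")]
}

res_base <- common_genes(df1, df2)
print(res_base)
```

Because intersect() returns unique values, repeated names such as SNORA5 or TRIM5 in the input cannot appear twice in the result.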

Data:
df1 = structure(list(CNA = c("gain", "gain", "loss"), chr = c(2L, 5L, 
5L), Genes = c("PDIA5,SEMA5B", "Sall3,Kcng2,Atp9b,Nfatc1,Ctdp1",
"RNU6-866P,TRIM5,TRIM34,TRIM22,TRIM5")), .Names = c("CNA", "chr",
"Genes"), class = "data.frame", row.names = c(NA, -3L))

df2 = structure(list(CNA = c("gain", "gain", "gain"), chr = c(5L, 5L,
2L), Genes = c("Sall3,Nfatc1,SNORA5,SNORA5", "RNU6-866P,OR8J1,OR8K3,OR8K3",
"PDIA5,DCC")), .Names = c("CNA", "chr", "Genes"), class = "data.frame", row.names = c(NA,
-3L))

Regarding "r - Compare two data sets and find common names", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29609209/
