
r - How to match 2 data frame columns and extract column values and column names?

Reposted. Author: 行者123. Updated: 2023-12-04 12:08:50

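I have a matrix called mymat. I have a vector called geno <- c("01","N1","11","1N","10"). I have another table called key.table. I want to match the key column in mymat with the key column in key.table. If, in any matching row, a column holds one of the values in geno, I want to extract that column's name from mymat together with the matching geno element and paste it into a new column matched.extract in key.table, in the corresponding row for each key, to get the result below.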

  mymat <- structure(c("chr5:12111", "chr5:12111", "chr5:12113", "chr5:12114", 
"chr5:12118", "0N", "0N", "1N", "0N", "0N", "00", "00", "00",
"11", "10", "00", "00", "1N", "0N", "00"), .Dim = c(5L, 4L), .Dimnames = list(
c("34", "35", "36", "37", "38"), c("key", "AMLM12001KP",
"AMAS-11.3-Diagnostic", "AMLM12014N-R")))

key.table<- structure(c("chr5:12111", "chr5:12111", "chr5:12113", "chr5:12114",
"chr5:12118", "chr5:12122", "chr5:12123", "chr5:12123", "chr5:12125",
"chr5:12127", "chr5:12129", "9920068", "9920069", "9920070",
"9920071", "9920072", "9920073", "9920074", "9920075", "9920076",
"9920077", "9920078"), .Dim = c(11L, 2L), .Dimnames = list(c("34",
"35", "36", "37", "38", "39", "40", "41", "42", "43", "44"),
c("key", "variantId")))

Result
  key          variantId    matched.extract
34 "chr5:12111" "9920068" NA
35 "chr5:12111" "9920069" NA
36 "chr5:12113" "9920070" AMLM12001KP (1N),AMLM12014N-R (1N)
37 "chr5:12114" "9920071" AMAS-11.3-Diagnostic (11)
38 "chr5:12118" "9920072" AMAS-11.3-Diagnostic (10)
39 "chr5:12122" "9920073" NA
40 "chr5:12123" "9920074" NA
41 "chr5:12123" "9920075" NA
42 "chr5:12125" "9920076" NA
43 "chr5:12127" "9920077" NA
44 "chr5:12129" "9920078" NA

Best answer

Using data.table, I would approach it like this:

library(data.table)
# convert the 'key.table' matrix to a data.table
kt <- as.data.table(key.table, keep.rownames=TRUE)
# convert the 'mymat' matrix to a data.table and melt into long format
# filter on the needed geno-types
# paste the needed values together into the requested format
mm <- melt(as.data.table(mymat, keep.rownames=TRUE),
           id=c("rn","key"))[value %in% c("1N","11","10"), val := paste0(variable," (",value,")")
                             ][, .(val = paste(val[!is.na(val)], collapse = ",")), by = .(rn,key)
                               ][val=="", val:=NA]
# join the 'mm' and 'kt' data.tables
kt[mm, matched := val, on=c("rn","key")]

This gives:

> kt
    rn        key variantId                            matched
 1: 34 chr5:12111   9920068                                 NA
 2: 35 chr5:12111   9920069                                 NA
 3: 36 chr5:12113   9920070 AMLM12001KP (1N),AMLM12014N-R (1N)
 4: 37 chr5:12114   9920071          AMAS-11.3-Diagnostic (11)
 5: 38 chr5:12118   9920072          AMAS-11.3-Diagnostic (10)
 6: 39 chr5:12122   9920073                                 NA
 7: 40 chr5:12123   9920074                                 NA
 8: 41 chr5:12123   9920075                                 NA
 9: 42 chr5:12125   9920076                                 NA
10: 43 chr5:12127   9920077                                 NA
11: 44 chr5:12129   9920078                                 NA
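
The chain above hard-codes three genotype codes, while the question defines a longer geno vector. As a minimal sketch, assuming the full geno vector should drive the filter, the same result can be reached by filtering the long table first and collapsing afterwards, which makes the empty-string-to-NA step unnecessary. The names long and hits are illustrative and do not come from the original answer; the two matrices are rebuilt here with matrix() only to keep the example self-contained.

```r
library(data.table)

# Same data as in the question, rebuilt with matrix() for brevity
mymat <- matrix(
  c("chr5:12111", "chr5:12111", "chr5:12113", "chr5:12114", "chr5:12118",
    "0N", "0N", "1N", "0N", "0N",
    "00", "00", "00", "11", "10",
    "00", "00", "1N", "0N", "00"),
  nrow = 5,
  dimnames = list(as.character(34:38),
                  c("key", "AMLM12001KP", "AMAS-11.3-Diagnostic", "AMLM12014N-R")))
key.table <- matrix(
  c("chr5:12111", "chr5:12111", "chr5:12113", "chr5:12114", "chr5:12118",
    "chr5:12122", "chr5:12123", "chr5:12123", "chr5:12125", "chr5:12127",
    "chr5:12129", as.character(9920068:9920078)),
  nrow = 11,
  dimnames = list(as.character(34:44), c("key", "variantId")))
geno <- c("01", "N1", "11", "1N", "10")

kt <- as.data.table(key.table, keep.rownames = TRUE)

# Wide -> long: one row per (row name, key, sample column)
long <- melt(as.data.table(mymat, keep.rownames = TRUE),
             id.vars = c("rn", "key"), variable.factor = FALSE)

# Keep only genotypes of interest, then build "column (value)" labels per row
hits <- long[value %in% geno,
             .(matched = paste0(variable, " (", value, ")", collapse = ",")),
             by = .(rn, key)]

# Update join: rows of kt without a hit are left as NA
kt[hits, matched := i.matched, on = c("rn", "key")]
print(kt)
```

For the data in the question this should fill the same three rows (36, 37 and 38) as the original chain, because none of the extra codes in geno ("01", "N1") occur in mymat.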


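Explanation:
  • kt <- as.data.table(key.table, keep.rownames=TRUE) converts the key.table matrix to a data.table (an enhanced data.frame) and stores the row names in the rn column.
  • mm <- melt(as.data.table(mymat, keep.rownames=TRUE), id=c("rn","key")) converts the mymat matrix to a data.table, stores the row names in the rn column, and melts the data.table into long format.
  • The part [value %in% c("1N","11","10"), val := paste0(variable," (",value,")")] pastes the variable values (which are the column names of mymat) together with the value values, but only where value is 1N, 11 or 10.
  • The part [, .(val = paste(val[!is.na(val)], collapse = ",")), by = .(rn,key)] pastes the non-NA val values together, grouped by the rn and key variables.
  • The part [val=="", val:=NA] converts the empty val entries to NA.
  • Finally, kt[mm, matched := val, on=c("rn","key")] updates the kt data.table by reference with the val values from the mm data.table where the rn and key variables match.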


  • Warning: when using data.table it is better not to use key as a variable name, because key is also a parameter in data.table. See ?key for more info.
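
Following the explanation and the warning above, here is a minimal sketch that runs the chain one step at a time and renames the key column before joining. The replacement name locus, the column names sample_id and genotype, and the intermediate objects long and agg are all hypothetical choices for illustration and are not part of the original answer. The input matrices hold the same data as in the question.

```r
library(data.table)

# Same data as in the question, rebuilt with matrix() for brevity
mymat <- matrix(
  c("chr5:12111", "chr5:12111", "chr5:12113", "chr5:12114", "chr5:12118",
    "0N", "0N", "1N", "0N", "0N",
    "00", "00", "00", "11", "10",
    "00", "00", "1N", "0N", "00"),
  nrow = 5,
  dimnames = list(as.character(34:38),
                  c("key", "AMLM12001KP", "AMAS-11.3-Diagnostic", "AMLM12014N-R")))
key.table <- matrix(
  c("chr5:12111", "chr5:12111", "chr5:12113", "chr5:12114", "chr5:12118",
    "chr5:12122", "chr5:12123", "chr5:12123", "chr5:12125", "chr5:12127",
    "chr5:12129", as.character(9920068:9920078)),
  nrow = 11,
  dimnames = list(as.character(34:44), c("key", "variantId")))

kt <- as.data.table(key.table, keep.rownames = TRUE)
mm <- as.data.table(mymat, keep.rownames = TRUE)

# Rename 'key' (hypothetical new name: 'locus') in both tables, by reference
setnames(kt, "key", "locus")
setnames(mm, "key", "locus")

# Step 1: melt to long format, one row per (rn, locus, sample column)
long <- melt(mm, id.vars = c("rn", "locus"),
             variable.name = "sample_id", value.name = "genotype",
             variable.factor = FALSE)

# Step 2: label only the rows whose genotype is of interest; others stay NA
long[genotype %in% c("1N", "11", "10"),
     val := paste0(sample_id, " (", genotype, ")")]

# Step 3: collapse the non-NA labels per original row
agg <- long[, .(val = paste(val[!is.na(val)], collapse = ",")),
            by = .(rn, locus)]

# Step 4: rows without any label collapse to "", so turn those into NA
agg[val == "", val := NA_character_]

# Step 5: update kt by reference where rn and locus match
kt[agg, matched := i.val, on = c("rn", "locus")]
print(kt)
```

Splitting the chain this way makes it easy to inspect long and agg after each step; the trade-off is a few extra named objects compared with the single chained expression.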

    Regarding "r - How to match 2 data frame columns and extract column values and column names?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/34411422/
