
awk - Extract rows from file 1 based on column ranges in file 2

Reposted. Author: 行者123. Updated: 2023-12-02 19:21:59

I have two files.

File 1:

chrom   chromStart      chromEnd   clinSign    geneId  rcvAcc  hgvsCod hgvsProt
chr1 930187 930188 VUS SNV SAMD11 RCV001050361 NM_152486.3:c.106G>A NP_689699.2:p.Ala36Thr
chr1 939398 939446 Benign deletion SAMD11 RCV000948524 NM_152486.2:c.683_706+24delCCCCTCATCACCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATC

File 2:

CHROM   POS  REF  ALT  FILTER     GT     BD    
chr1 1609489 AAC A PASS 0/1 FP
chr1 930188 T G LowGQ 0/1 FP
chr1 939400 TGC T PASS 0/1 FP

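I am trying to look up each CHROM:POS pair from file 2 (its first and second columns) in the ranges defined by the first three columns of file 1 (chrom:chromStart:chromEnd), and to get this output: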

chrom   chromStart      chromEnd     clinSign         geneId  rcvAcc  hgvsCod hgvsProt  CHROM        POS  REF  ALT  FILTER      GT     BD     
chr1 930187 930188 VUS SNV SAMD11 RCV001050361 NM_152486.3:c.106G>A NP_689699.2:p.Ala36Thr chr1 930188 T G LowGQ 0/1 FP
chr1 939398 939446 Benign deletion SAMD11 RCV000948524 NM_152486.2:c.683_706+24delCCCCTCATCACCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATC chr1 939400 TGC T PASS 0/1 FP

Here is what I have tried so far:

awk '
NR==FNR{ start[$1] = $2; end[$1] = $3; next }
(FNR==1) || ( ($1 in start) && ($2 >= start[$1]) && ($2 <= end[$1]) )
' file1 file2> test.txt
awk 'FNR == NR { low[$1] = $2; high[$1] = $3; next }
$2 > low[$1] && $2 < high[$1] { print }' file1 file2 > test.txt

But both produce an empty file as output.

Any suggestions would be appreciated. Thanks.

Best answer

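$ cat tst.awk
# Remember the header line of file1.
NR == 1 { hdr = $0 }

# First file: store every range under (chrom, running index for that chrom).
NR == FNR {
    c = ++cnt[$1]
    begs[$1,c] = $2
    ends[$1,c] = $3
    vals[$1,c] = $0
    next
}

# Second file, header line: print both headers side by side.
FNR == 1 {
    print hdr, $0
    next
}

# Second file, data lines: print the first stored range that contains POS.
{
    for (c=1; c<=cnt[$1]; c++) {
        beg = begs[$1,c]
        end = ends[$1,c]
        if ( (beg <= $2) && ($2 <= end) ) {
            print vals[$1,c], $0
            next
        }
    }
}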


$ awk -f tst.awk file1 file2
chrom chromStart chromEnd clinSign geneId rcvAcc hgvsCod hgvsProt CHROM POS REF ALT FILTER GT BD
chr1 930187 930188 VUS SNV SAMD11 RCV001050361 NM_152486.3:c.106G>A NP_689699.2:p.Ala36Thr chr1 930188 T G LowGQ 0/1 FP
chr1 939398 939446 Benign deletion SAMD11 RCV000948524 NM_152486.2:c.683_706+24delCCCCTCATCACCTCCCCAGCCACGGTGAGGACCCACCCTGGCATGATC chr1 939400 TGC T PASS 0/1 FP
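
One detail of the script above that is easy to miss: the ranges are stored under (chrom, counter) rather than under chrom alone. The sketch below is only an illustration with made-up file and variable names (`ranges.txt`, `by_chrom`, `by_chrom_idx`, `start`, `stop` are assumptions, not from the original post). It shows that an array keyed by chromosome alone keeps just the last range for that chromosome, while the counter-based layout keeps all of them. It does not claim to explain why the asker's real data produced an empty file.

```shell
# Hypothetical demo data: the first three columns of the two file1 rows above.
tmp=$(mktemp -d)
printf '%s\n' 'chr1 930187 930188' 'chr1 939398 939446' > "$tmp/ranges.txt"

# Keyed by chromosome only: the second row overwrites the first.
by_chrom=$(awk '
    { start[$1] = $2; stop[$1] = $3 }
    END { print start["chr1"], stop["chr1"] }
' "$tmp/ranges.txt")

# Keyed by (chromosome, counter): every row is kept.
by_chrom_idx=$(awk '
    { c = ++cnt[$1]; begs[$1,c] = $2; ends[$1,c] = $3 }
    END {
        for (c = 1; c <= cnt["chr1"]; c++) {
            print begs["chr1",c], ends["chr1",c]
        }
    }
' "$tmp/ranges.txt")

echo "chrom only:"
echo "$by_chrom"
echo "chrom + counter:"
echo "$by_chrom_idx"
rm -rf "$tmp"
```

With the chromosome-only layout, a position such as 930188 can no longer be matched, because the range 930187-930188 has been replaced by the later row.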

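If a position in file 2 can fall inside more than one range for a given chrom, just remove the final next statement. It is only there for efficiency when there is always exactly one match.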
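
As a rough sketch of that multi-match variant, here is the same logic with the final `next` dropped, run on small made-up inputs (the files `f1` and `f2`, the `label` column, and the variable `multi` are assumptions for illustration only). Two overlapping ranges both contain position 150, so that position is reported once per range.

```shell
tmp=$(mktemp -d)
# Hypothetical inputs: two overlapping ranges on chr1, and two positions to look up.
printf '%s\n' 'chrom chromStart chromEnd label' 'chr1 100 200 A' 'chr1 140 160 B' > "$tmp/f1"
printf '%s\n' 'CHROM POS' 'chr1 150' 'chr1 500' > "$tmp/f2"

multi=$(awk '
NR == 1 { hdr = $0 }
NR == FNR {
    c = ++cnt[$1]
    begs[$1,c] = $2
    ends[$1,c] = $3
    vals[$1,c] = $0
    next
}
FNR == 1 { print hdr, $0; next }
{
    for (c = 1; c <= cnt[$1]; c++) {
        if ( (begs[$1,c] <= $2) && ($2 <= ends[$1,c]) ) {
            print vals[$1,c], $0    # no "next" here: keep scanning the remaining ranges
        }
    }
}' "$tmp/f1" "$tmp/f2")

printf '%s\n' "$multi"
rm -rf "$tmp"
```

Position 500 lies outside both ranges and is not printed. The cost of dropping `next` is that every stored range for the chromosome is checked for every line of the second file.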

Regarding "awk - Extract rows from file 1 based on column ranges in file 2", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/62918499/
