unix - Compare columns in two files and change a string in another column if they match

Reposted. Author: 行者123. Updated: 2023-12-04 12:31:06

I have two files:

file1 
non-coding X FlyBase gene 20025099 20025170 . + . gene_id "FBgn0052826"; gene_symbol "tRNA:Pro-CGG-1-1";
non-coding X FlyBase gene 19910168 19910521 . - . gene_id "FBgn0052821"; gene_symbol "CR32821";
non-coding X FlyBase gene 476857 479309 . - . gene_id "FBgn0029523"; gene_symbol "CR18275";
non-coding X FlyBase gene 15576355 15576964 . + . gene_id "FBgn0262163"; gene_symbol "betaNACtes5";
non-coding X FlyBase gene 19910168 19910521 . - . gene_id "FBgn0052821"; gene_symbol "CR32821";

file2
betaNACtes5
CR18275
28SrRNA-Psi:CR45859
CR32821

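What I want: if any line of file2 matches column 13 of file1 (only a partial match, because column 13 is wrapped in quotes, e.g. "CR32821";), change the string in column 4 to "pseudogene". Otherwise the line should be left unchanged.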

Desired output

non-coding X FlyBase gene 20025099 20025170 . + . gene_id "FBgn0052826"; gene_symbol "tRNA:Pro-CGG-1-1";
non-coding X FlyBase pseudogene 19910168 19910521 . - . gene_id "FBgn0052821"; gene_symbol "CR32821";
non-coding X FlyBase gene 476857 479309 . - . gene_id "FBgn0029523"; gene_symbol "CR18275";
non-coding X FlyBase pseudogene 15576355 15576964 . + . gene_id "FBgn0262163"; gene_symbol "betaNACtes5";
non-coding X FlyBase pseudogene 19910168 19910521 . - . gene_id "FBgn0052821"; gene_symbol "CR32821";

So far I can get the matching lines, but I can't do the rest:

grep -Ff file2 file1
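Note that grep -Ff only selects lines, and it matches each pattern anywhere in the line rather than in column 13 specifically. A minimal sketch of the remaining step, assuming the fields are separated by single spaces as in the sample (assigning to $4 makes awk rebuild the line with single spaces). The function name mark_pseudogenes is hypothetical:

```shell
# Hypothetical helper: mark column 4 as "pseudogene" when column 13, with its
# quotes and semicolon stripped, equals a whole line of the list file.
# Assumes single-space-separated input: assigning $4 rebuilds the line with OFS.
mark_pseudogenes() {
    # $1 = list of gene symbols (like file2), $2 = annotation file (like file1)
    awk '
        NR == FNR { want[$0]; next }      # first file: remember each symbol
        {
            sym = $13
            gsub(/[";]/, "", sym)         # "CR32821"; -> CR32821
            if (sym in want) $4 = "pseudogene"
            print
        }
    ' "$1" "$2"
}
```

For example, `mark_pseudogenes file2 file1 > file1.new`. Because the lookup is an exact comparison on one field, a symbol such as CR32821 will not accidentally match CR328210, which the substring match of grep -F would.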

Best answer

With your shown samples, please try the following awk code. It also preserves the original whitespace of file1.

awk '
BEGIN{ s1="\"" }
FNR==NR{
  arr[s1 $0 s1";"]
  next
}
{
  match($0,/^([^[:space:]]+[[:space:]]+){3}/)
  firstPart=substr($0,RSTART,RLENGTH)
  $0=substr($0,RSTART+RLENGTH)
  match($0,/^[^ ]+/)
  restPart=substr($0,RSTART+RLENGTH)
  print firstPart ($NF in arr?"pseudogene":substr($0,RSTART,RLENGTH)) restPart
}
' file2 file1
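Preserving whitespace needs this substr-based rebuilding because assigning to any field makes awk rebuild the whole record with OFS, which is a single space by default. A minimal illustration (the function name collapse_demo is hypothetical):

```shell
# Hypothetical demo: runs of spaces/tabs between fields are lost once a
# field is assigned, because awk rejoins all fields with OFS (" ").
collapse_demo() {
    printf 'a  b\tc\n' | awk '{ $2 = "X"; print }'   # prints: a X c
}
```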

Explanation: a detailed explanation of the code above.

awk '                                          ##Start the awk program.
BEGIN{ s1="\"" }                               ##In the BEGIN section, set s1 to a double quote (").
FNR==NR{                                       ##FNR==NR is true only while the first file (file2) is being read.
  arr[s1 $0 s1";"]                             ##Index arr by the line wrapped as "<line>"; which is how the symbol appears in the last field of file1.
  next                                         ##Skip all remaining statements for this line.
}
{
  match($0,/^([^[:space:]]+[[:space:]]+){3}/)  ##Match the first 3 fields together with the whitespace after each one.
  firstPart=substr($0,RSTART,RLENGTH)          ##Save the matched text in firstPart for later.
  $0=substr($0,RSTART+RLENGTH)                 ##Replace the current line with everything after the first 3 fields; $NF is still the last field.
  match($0,/^[^ ]+/)                           ##Match from the start up to the first space, i.e. the 4th field.
  restPart=substr($0,RSTART+RLENGTH)           ##Save everything after the 4th field in restPart.
  print firstPart ($NF in arr?"pseudogene":substr($0,RSTART,RLENGTH)) restPart  ##Print firstPart, then "pseudogene" if the last field is in arr (otherwise the original 4th field), then restPart.
}
' file2 file1                                  ##Input files: file2 first, then file1.
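The first regex in the answer relies on an interval expression ({3}) and the [[:space:]] class. If your awk does not support interval expressions (some older mawk releases are reported not to, as far as I know), here is a minimal sketch of the same substr-based idea that peels off the first three fields in a loop instead. It assumes fields are separated by spaces or tabs and that lines do not start with whitespace; the function name mark_pseudogenes_keep_ws is hypothetical:

```shell
# Hypothetical helper: same approach as the answer (rebuild the line from its
# original pieces so spacing is untouched), without the {3} interval.
mark_pseudogenes_keep_ws() {
    # $1 = list of gene symbols (like file2), $2 = annotation file (like file1)
    awk '
        NR == FNR { want["\"" $0 "\";"]; next }   # key looks like "CR32821";
        {
            head = ""; rest = $0
            for (i = 1; i <= 3; i++) {            # copy fields 1-3 plus their spacing
                match(rest, /^[^ \t]+[ \t]+/)
                head = head substr(rest, 1, RLENGTH)
                rest = substr(rest, RLENGTH + 1)
            }
            match(rest, /^[^ \t]+/)               # field 4 is now at the start
            f4 = substr(rest, 1, RLENGTH)
            tail = substr(rest, RLENGTH + 1)      # everything after field 4
            print head (($NF in want) ? "pseudogene" : f4) tail
        }
    ' "$1" "$2"
}
```

Usage is the same as the answer: `mark_pseudogenes_keep_ws file2 file1`. Tabs and repeated spaces between fields come through unchanged, because the output is concatenated from slices of the original line rather than from rebuilt fields.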

Regarding "unix - Compare columns in two files and change a string in another column if they match", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/69047166/
