bash - Grep all characters in each line before a match

Reposted. Author: 行者123. Updated: 2023-11-29 09:27:04

I have a file with tens of thousands of tab-separated lines like these:

cluster11586    TRINITY_DN135758_c4_g1_i1   5'-adenylylsulfate reductase-like 4 9.10921
cluster41208 TRINITY_DN130890_c2_g1_i1 Anthranilate phosphoribosyltransferase, chloroplastic 18.5398
cluster26862 TRINITY_DN132510_c1_g1_i2 ATP synthase subunit alpha, mitochondrial 4.82626
cluster13001 TRINITY_DN130890_c4_g1_i3 Phosphopantetheine adenylyltransferase 2.58108

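I'd like to use grep/awk/sed to produce a file that contains the text after the first two columns and before the final decimal number, with the tabs removed and the spaces replaced by underscores: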

5'-adenylylsulfate_reductase-like_4
Anthranilate_phosphoribosyltransferase,_chloroplastic
ATP_synthase_subunit_alpha,_mitochondrial
Phosphopantetheine_adenylyltransferase

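I'd like to extract everything before the final decimal number, which I can match with [0-9]+\.[0-9]+$, then pass the result to something like awk '{$1=$2=""; print $0}' to remove the first two columns (and, I hope, the tab that follows them), and then send it to sed -e 's/ /_/g'. But how can I extract the text before the last decimal number on each line without also getting the number itself or the whitespace in front of it? awk seems to leave the tabs behind after removing the first two columns. Can I do all of this without writing an intermediate file?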
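The three stages listed in the question can run as a single pipe, so no intermediate file is needed. This is a sketch of that plan, not the accepted answer. It assumes the last column always contains a decimal point (as in the sample) and that descriptions never contain doubled spaces, since awk collapses those when it rebuilds the line. The function name extract_desc is hypothetical.

```shell
#!/bin/sh
# extract_desc is a hypothetical helper name: it reads the tab-separated data
# on stdin and writes one underscore-joined description per line to stdout.
extract_desc() {
  # 1. Drop the trailing decimal number together with the whitespace before it.
  sed -E 's/[[:space:]]+[0-9]+\.[0-9]+$//' |
    # 2. Blank the two ID columns. awk rebuilds the line with single spaces,
    #    which also turns the remaining tabs into spaces; then strip the
    #    leading spaces left by the two emptied fields.
    awk '{ $1 = $2 = ""; sub(/^ +/, ""); print }' |
    # 3. Replace every remaining space with an underscore.
    sed 's/ /_/g'
}
```

Usage: extract_desc < file > descriptions.txt. Because sed removes the number first, awk only has to blank the two ID columns.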

Best answer

Understanding this will give you a better idea of how awk uses fields and field separators to split and reassemble records:

$ awk '{$1=$2=$NF=""; $0=$0; OFS="_"; $1=$1; OFS=FS} 1' file
5'-adenylylsulfate_reductase-like_4
Anthranilate_phosphoribosyltransferase,_chloroplastic
ATP_synthase_subunit_alpha,_mitochondrial
Phosphopantetheine_adenylyltransferase
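Here is the same one-liner spread over several lines with one comment per statement. There is one small change, made as a precaution: the record is printed explicitly before OFS is restored, instead of by the trailing 1, so the output does not depend on exactly when a given awk rebuilds $0. The wrapper name describe_fields is hypothetical.

```shell
#!/bin/sh
# describe_fields is a hypothetical wrapper name: it reads the data on stdin
# and writes one underscore-joined description per line to stdout.
describe_fields() {
  awk '
  {
    $1 = $2 = $NF = ""  # empty the two ID columns and the trailing number
    $0 = $0             # re-split the rebuilt line, so $1 is the first word
    OFS = "_"           # use underscores as the output separator ...
    $1 = $1             # ... and force $0 to be rebuilt with it
    print               # print now, while OFS is still "_"
    OFS = FS            # restore a single space for the next record
  }'
}
```

Usage: describe_fields < file > descriptions.txt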

Steps:

$ awk '{$1=$2=$NF=""; print "<" $0 ":" $1 ">"}' file
<  5'-adenylylsulfate reductase-like 4 :>
<  Anthranilate phosphoribosyltransferase, chloroplastic :>
<  ATP synthase subunit alpha, mitochondrial :>
<  Phosphopantetheine adenylyltransferase :>
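To see why $1 is still empty at this step, the same assignment can be run on a made-up four-field line (a b c d) while also printing NF. $0 is rebuilt with OFS between all four slots, but the record is not re-split, so NF stays 4 and $1 keeps the empty string it was given. The function name step1 is hypothetical.

```shell
#!/bin/sh
# step1 is a hypothetical name; the input line "a b c d" is made up.
step1() {
  printf 'a b c d\n' |
    awk '{ $1 = $2 = $NF = ""; print NF; print "<" $0 ">"; print "<" $1 ">" }'
}

step1
# 4
# <  c >
# <>
```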

$ awk '{$1=$2=$NF=""; $0=$0; print "<" $0 ":" $1 ">"}' file
<  5'-adenylylsulfate reductase-like 4 :5'-adenylylsulfate>
<  Anthranilate phosphoribosyltransferase, chloroplastic :Anthranilate>
<  ATP synthase subunit alpha, mitochondrial :ATP>
<  Phosphopantetheine adenylyltransferase :Phosphopantetheine>
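Running the same made-up line through the $0=$0 step shows the re-split. NF drops to 1 and $1 becomes c, while $0 keeps its leading and trailing spaces, because assigning a string to $0 splits it into fields but does not rebuild it. The function name step2 is hypothetical.

```shell
#!/bin/sh
# step2 is a hypothetical name; the input line "a b c d" is made up.
step2() {
  printf 'a b c d\n' |
    awk '{ $1 = $2 = $NF = ""; $0 = $0; print NF; print "<" $0 ">"; print "<" $1 ">" }'
}

step2
# 1
# <  c >
# <c>
```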

$ awk '{$1=$2=$NF=""; $0=$0; $1=$1; print "<" $0 ":" $1 ">"}' file
<5'-adenylylsulfate reductase-like 4:5'-adenylylsulfate>
<Anthranilate phosphoribosyltransferase, chloroplastic:Anthranilate>
<ATP synthase subunit alpha, mitochondrial:ATP>
<Phosphopantetheine adenylyltransferase:Phosphopantetheine>
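The $1=$1 step is the usual awk idiom for normalizing whitespace. Assigning to any field makes awk rebuild $0 from the fields joined by OFS, which drops leading and trailing blanks and turns every run of spaces or tabs into a single space. This is also why the tabs mentioned in the question disappear. The made-up input below mixes tabs and spaces; step3 is a hypothetical name.

```shell
#!/bin/sh
# step3 is a hypothetical name; the input (tab, c, tab, space, d, space) is made up.
step3() {
  printf '\tc\t d \n' |
    awk '{ $1 = $1; print NF; print "<" $0 ">" }'
}

step3
# 2
# <c d>
```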

$ awk '{$1=$2=$NF=""; $0=$0; OFS="_"; $1=$1; OFS=FS; print "<" $0 ":" $1 ">"}' file
<5'-adenylylsulfate_reductase-like_4:5'-adenylylsulfate>
<Anthranilate_phosphoribosyltransferase,_chloroplastic:Anthranilate>
<ATP_synthase_subunit_alpha,_mitochondrial:ATP>
<Phosphopantetheine_adenylyltransferase:Phosphopantetheine>
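The final OFS=FS matters once there is more than one record. OFS keeps its value from one record to the next, so if it is left as "_", the next record's $1=$2=$NF="" rebuilds the line with underscores, and the re-split then finds a single field. The sample records and function names below are made up for illustration. with_reset prints before restoring OFS, rather than using the answer's trailing 1, so the result does not depend on exactly when the awk in use rebuilds $0.

```shell
#!/bin/sh
# Two made-up records with the same layout as the question: id, id, words, number.
sample() { printf 'c1 t1 foo bar baz 1.5\nc2 t2 qux quux 2.5\n'; }

# Without restoring OFS: record 2 is rebuilt with "_" before it is re-split,
# so it comes out as one field with stray underscores.
no_reset() { sample | awk '{$1=$2=$NF=""; $0=$0; OFS="_"; $1=$1; print}'; }

# Restoring OFS after printing keeps every record correct.
with_reset() { sample | awk '{$1=$2=$NF=""; $0=$0; OFS="_"; $1=$1; print; OFS=FS}'; }

no_reset    # prints foo_bar_baz, then __qux_quux_
with_reset  # prints foo_bar_baz, then qux_quux
```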

A similar question, "bash - Grep all characters in each line before a match", can be found on Stack Overflow: https://stackoverflow.com/questions/53584984/
