
awk - Find an exact string in a column

Reposted · Author: 行者123 · Updated: 2023-12-05 09:27:01

I want to find specific strings, and combinations of strings, within one column. Can you help me?

Input:

benign,likely_pathogenic
benign,likely_pathogenic
benign,conflicting_interpretations_of_pathogenicity
benign,conflicting_interpretations_of_pathogenicity
benign,conflicting_interpretations_of_pathogenicity
risk_factor,uncertain_significance,likely_pathogenic,uncertain_significance,_other,benign
risk_factor,uncertain_significance,likely_pathogenic,uncertain_significance,_other,benign
risk_factor,benign,likely_benign,drug_response,not_provided,uncertain_significance,pathogenic,uncertain_significance,_other,conflicting_interpretations_of_pathogenicity
pathogenic,not_provided,benign,likely_pathogenic,likely_benign,risk_factor
likely_benign,conflicting_interpretations_of_pathogenicity
benign,likely_benign,conflicting_interpretations_of_pathogenicity
benign,likely_pathogenic
uncertain_significance,likely_benign,conflicting_interpretations_of_pathogenicity
benign,likely_pathogenic
conflicting_interpretations_of_pathogenicity,_other,benign,pathogenic,likely_benign,conflicting_interpretations_of_pathogenicity
conflicting_interpretations_of_pathogenicity,_other,benign,pathogenic,likely_benign,conflicting_interpretations_of_pathogenicity
risk_factor,benign,likely_benign,drug_response,not_provided,uncertain_significance,pathogenic,uncertain_significance,_other,conflicting_interpretations_of_pathogenicity
pathogenic,likely_pathogenic
uncertain_significance,conflicting_interpretations_of_pathogenicity,likely_benign
benign,conflicting_interpretations_of_pathogenicity
benign,conflicting_interpretations_of_pathogenicity
benign,conflicting_interpretations_of_pathogenicity
pathogenic

Output:

benign,likely_pathogenic
benign,likely_pathogenic
risk_factor,uncertain_significance,likely_pathogenic,uncertain_significance,_other,benign
risk_factor,uncertain_significance,likely_pathogenic,uncertain_significance,_other,benign
risk_factor,benign,likely_benign,drug_response,not_provided,uncertain_significance,pathogenic,uncertain_significance,_other,conflicting_interpretations_of_pathogenicity
pathogenic,not_provided,benign,likely_pathogenic,likely_benign,risk_factor
benign,likely_pathogenic
benign,likely_pathogenic
conflicting_interpretations_of_pathogenicity,_other,benign,pathogenic,likely_benign,conflicting_interpretations_of_pathogenicity
conflicting_interpretations_of_pathogenicity,_other,benign,pathogenic,likely_benign,conflicting_interpretations_of_pathogenicity
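risk_factor,benign,likely_benign,drug_response,not_provided,uncertain_significance,pathogenic,uncertain_significance,_other,conflicting_interpretations_of_pathogenicity
pathogenic,likely_pathogenic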
pathogenic

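I want to pull out every row whose column contains pathogenic or likely_pathogenic. The problem is that the string pathogenic also appears inside conflicting_interpretations_of_pathogenicity. I tried: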

awk -F'\t' -v OFS="\t" '{if($14=="pathogenic") print FILENAME,$0; else if($14=="likely_pathogenic") print FILENAME,$0}' 

but that only matches when the entire column is exactly that string.

If I try:

awk -F'\t' -v OFS="\t" '{if($14~"pathogenic") print FILENAME,$0}'

I get every row that contains pathogenic, likely_pathogenic or conflicting_interpretations_of_pathogenicity. A single row can hold conflicting... together with pathogenic or likely_pathogenic.
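To make the problem concrete, here is a small sketch comparing a plain substring match (what `$14~"pathogenic"` does) with a whole-value match anchored by `^` and `$`, applied to one value at a time. The helper name `compare_match` is hypothetical; each output line is the value, then 1/0 for the substring match, then 1/0 for the anchored match.

```shell
#!/bin/sh
# compare_match VALUE (hypothetical helper): print VALUE, then 1/0 for a plain
# substring match on "pathogenic", then 1/0 for an anchored whole-value match.
compare_match() {
    printf '%s\n' "$1" | awk '{
        any_hit   = ($0 ~ /pathogenic/)               # true for any value merely containing "pathogenic"
        token_hit = ($0 ~ /^(likely_)?pathogenic$/)   # true only for exactly pathogenic or likely_pathogenic
        print $0, any_hit, token_hit
    }'
}

for v in pathogenic likely_pathogenic conflicting_interpretations_of_pathogenicity benign; do
    compare_match "$v"
done
```

Only conflicting_interpretations_of_pathogenicity differs between the two columns: the substring test accepts it, the anchored test does not. The anchored test alone is not enough for a comma-separated list, though, which is why the list has to be split (or the commas matched) first.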

Best answer

Perhaps something like this:

awk '{
    split($0, a, /,/)                          # split the needed field (here the whole line) on commas
    for (i in a)                               # check each part
        if (a[i] ~ /^(likely_)?pathogenic$/) { # if the part is exactly pathogenic or likely_pathogenic
            print                              # output the line
            break                              # no need for more matches
        }
}' file

Some of the output:

benign,likely_pathogenic
benign,likely_pathogenic
risk_factor,uncertain_significance,likely_pathogenic,uncertain_significance,_other,benign
...

Obviously you'll need to add FS etc. and split $14 instead of $0, since your sample code works on column 14 ($14).
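A minimal sketch of that adjustment, assuming (as the question's own code implies) that the data is tab-separated and the comma-separated list sits in column 14, and keeping the question's `print FILENAME,$0` output format. The function name `filter_pathogenic` and the placeholder columns in the demo file are hypothetical.

```shell
#!/bin/sh
# filter_pathogenic FILE... (hypothetical helper): print FILENAME<TAB>line for every
# row whose 14th tab-separated column contains pathogenic or likely_pathogenic as a
# whole comma-separated item. Assumption: the list lives in column 14.
filter_pathogenic() {
    awk -F'\t' -v OFS='\t' '{
        n = split($14, a, ",")                       # split only column 14 on commas
        for (i = 1; i <= n; i++)
            if (a[i] ~ /^(likely_)?pathogenic$/) {   # whole-item match
                print FILENAME, $0
                break                                # print each row at most once
            }
    }' "$@"
}

# Demo: two rows with 13 placeholder columns, then the list in column 14.
tmp=$(mktemp)
printf 'c1\tc2\tc3\tc4\tc5\tc6\tc7\tc8\tc9\tc10\tc11\tc12\tc13\tbenign,likely_pathogenic\n' > "$tmp"
printf 'c1\tc2\tc3\tc4\tc5\tc6\tc7\tc8\tc9\tc10\tc11\tc12\tc13\tbenign,conflicting_interpretations_of_pathogenicity\n' >> "$tmp"
filter_pathogenic "$tmp"    # only the first row is printed, prefixed by the temp file name
rm -f "$tmp"
```

Setting FS to a tab (rather than relying on the default whitespace splitting) keeps $14 pointing at the right column even if earlier columns contain spaces.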

Edit:

I think this also works for the posted sample data:

$ awk '/(^|,)(likely_)?pathogenic(,|$)/' file
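The regex above works because each token must be bounded by a comma or the start/end of the line. The same boundary idea can be expressed without a regular expression (an alternative technique, not part of the original answer): pad the list with a comma at each end, then search for `,pathogenic,` or `,likely_pathogenic,` with awk's built-in `index()`, which does a plain substring search. The function name `filter_by_index` is hypothetical.

```shell
#!/bin/sh
# filter_by_index [FILE...] (hypothetical helper): print each line that contains
# pathogenic or likely_pathogenic as a whole comma-separated item, using index()
# (plain substring search, no regex) on the line padded with commas.
filter_by_index() {
    awk '{
        s = "," $0 ","                        # pad so every item appears as ",item,"
        if (index(s, ",pathogenic,") || index(s, ",likely_pathogenic,"))
            print
    }' "$@"
}

printf '%s\n' \
    'benign,likely_pathogenic' \
    'benign,conflicting_interpretations_of_pathogenicity' \
    'pathogenic' | filter_by_index
```

Because `,conflicting_interpretations_of_pathogenicity,` never contains `,pathogenic,`, the second line is skipped and only the first and third are printed.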

or for your assumed data (the list in column 14):

$ awk '$14~/(^|,)(likely_)?pathogenic(,|$)/' file

The original question, "awk - Find an exact string in a column", is on Stack Overflow: https://stackoverflow.com/questions/73053519/
