
regex - Find a pattern and the characters up to a space, and move the captured pattern to the end of the line with sed

Reposted. Author: 行者123. Updated: 2023-12-04 17:10:31

I want to find a specific pattern ("k__") plus any characters that follow it up to a space, and then move the captured text to the end of the line.

Using this example file:

cat test.file
37099 k__Eukaryota species:s__Isochrysis galbana;genus:g__Isochrysis;family:f__Isochrysidaceae;order:o__Isochrysidales;class:c__Haptophyta;phylum:p__Haptista
73015 k__Eukaryota species:s__Monodus sp. CCMP505;genus:g__Monodus;family:f__Pleurochloridaceae;order:o__Mischococcales;class:c__Xanthophyceae;phylum:p__
73015 k__Eukaryota species:s__Monodus sp. CCMP505;genus:g__Monodus;family:f__Pleurochloridaceae;order:o__Mischococcales;class:c__Xanthophyceae;phylum:p__
73015 k__Eukaryota species:s__Monodus sp. CCMP505;genus:g__Monodus;family:f__Pleurochloridaceae;order:o__Mischococcales;class:c__Xanthophyceae;phylum:p__
73015 k__Eukaryota species:s__Monodus sp. CCMP505;genus:g__Monodus;family:f__Pleurochloridaceae;order:o__Mischococcales;class:c__Xanthophyceae;phylum:p__
73015 k__Eukaryota species:s__Monodus sp. CCMP505;genus:g__Monodus;family:f__Pleurochloridaceae;order:o__Mischococcales;class:c__Xanthophyceae;phylum:p__
43925 k__Eukaryota species:s__Nannochloropsis oculata;genus:g__Nannochloropsis;family:f__Monodopsidaceae;order:o__Eustigmatales;class:c__Eustigmatophyceae;phylum:p__
43925 k__Eukaryota species:s__Nannochloropsis oculata;genus:g__Nannochloropsis;family:f__Monodopsidaceae;order:o__Eustigmatales;class:c__Eustigmatophyceae;phylum:p__
43925 k__Eukaryota species:s__Nannochloropsis oculata;genus:g__Nannochloropsis;family:f__Monodopsidaceae;order:o__Eustigmatales;class:c__Eustigmatophyceae;phylum:p__
43925 k__Bacteria species:s__Nannochloropsis oculata;genus:g__Nannochloropsis;family:f__Monodopsidaceae;order:o__Eustigmatales;class:c__Eustigmatophyceae;phylum:p__

So I want to match "k__Eukaryota" and "k__Bacteria" (and any other token that starts with k__), then move the captured match to the end of the line. The desired output is:

37099    species:s__Isochrysis galbana;genus:g__Isochrysis;family:f__Isochrysidaceae;order:o__Isochrysidales;class:c__Haptophyta;phylum:p__Haptista k__Eukaryota
73015 species:s__Monodus sp. CCMP505;genus:g__Monodus;family:f__Pleurochloridaceae;order:o__Mischococcales;class:c__Xanthophyceae;phylum:p__ k__Eukaryota
73015 species:s__Monodus sp. CCMP505;genus:g__Monodus;family:f__Pleurochloridaceae;order:o__Mischococcales;class:c__Xanthophyceae;phylum:p__ k__Eukaryota
73015 species:s__Monodus sp. CCMP505;genus:g__Monodus;family:f__Pleurochloridaceae;order:o__Mischococcales;class:c__Xanthophyceae;phylum:p__ k__Eukaryota
73015 species:s__Monodus sp. CCMP505;genus:g__Monodus;family:f__Pleurochloridaceae;order:o__Mischococcales;class:c__Xanthophyceae;phylum:p__ k__Eukaryota
73015 species:s__Monodus sp. CCMP505;genus:g__Monodus;family:f__Pleurochloridaceae;order:o__Mischococcales;class:c__Xanthophyceae;phylum:p__ k__Eukaryota
43925 species:s__Nannochloropsis oculata;genus:g__Nannochloropsis;family:f__Monodopsidaceae;order:o__Eustigmatales;class:c__Eustigmatophyceae;phylum:p__ k__Eukaryota
43925 species:s__Nannochloropsis oculata;genus:g__Nannochloropsis;family:f__Monodopsidaceae;order:o__Eustigmatales;class:c__Eustigmatophyceae;phylum:p__ k__Eukaryota
43925 species:s__Nannochloropsis oculata;genus:g__Nannochloropsis;family:f__Monodopsidaceae;order:o__Eustigmatales;class:c__Eustigmatophyceae;phylum:p__ k__Eukaryota
43925 species:s__Nannochloropsis oculata;genus:g__Nannochloropsis;family:f__Monodopsidaceae;order:o__Eustigmatales;class:c__Eustigmatophyceae;phylum:p__ k__Bacteria

I thought this would be easy, but I can't get it to work. Here is what I tried:

cat test.file | gsed -E 's#(.*k__)(k__\w\+)(.*)#\1\3\2#'

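The idea: capture the text up to the pattern, then match the pattern itself (k__ plus any word characters up to the space), then capture the rest of the line, and finally reorder the capture groups.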

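I thought I could use backreferences to change the order, but I am most likely not matching the groups correctly. How do I capture everything up to my pattern, then the pattern itself ("k__xyz"), then everything through the end of the line, and rearrange those groups? Is this the right approach?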

Any help is much appreciated!


Best answer

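To edit the original file in place, add the -i option:
sed -i -r 's/(.*)(k__[^ ]*)( .*)/\1\3 \2/g' test.file
To save the result to a different file instead, drop the -i option and redirect the output:
sed -r 's/(.*)(k__[^ ]*)( .*)/\1\3 \2/g' test.file > new.file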

My test results:

szvp000006656:/home # cat test.file
37099 k__Eukaryota species:s__Isochrysis galbana;genus:g__Isochrysis;family:f__Isochrysidaceae;order:o__Isochrysidales;class:c__Haptophyta;phylum:p__Haptista

szvp000006656:/home # sed -r 's/(.*)(k__[^ ]*)( .*)/\1\3 \2/g' test.file > new.file
szvp000006656:/home # cat new.file
37099 species:s__Isochrysis galbana;genus:g__Isochrysis;family:f__Isochrysidaceae;order:o__Isochrysidales;class:c__Haptophyta;phylum:p__Haptista k__Eukaryota

szvp000006656:/home # sed -i -r 's/(.*)(k__[^ ]*)( .*)/\1\3 \2/g' test.file
szvp000006656:/home # cat test.file
37099 species:s__Isochrysis galbana;genus:g__Isochrysis;family:f__Isochrysidaceae;order:o__Isochrysidales;class:c__Haptophyta;phylum:p__Haptista k__Eukaryota
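A possible refinement, offered as a sketch rather than as part of the original answer. In the expression above, group 1 keeps the space after the ID and group 3 starts with a space, so the rewritten line can end up with two consecutive spaces after the ID. The variant below matches the separating spaces outside the groups and anchors the pattern to the whole line. The function name move_kingdom_to_end is made up for illustration. -E is used in place of -r because both GNU and BSD sed accept it. For what it's worth, the attempt in the question seems to fail because it requires "k__" twice in a row, and because \+ under -E is a literal plus sign rather than a quantifier.

```shell
#!/bin/sh
# Hypothetical helper: reads lines on stdin and moves the first
# space-delimited token starting with "k__" to the end of each line.
move_kingdom_to_end() {
  # \1 = text before the token, \2 = the k__ token, \3 = rest of the line.
  # The single spaces around \2 are matched outside the groups,
  # so they are not duplicated in the output.
  sed -E 's/^(.*) (k__[^ ]*) (.*)$/\1 \3 \2/'
}

# Usage on one shortened sample line (assumed data, not from test.file):
printf '%s\n' '37099 k__Eukaryota species:s__Isochrysis galbana;phylum:p__Haptista' |
  move_kingdom_to_end
```

Lines where the k__ token is already the last field have no trailing space after it, so the pattern does not match and they pass through unchanged.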

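Notes:

  1. I recommend https://regexr.com/ for debugging regular expressions.
  2. Neither basic nor extended POSIX/GNU regular expressions recognize non-greedy quantifiers; for those you need a newer regex flavor. In sed, use a negated character class such as [^/]* in place of the non-greedy .*? (quoted from chaos on Stack Overflow).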
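The second note can be illustrated with a small sketch. The sample line and the two function names are invented for this demonstration. Both commands wrap the captured text in square brackets so the extent of the match is visible.

```shell
#!/bin/sh
# Greedy: .* extends as far as it can, so the capture runs up to the
# last space on the line.
greedy_token() {
  sed -E 's/(k__.*) /[\1] /'
}

# Negated class: [^ ]* cannot cross a space, so the capture stops at the
# end of the first k__ token. This stands in for a non-greedy .*?
negated_token() {
  sed -E 's/(k__[^ ]*) /[\1] /'
}

line='a k__One x k__Two y'
printf '%s\n' "$line" | greedy_token   # a [k__One x k__Two] y
printf '%s\n' "$line" | negated_token  # a [k__One] x k__Two y
```

This is why the answer writes k__[^ ]* for the token: the character class itself defines where the token ends, with no reliance on quantifier laziness.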

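Regarding "regex - Find a pattern and the characters up to a space, and move the captured pattern to the end of the line with sed", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/69548739/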
