
perl - How do I extract a paragraph and selected lines with Perl?

Repost. Author: 行者123. Updated: 2023-12-01 08:41:15

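I have a text from which I need to:

  1. Extract the whole paragraph under the "AceView summary" section, up to (but not including) the line that starts with "Please quote".
  2. Extract the line that starts with "The closest human gene".
  3. Store both in an array with two elements.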

The text looks like this (also on pastebin):

  AceView: gene:1700049G17Rik, a comprehensive annotation of human, mouse and worm genes with mRNAs or ESTsAceView.

<META NAME="title"
CONTENT="
AceView: gene:1700049G17Rik a comprehensive annotation of human, mouse and worm genes with mRNAs or EST">

<META NAME="keywords"
CONTENT="
AceView, genes, Acembly, AceDB, Homo sapiens, Human,
nematode, Worm, Caenorhabditis elegans , WormGenes, WormBase, mouse,
mammal, Arabidopsis, gene, alternative splicing variant, structure,
sequence, DNA, EST, mRNA, cDNA clone, transcript, transcription, genome,
transcriptome, proteome, peptide, GenBank accession, dbest, RefSeq,
LocusLink, non-coding, coding, exon, intron, boundary, exon-intron
junction, donor, acceptor, 3'UTR, 5'UTR, uORF, poly A, poly-A site,
molecular function, protein annotation, isoform, gene family, Pfam,
motif ,Blast, Psort, GO, taxonomy, homolog, cellular compartment,
disease, illness, phenotype, RNA interference, RNAi, knock out mutant
expression, regulation, protein interaction, genetic, map, antisense,
trans-splicing, operon, chromosome, domain, selenocysteine, Start, Met,
Stop, U12, RNA editing, bibliography">
<META NAME="Description"
CONTENT= "
AceView offers a comprehensive annotation of human, mouse and nematode genes
reconstructed by co-alignment and clustering of all publicly available
mRNAs and ESTs on the genome sequence. Our goals are to offer a reliable
up-to-date resource on the genes, their functions, alternative variants,
expression, regulation and interactions, in the hope to stimulate
further validating experiments at the bench
">


<meta name="author"
content="Danielle Thierry-Mieg and Jean Thierry-Mieg,
NCBI/NLM/NIH, mieg@ncbi.nlm.nih.gov">




<!--
var myurl="av.cgi?db=mouse" ;
var db="mouse" ;
var doSwf="s" ;
var classe="gene" ;
//-->

However, I am stuck on the logic of the following script. What is the right way to achieve this?

#!/usr/bin/perl -w

my $INFILE_file_name = $file; # input file name

open ( INFILE, '<', $INFILE_file_name )
    or croak "$0 : failed to open input file $INFILE_file_name : $!\n";


my @allsum;

while ( <INFILE> ) {
    chomp;

    my $line = $_;

    my @temp1 = ();
    if ( $line =~ /^ AceView summary/ ) {
        print "$line\n";
        push @temp1, $line;
    }
    elsif( $line =~ /Please quote/) {
        push @allsum, [@temp1];
        @temp1 = ();
    }
    elsif ($line =~ /The closest human gene/) {

        push @allsum, $line;
    }

}

close ( INFILE ); # close input file
# Do something with @allsum

I need to process a lot of files like this.

Best answer

You can use the range operator in scalar context (the "flip-flop") to extract the whole paragraph:

while (<INFILE>) {
    chomp;
    if (/AceView summary/ .. /Please quote/) {
        print "$_\n";
    }

    print "$_\n" if /^The closest human gene/;
}
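
The loop above prints the lines, including the "Please quote" line, whereas the question asks for a two-element array with that line left out. A minimal sketch of one way to get there with the same flip-flop is shown below. The helper name `extract_summary` is hypothetical, and allowing leading whitespace before "The closest human gene" is an assumption about the input, not something the original answer states. The sketch relies on the documented behavior that the range operator returns a sequence number whose last value has "E0" appended, which lets the closing line be skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer): reads one page from
# a filehandle and returns (summary paragraph, "closest human gene" line).
sub extract_summary {
    my ($fh) = @_;
    my (@summary, $closest);

    while (my $line = <$fh>) {
        chomp $line;

        # In scalar context ".." is the flip-flop: it returns 1, 2, 3, ...
        # while the range is open, and the closing line's number ends in "E0".
        if (my $pos = ($line =~ /AceView summary/ .. $line =~ /Please quote/)) {
            push @summary, $line unless $pos =~ /E0$/;   # skip "Please quote"
        }

        # Assumption: the line may be indented, so leading whitespace is allowed.
        $closest = $line
            if !defined $closest && $line =~ /^\s*The closest human gene/;
    }

    return (join("\n", @summary), $closest);
}

# Process every file named on the command line.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "$0: cannot open $file: $!\n";
    my @result = extract_summary($fh);    # two elements, as the question asks
    close $fh;
    print "$file\n$result[0]\n", $result[1] // '', "\n";
}
```

The "AceView summary" heading line itself is kept in the first element, as in the answer's loop. One caveat when handling many files: a flip-flop keeps its state between calls to the subroutine that contains it, so a file with no "Please quote" line would leave the range open for the next file.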

Regarding "perl - How do I extract a paragraph and selected lines with Perl?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/2636655/
