
perl - Split a large file into smaller files based on a regex

Reposted. Author: 行者123. Updated: 2023-12-02 06:41:45

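OK, I've read about different ways to do this, but I just want to check whether there are hidden problems with the way I'm doing it, or whether there is a better approach (maybe grep?).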

Here is my working code:

#!usr/bin/perl

use strict;
use warnings;

my $chapternumber;
open my $corpus, '<', "/Users/jon/Desktop/chpts/chpt1-8/Lifeprocessed.txt" or die $!;
while (my $sentence = <$corpus>)
{
    if ($sentence =~ /\~\s(\d*F*[\.I_]\w+)\s/ )
    {
        $chapternumber = $1;
        $chapternumber =~ s/\./_/;
    }

    open my $outfile, '>>', "/Users/jon/Desktop/chpts/chpt$chapternumber.txt" or die $!;
    print $outfile $sentence;
}

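The file is a textbook in which I have marked each new chapter with ~ 1.1 Organisms Have Changed over Billions of Years 1.1.~ 15Intro ...~ F_14, and I want each marker to be the start of a new file: chpt1_1.txt (or chpt15Intro etc.). That file ends when I find the next chapter delimiter.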

One option: maybe instead of going line by line, grab the whole block like this?
 local $/ = "~";
open...
while...
next unless ($sentenceblock =~ /\~\s([\d+F][\.I_][\d\w]+)\s/);
....
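
A minimal sketch of the block-reading idea above, assuming "~" never appears in the body text. The helper name split_records and the sample text are hypothetical, and the text is read from an in-memory filehandle instead of the real file. Note that with $/ set to "~" each record ends with the delimiter instead of starting with it, so a pattern that begins with ~ would not find the label; the sketch matches the label at the start of the record instead.

```perl
use strict;
use warnings;

# Split text into "~"-delimited records and return [label, body] pairs.
# split_records is a hypothetical helper, not code from the question.
sub split_records {
    my ($text) = @_;
    open my $in, '<', \$text or die "cannot open in-memory handle: $!";
    local $/ = "~";            # read up to and including the next "~"
    my @records;
    while (my $block = <$in>) {
        chomp $block;          # chomp strips a trailing "~" (the value of $/)
        # A chapter record starts with whitespace, then the chapter label.
        next unless $block =~ /\A\s+([\w.]+)/;
        push @records, [$1, $block];
    }
    close $in;
    return @records;
}

my @records = split_records("~ 1.1 Organisms\nstuff\n~ 1.2 Part Two\nmore\n");
print "$_->[0]\n" for @records;
```

The first record read is just the leading "~", which is empty after chomp and is skipped by the next unless line.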

Thanks a lot.

Best answer

First, the good stuff:

enabled strict and warnings
using 3-arg open and lexical filehandles
checking the return value from open()

But your regex does not make sense:
~ is not "meta" in regexes, so it does not need escaping
. is not "meta" in a character class, so it does not need escaping
[\d+F] is equivalent to [+F\d] (what is the "F" for?). + matches a literal plus character in a character class; it does NOT mean "one or more" here
[\.I_] what is the "I" for? What is the underscore for?
[\d\w] is equivalent to [\w] and even to just \w
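
The points in the list above can be checked with a few small matches. This is only an illustrative sketch: the test strings are made up, and each check should report ok if the claims hold.

```perl
use strict;
use warnings;

# Each entry: [description, result of a boolean check]
my @checks = (
    ['~ matches without a backslash',       scalar('~ 1.1' =~ /~\s/)],
    ['[.] matches a literal dot',           scalar('1.1' =~ /[.]/)],
    ['[.] does not match other characters', scalar('1a1' !~ /[.]/)],
    ['[\d+F] matches a literal plus sign',  scalar('+' =~ /\A[\d+F]\z/)],
    ['[\d+F] matches only one character',   scalar('12' !~ /\A[\d+F]\z/)],
    ['[\d\w] and \w accept the same input', scalar(('_' =~ /\A[\d\w]\z/) && ('_' =~ /\A\w\z/))],
);
printf "%-40s %s\n", $_->[0], ($_->[1] ? 'ok' : 'NOT ok') for @checks;
```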

Your code calls open() far more often than it needs to: once per input line instead of once per chapter.

For operating on single characters, tr/// is a better choice than s///.
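
A small sketch of the difference, using a hypothetical multi-level label. Without the /g flag, s/// changes only the first dot, while tr/// maps every occurrence and does not involve the regex engine.

```perl
use strict;
use warnings;

my $label = '1.1.2';   # hypothetical multi-level section number

(my $with_s  = $label) =~ s/\./_/;    # no /g: only the first dot changes
(my $with_sg = $label) =~ s/\./_/g;   # /g: every dot changes
(my $with_tr = $label) =~ tr/./_/;    # tr maps every occurrence

print "s///  : $with_s\n";    # 1_1.2
print "s///g : $with_sg\n";   # 1_1_2
print "tr/// : $with_tr\n";   # 1_1_2
```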

Hopefully this will get you on the right track:
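#!/usr/bin/perl
use warnings;
use strict;

my $outfile;
while (<DATA>) {
    # A line starting with "~ " and a dotted number begins a new chapter file.
    if ( my ($chapternumber) = /^~\s([\d.]+)/ ) {
        $chapternumber =~ tr/./_/;
        close $outfile if $outfile;
        open $outfile, '>', "chpt$chapternumber.txt"
            or die "could not open 'chpt$chapternumber.txt' $!";
    }
    # Skip any text that appears before the first chapter marker.
    next unless $outfile;
    print {$outfile} $_;
}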

__DATA__
~ 1.1 Organisms Have Changed over Billions of Years 1.1
stuff
about changing
organisms
~ 1.2 Chapter One, Part Two 1.2
part two
stuff is here
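
The pattern above only captures dotted numbers such as 1.1. The question also mentions markers like 15Intro and F_14; one possible way to cover them is sketched below. The helper name chapter_file and the pattern [\w.]+ are assumptions based on the three marker styles quoted in the question, not part of the original answer.

```perl
use strict;
use warnings;

# Return the file name for a chapter marker line, or undef for ordinary text.
# chapter_file is a hypothetical helper; the pattern is an assumption based on
# the three marker styles quoted in the question (1.1, 15Intro, F_14).
sub chapter_file {
    my ($line) = @_;
    my ($label) = $line =~ /^~\s+([\w.]+)/ or return;
    $label =~ tr/./_/;               # dots become underscores in the file name
    return "chpt$label.txt";
}

for my $line ('~ 1.1 Organisms Have Changed', '~ 15Intro', '~ F_14', 'stuff') {
    my $file = chapter_file($line);
    printf "%-32s => %s\n", $line, defined $file ? $file : '(no marker)';
}
```

In the loop from the answer, a helper like this would take the place of the inline match and tr/// step, with a new file opened only when it returns a defined name.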

Regarding perl - split a large file into smaller files based on a regex, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/6498965/
