
windows - Perl "out of memory" with large text files


I'm running into a problem with the following code under the latest Strawberry Perl for Windows: I want to read in all the text files in a directory and process their contents. I currently don't see a way to process them line by line, because some of the changes I want to make span line breaks. The processing mostly involves deleting large chunks of each file (in my example code below it is just one substitution, but ideally I would run several similar regexes, each removing content from the file).

I run this script over a large number of files (>10,000), and it always crashes with an "Out of memory!" message on certain files larger than 400 MB. The strange thing is that when I write a program that processes only a single file, the code works fine.

The machine has 8 GB of RAM, so I don't think physical RAM is the problem.
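One thing worth ruling out (the question does not say which build is installed) is a 32-bit perl, which caps a single process at roughly 2 GB of address space regardless of how much RAM the machine has. The configured pointer size of the running perl shows which build it is:

use Config;                     # core module, ships with every perl
print "$Config{ptrsize}\n";     # 4 = 32-bit perl (~2 GB per process), 8 = 64-bit perl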

I've read through other posts about memory problems, but haven't found anything that helps me get where I want to go.

Can anyone suggest what I need to change to make the program run, i.e. make it more memory-efficient or sidestep the problem in some other way?

use strict;
use warnings;
use Path::Iterator::Rule;
use utf8;

use open ':std', ':encoding(utf-8)';

my $doc_rule = Path::Iterator::Rule->new;
$doc_rule->name('*.txt');                   # only process text files
$doc_rule->max_depth(3);                    # don't recurse deeper than 3 levels
my $doc_it = $doc_rule->iter("C:\\Temp\\");
while ( my $file = $doc_it->() ) {          # go through all documents found
    print "Stripping $file\n";

    # read in file
    open (FH, "<", $file) or die "Can't open $file for read: $!";
    my @lines;
    while (<FH>) { push (@lines, $_) };     # slurp entire file
    close FH or die "Cannot close $file: $!";

    my $lines = join("", @lines);           # put entire file into one string

    $lines =~ s/<DOCUMENT>\n<TYPE>EX-.*?\n<\/DOCUMENT>//gs; # perform the processing

    # write out file
    open (FH, ">", $file) or die "Can't open $file for write: $!";
    print FH $lines;                        # dump entire file
    close FH or die "Cannot close $file: $!";
}
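As an aside on the code above: holding every line in @lines and then joining them means each file briefly exists in memory twice. A minimal sketch of slurping straight into a single scalar instead (this does not fix the problem for the largest files, it only roughly halves the peak footprint):

# read the whole file into one scalar in a single step
my $lines = do {
    open my $fh, '<', $file or die "Can't open $file for read: $!";
    local $/;          # undef the input record separator => slurp mode
    <$fh>;
};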

Best Answer

Process the file line by line:

while ( my $file = $doc_it->() ) { # go through all documents found
    print "Stripping $file\n";

    open (my $infh, "<", $file) or die "Can't open $file for read: $!";
    open (my $outfh, ">", $file . ".tmp") or die "Can't open $file.tmp for write: $!";

    while (<$infh>) {
        if ( /<DOCUMENT>/ ) {
            # append the next line to test for TYPE
            $_ .= <$infh>;
            if (/<TYPE>EX-/) {
                # document type is excluded, now loop through
                # $infh until the closing tag is found.
                while (<$infh>) { last if m|</DOCUMENT>|; }

                # jump back to the <$infh> loop to resume
                # processing on the next line after </DOCUMENT>
                next;
            }
            # if we've made it this far, the document was not excluded
            # fall through to print both lines
        }
        print $outfh $_;
    }

    close $outfh or die "Cannot close $file: $!";
    close $infh or die "Cannot close $file: $!";
    unlink $file;
    rename $file.'.tmp', $file;
}
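If keeping the original multi-line regex matters, another option is to change the input record separator so each read returns at most one <DOCUMENT> block; memory is then bounded by the largest single block rather than the whole file. A sketch that would replace the inner while (<$infh>) loop above, assuming every block really does end with </DOCUMENT> on its own line:

local $/ = "</DOCUMENT>\n";    # read up to and including each closing tag
while ( my $chunk = <$infh> ) {
    # same substitution as the original question, applied one block at a time
    $chunk =~ s/<DOCUMENT>\n<TYPE>EX-.*?\n<\/DOCUMENT>//s;
    print $outfh $chunk;
}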

Regarding windows - Perl "out of memory" with large text files, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/28329568/
