
Regex for simple replacement in a document from a dictionary hash (Perl)

Reposted. Author: 行者123. Updated: 2023-12-01 09:33:57

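I need to find and replace keywords from a hash in a large document as fast as possible. I tried the two methods below. One is 320% faster than the other, but I am sure I am doing this the wrong way and that there is a better way to do it.

I want to replace only the keywords that exist in the dictionary hash and keep those that do not, so that I can tell which ones are not in the dictionary.

I think both methods below scan the text twice, once to find and once to replace. I am sure a regex using something like lookahead or lookbehind could optimize this much further.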

#!/usr/bin/perl

use strict;
use warnings;

use Benchmark qw(:all);

my %dictionary = (
    pollack => "pollard",
    polynya => "polyoma",
    pomaces => "pomaded",
    pomades => "pomatum",
    practic => "praetor",
    prairie => "praised",
    praiser => "praises",
    prajnas => "praline",
    quakily => "quaking",
    qualify => "quality",
    quamash => "quangos",
    quantal => "quanted",
    quantic => "quantum",
);

my $content =qq{
Start this is the text that contains the words to replace. {quantal} A computer {pollack} is a general {pomaces} purpose device {practic} that
can be {quakily} programmed to carry out a set {quantic} of arithmetic or logical operations automatically {quamash}.
Since a {prajnas} sequence of operations can {praiser} be readily changed, the computer {pomades} can solve more than {prairie}
one kind of problem {qualify} {doesNotExist} end.
};

# double the content to make the test document larger
$content .= $content;

cmpthese(100000, {
    replacer_1 => sub { my $text = replacer1($content) },
    replacer_2 => sub { my $text = replacer2($content) },
});

print replacer1($content) , "\n--------------------------\n";
print replacer2($content) , "\n--------------------------\n";
exit;

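# Single substitution: each {key} becomes [value] if the key exists,
# otherwise the original {key} is put back unchanged.
sub replacer1 {
    my ($content) = shift;
    $content =~ s/\{(.+?)\}/exists $dictionary{$1} ? "[$dictionary{$1}]": "\{$1\}"/gex;
    return $content;
}

# Two steps: collect every {name} first, then run one substitution
# per collected name that exists in the dictionary.
sub replacer2 {
    my ($content) = shift;
    my @names = $content =~ /\{(.+?)\}/g;
    foreach my $name (@names) {
        if (exists $dictionary{$name}) {
            $content =~ s/\{$name\}/\[$dictionary{$name}\]/;
        }
    }
    return $content;
}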

Here are the benchmark results:

              Rate replacer_2 replacer_1
replacer_2  5565/s         --       -76%
replacer_1 23397/s       320%         --

Best answer

Here is a faster and more compact approach:

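sub replacer3 {
    my ($content) = shift;
    # Falls back to the key itself when there is no dictionary entry.
    # Note: as written, unknown keys also end up in square brackets.
    $content =~ s#\{(.+?)\}#"[".($dictionary{$1} // $1)."]"#ge;
    return $content;
}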

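On Perl 5.8, which does not have the defined-or operator // (it was added in Perl 5.10), you can use || instead, provided none of your dictionary values is "false".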
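
To illustrate why the "false" caveat matters, here is a minimal sketch. The hash %dict and the helpers with_or and with_dor are hypothetical names made up for this example and are not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical dictionary containing a defined but false value ("0").
my %dict = (zero => "0", one => "1");

# || falls back to the key whenever the value is false.
sub with_or  { my ($k) = @_; return $dict{$k} || $k }

# // (Perl 5.10+) falls back only when the value is undefined.
sub with_dor { my ($k) = @_; return $dict{$k} // $k }

print with_or("zero"),  "\n";   # prints "zero": the value "0" is lost
print with_dor("zero"), "\n";   # prints "0"
```

If the dictionary can never hold an empty string or "0", the two operators behave the same for this lookup.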

Using a dictionary whose keys and values already include the braces and brackets gives a small additional gain:

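sub replacer5 {
    my ($content) = shift;
    our %dict2;
    # Build the wrapped dictionary ("{key}" => "[value]") once, on the first call.
    if (!%dict2) {
        %dict2 = map { "{".$_."}" => "[".$dictionary{$_}."]" } keys %dictionary;
    }
    # Unknown {key} tokens are left exactly as they were.
    $content =~ s#(\{.+?\})#$dict2{$1} || $1#ge;
    return $content;
}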

Benchmark results:

              Rate replacer_2 replacer_1 replacer_3 replacer_5
replacer_2  2908/s         --       -79%       -83%       -84%
replacer_1 14059/s       383%         --       -20%       -25%
replacer_3 17513/s       502%        25%         --        -7%
replacer_5 18741/s       544%        33%         7%         --
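
A different technique, not part of the answer above and not benchmarked here, is to build one alternation regex from the dictionary keys so that only known keys match at all. Unknown tokens such as {doesNotExist} are then never touched, and the replacement needs no /e callback. This is only a sketch: replacer_alt and the shortened %dictionary are hypothetical, and whether it is faster will depend on the number of keys and the Perl version.

```perl
use strict;
use warnings;

# Shortened copy of the dictionary, for illustration only.
my %dictionary = (pollack => "pollard", quantal => "quanted");

# One alternation of all keys, each escaped with quotemeta,
# compiled once with qr// outside the substitution.
my $keys = join "|", map { quotemeta($_) } keys %dictionary;
my $re   = qr/\{($keys)\}/;

sub replacer_alt {
    my ($content) = @_;
    # Only {key} tokens whose key is in the dictionary can match.
    $content =~ s/$re/[$dictionary{$1}]/g;
    return $content;
}

print replacer_alt("a {quantal} b {doesNotExist} c {pollack}"), "\n";
# prints: a [quanted] b {doesNotExist} c [pollard]
```

The regex has to be rebuilt whenever the dictionary changes, so this fits best when the dictionary is fixed for the whole run.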

A similar question about simple regex replacement in a document from a dictionary hash (Perl) can be found on Stack Overflow: https://stackoverflow.com/questions/25097744/
