arrays - Perl: check whether a line starts with a word from an array, and return the matched value to a variable

Reposted by: 行者123 · Updated: 2023-12-04 18:36:17

I have read the topic "Perl check a line contains at list one word of an array", but I am still unsure how to adapt it to my case.

I will reuse the example from that topic.

I have an array named @exampleWords:

my @exampleWords = ("balloon", "space", "monkey", "fruit" );

I have a line that contains a sentence, for example:

my $line = "monkey space always unlimited";

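How can I check whether $line starts with one of the words in the array, and return the matched word in a variable?

In the example above, the matched word is "monkey".

My current solution is to loop over every word in the array and check whether $line starts with $word: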

my $matchWord = "";
foreach my $word(@exampleWords) {
  if ($line =~ /^$word/) {
    $matchWord = $word;
    last;
  }
}

I am still looking for a more efficient solution.

Thanks.

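Best answer

In principle you have to iterate over the possible words to find a match. However, you can also build an alternation regex pattern out of them, so that the regex engine is started only once, unlike in the loop, where it is started anew on every iteration. In addition, the iteration over the words is then done by highly optimized C code.

How do the two compare? Let's benchmark them using the core module Benchmark.

For a small array, matching in its middle (your example)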

use warnings;
use strict;

use Benchmark qw( cmpthese );

my @ary = ("balloon", "space", "monkey", "fruit");
my $line = "monkey space always unlimited";

sub regex {
    my ($line, @ary) = @_;
    my $match; 
    my $re = join '|', map { quotemeta } @ary;
    if ($line =~ /^($re)/) {
        $match = $1;
    }   
    return $match;
}   

sub loop {
    my ($line, @ary) = @_;
    my $match; 
    foreach my $word (@ary) {
        if ($line =~ /^$word/) {  # see note at end
            $match = $word;
            last;
        }   
    }   
    return $match;
}   

cmpthese(-10, {
    regex => sub { regex ($line, @ary) },
    loop  => sub { loop  ($line, @ary) },
});

Both on a very good machine with v5.16 and on an older one with v5.10, this produces

          Rate  loop regex
loop  222791/s    --  -70%
regex 742962/s  233%    --

Thus the regex is far more efficient, running at roughly 3.3 times the rate of the loop.

For a 40 times larger array, matching around the middle

I build this array by @ary = qw(...) x 20, then add a word ('AHA'), then repeat 20 more times. I prepend that very word to the string, so that's what gets matched. I make the string much larger, too, even though this shouldn't matter for matching.
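The construction just described can be sketched as follows. This is only an illustration under stated assumptions: the base list is taken to be the question's four words (the text above elides it as `qw(...)`), and the filler used to lengthen the string is a guess, since the actual test string is not shown.

```perl
use strict;
use warnings;

# Assumption: the base list is the question's four words (elided as qw(...) above).
my @base = qw(balloon space monkey fruit);

# 20 copies of the base list, the marker word, then 20 more copies: 161 words.
my @ary = ( (@base) x 20, 'AHA', (@base) x 20 );

# Prepend the marker to the line so that 'AHA' is the word that matches.
# The repeated filler is hypothetical; it only makes the string much longer.
my $line = 'AHA ' . join ' ', ('monkey space always unlimited') x 50;

printf "%d words, line of %d characters\n", scalar @ary, length $line;
```

With this layout the loop has to try 80 words before it reaches the marker, while the alternation pattern is still applied in a single match.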

In this case the regex is even more convincing

         Rate  loop regex
loop   9300/s    --  -82%
regex 50873/s  447%    --

and yet more so with v5.10 on the older machine, with 574%.

On v5.27.2 the regex is faster by 1188%, so by a clean order of magnitude. But it is the rate of the loop that drops to only 6723/s, against the above 9300/s. So this only shows that the regex "startup" is more expensive in newer Perls, thus the loop falls further behind.

For the same large array, with the match word near its beginning

I move the match-word AHA in the array right past the original 4-word list

         Rate  loop regex
loop  36710/s    --   -3%
regex 37666/s    3%    --

So the match needs to happen very, very early so that the loop catches up with the regex. While this can happen often in specific use cases it cannot be expected in general, of course.

Note that the regex had far less work to do as well. Thus it's clear that the loop's problem is that it starts the regex engine anew for every iteration. Here it only had to do it a few times and the regex's advantage all but evaporated, even though it also matched much sooner.

As for programmer's efficiency, take your pick. There are yet other ways using higher level libraries so that you don't have to write the loop. For instance, using core List::Util

use List::Util qw(first);

my $match = first { $line =~ /^$_/ } @ary;

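Added to the benchmark, this performs about the same as the loop, only around 10% slower.

A note on the regex used in the question.

If the first word in $line is puppy, the regex /^$word/ will match it with pup. This may or may not be intended (but consider flu matching fluent), and if it is not, it can be fixed by adding the word-boundary anchor \b,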

$line =~ /^$word\b/

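The same can be done with the alternation pattern, which was written so as to mimic the code in the question. So add the word-boundary anchor there too, i.e. /^($re)\b/.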
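As a small illustration of the difference the anchor makes, here is a sketch that compares the two forms of the alternation pattern. The helper names `match_prefix` and `match_word` and the word list are made up for this example and are not part of the benchmark code above.

```perl
use strict;
use warnings;

my @words = qw(pup space monkey);    # hypothetical word list

# Prefix match, as in the question: the line only has to begin with the word.
sub match_prefix {
    my ($line, @ary) = @_;
    my $re = join '|', map { quotemeta($_) } @ary;
    return $line =~ /^($re)/ ? $1 : undef;
}

# Whole-word match: \b requires a word boundary right after the word.
sub match_word {
    my ($line, @ary) = @_;
    my $re = join '|', map { quotemeta($_) } @ary;
    return $line =~ /^($re)\b/ ? $1 : undef;
}

for my $line ('puppy runs far', 'pup runs far') {
    printf "%s: prefix=%s word=%s\n", $line,
        match_prefix($line, @words) // 'none',
        match_word($line, @words)   // 'none';
}
```

For the line starting with puppy the prefix form returns pup while the anchored form finds no match; for the line starting with pup both return pup.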

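Another way is to sort the list by word length, longest first, with sort { length $b <=> length $a } @ary, as per Borodin's comment. This may affect the problem in more complex ways, so please consider it carefully.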
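Why the order matters: Perl tries the alternatives of a pattern from left to right and takes the first one that succeeds, so a shorter word listed before a longer one that it prefixes will win. A minimal sketch, with a hypothetical word list and helper name (`first_alt`):

```perl
use strict;
use warnings;

my @ary  = qw(pup puppy monkey);    # hypothetical list; "pup" is a prefix of "puppy"
my $line = 'puppy runs far';

# Return the alternative that matches at the start of the line, if any.
sub first_alt {
    my ($line, @words) = @_;
    my $re = join '|', map { quotemeta($_) } @words;
    return $line =~ /^($re)/ ? $1 : undef;
}

# Longest words first, so that longer alternatives are tried before their prefixes.
my @by_length = sort { length($b) <=> length($a) } @ary;

printf "original order: %s\n", first_alt($line, @ary)       // 'none';
printf "longest first:  %s\n", first_alt($line, @by_length) // 'none';
```

In the original order the pattern stops at pup; with the sorted list it reaches puppy first. Note that this picks the longest listed prefix, which is not the same thing as requiring a whole word with \b.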

This post is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/46461198/
