regex - Cannot understand the result of the multi-line regex qr/( . $ .+ )/xms

Reposted. Author: 行者123. Updated: 2023-12-04 01:06:08
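For the string "aa\nbb\ncc", I want to match from the last letter before the first newline ("a") through the end of the multi-line string, and I expected that
"aa\nbb\ncc" =~ qr/( . $ .+ )/xms matches a\nbb\ncc

and that "aa\nbb\ncc\n" =~ qr/( . $ .+ )/xms matches a\nbb\ncc\n .

Instead, I get no match for "aa\nbb\ncc" =~ qr/( . $ .+ )/xms, and a match of c\n for "aa\nbb\ncc\n" =~ qr/( . $ .+ )/xms .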

With qr/( . $ ..+ )/xms I get the expected result (see the example code).

Perl version 5.14.2.

Can anyone explain this behavior?

perldoc perlre:

   m   Treat string as multiple lines.  That is, change "^" and "$" 
from matching the start or end of the string to matching the start
or end of any line anywhere within the string.

s Treat string as single line. That is, change "." to match any character
whatsoever, even a newline, which normally it would not match.

Used together, as "/ms", they let the "." match any character whatsoever,
while still allowing "^" and "$" to match, respectively, just after and
just before newlines within the string.

\z Match only at end of string
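
The modifiers quoted above can be checked in isolation with a short script. This is only a sketch for illustration: the variable names are made up here, and the patterns are deliberately simpler than the one in the question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $str = "aa\nbb\ncc";

# Without /m, "$" matches only at the end of the string
# (or just before a final trailing newline).
my ($no_m) = $str =~ /(\w+)$/;

# With /m, "$" also matches just before each embedded newline.
my ($with_m) = $str =~ /(\w+)$/m;

# Without /s, "." does not match a newline.
my ($no_s) = $str =~ /(.+)/;

# With /s, "." matches a newline too, so ".+" runs to the end of the string.
my ($with_s) = $str =~ /(.+)/s;

# "\z" matches only at the very end; "$" also matches before a final newline.
my $z_hit      = "cc\n" =~ /c\z/ ? 1 : 0;
my $dollar_hit = "cc\n" =~ /c$/  ? 1 : 0;

print "no /m   : $no_m\n";      # cc
print "with /m : $with_m\n";    # aa
print "no /s   : $no_s\n";      # aa
print "with /s has ", length($with_s), " characters\n";    # 8
print "\\z: $z_hit, \$: $dollar_hit\n";                     # 0 and 1
```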

Running the following example code:
#!/usr/bin/env perl

use strict;
use warnings;

print "Multiline string : ", '"aa\nbb\ncc"', "\n\n";
my $str = "aa\nbb\ncc";

print_match($str, qr/( . $ )/xms); # matches "a"
print_match($str, qr/( . $ . )/xms); # matches "a\n"
print_match($str, qr/( . $ .. )/xms); # matches "a\nb"
print_match($str, qr/( . $ ..+ )/xms); # matches "a\nbb\ncc"
print_match($str, qr/( . $ .+ )/xms); # NO MATCH ! Why ???
print_match($str, qr/( . $ .+ \z )/xms); # NO MATCH ! Why ???

print "\nMultiline string now with terminating newline : ", '"aa\nbb\ncc\n"', "\n\n";
$str = "aa\nbb\ncc\n";

print_match($str, qr/( . $ )/xms); # matches "a"
print_match($str, qr/( . $ . )/xms); # matches "a\n"
print_match($str, qr/( . $ .. )/xms); # matches "a\nb"
print_match($str, qr/( . $ ..+ )/xms); # matches "a\nbb\ncc\n"
print_match($str, qr/( . $ .+ )/xms); # MATCHES "c\n" ! Why ???
print_match($str, qr/( . $ .+ \z)/xms); # MATCHES "c\n" ! Why ???

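sub print_match {
    my ($str, $regex) = @_;
    # Test the match result itself: $1 is only meaningful after a
    # successful match, and a captured "0" would be false.
    if ( $str =~ $regex ) {
        printf "--> %-20s matched : >%s< \n", $regex, $1;
    }
    else {
        printf "--> %-20s : no match !\n", $regex;
    }
}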

The output is:
Multiline string : "aa\nbb\ncc"

--> (?^msx:( . $ )) matched : >a<
--> (?^msx:( . $ . )) matched : >a
<
--> (?^msx:( . $ .. )) matched : >a
b<
--> (?^msx:( . $ ..+ )) matched : >a
bb
cc<
--> (?^msx:( . $ .+ )) : no match !

Multiline string now with terminating newline : "aa\nbb\ncc\n"

--> (?^msx:( . $ )) matched : >a<
--> (?^msx:( . $ . )) matched : >a
<
--> (?^msx:( . $ .. )) matched : >a
b<
--> (?^msx:( . $ ..+ )) matched : >a
bb
cc
<
--> (?^msx:( . $ .+ )) matched : >c
<

Best answer

This is a bug. Please report it by running perlbug from the command line.

$ perl -E'say "aa\nbb\ncc" =~ qr/( . $ .+ )/xms ? ">$1<" : 0'
0

$ perl -E'say "aa\nbb\ncc\n" =~ qr/( . $ .+ )/xms ? ">$1<" : 0'
>c
<

$ perl -v
...
This is perl 5, version 16, subversion 0 (v5.16.0) built for x86_64-linux
...

As you said, they should match "a\nbb\ncc" and "a\nbb\ncc\n" respectively. There are optimizations related to $. One of them appears to fail to take /ms into account.
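
Until the bug is fixed, one possible workaround (not part of the original answer) is to avoid "$" entirely. Because ".+" must consume at least one character, the "$" in this pattern can only succeed just before a newline, so a lookahead for "\n" should express the same intent. The helper name below is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: capture from the last character of the first line
# through the end of the string, without using "$".
sub tail_from_first_line_end {
    my ($str) = @_;
    # (?=\n) asserts that a newline follows the first "."; with /s,
    # ".+" then consumes that newline and everything after it.
    return $str =~ / ( . (?=\n) .+ ) /xs ? $1 : undef;
}

for my $input ("aa\nbb\ncc", "aa\nbb\ncc\n") {
    my $tail = tail_from_first_line_end($input);
    print defined $tail ? ">$tail<\n" : "no match\n";
}
```

The question already notes a second option: writing "..+" instead of ".+" also produced the expected match on the affected version.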

PS: You may be interested in use re 'debug'; .
$ perl -Mre=debug -E'say "aa\nbb\ncc" =~ qr/( . $ .+ )/xms ? ">$1<" : 0'
Compiling REx "( . $ .+ )"
Final program:
1: OPEN1 (3)
3: SANY (4)
4: MEOL (5)
5: PLUS (7)
6: SANY (0)
7: CLOSE1 (9)
9: END (0)
anchored ""$ at 2 minlen 2
Matching REx "( . $ .+ )" against "aa%nbb%ncc"
0 <> <aa%nbb%ncc> | 1:OPEN1(3)
0 <> <aa%nbb%ncc> | 3:SANY(4)
1 <a> <a%nbb%ncc> | 4:MEOL(5)
failed...
3 <aa%n> <bb%ncc> | 1:OPEN1(3)
3 <aa%n> <bb%ncc> | 3:SANY(4)
4 <aa%nb> <b%ncc> | 4:MEOL(5)
failed...
6 <aa%nbb%n> <cc> | 1:OPEN1(3)
6 <aa%nbb%n> <cc> | 3:SANY(4)
7 <aa%nbb%nc> <c> | 4:MEOL(5)
failed...
Match failed
0
Freeing REx: "( . $ .+ )"
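
The same trace can also be produced from inside a script rather than with -Mre=debug. The sketch below assumes only the documented behavior that the re pragma is lexically scoped; it uses the "..+" variant from the question, which matched as expected there, and the variable names are illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $str     = "aa\nbb\ncc";
my $matched = 0;

{
    # Lexically scoped: only regexes handled inside this block are traced.
    # The trace is written to STDERR, not STDOUT.
    use re 'debug';
    my $regex = qr/( . $ ..+ )/xms;
    $matched = 1 if $str =~ $regex;
}

print $matched ? "matched\n" : "no match\n";
```

Keeping the pragma inside a small block limits the trace to the one pattern under investigation, which helps when the rest of the script uses many regexes.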

Regarding "regex - Cannot understand the result of the multi-line regex qr/( . $ .+ )/xms", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11412439/