
regex - Match whitespace but not newlines

Reposted. Author: 行者123. Updated: 2023-12-04 22:19:43

I sometimes want to match whitespace but not newlines.

So far I have been resorting to [ \t]. Is there a less awkward way?

Best answer

Use a double negative:

/[^\S\r\n]/

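That is, not-not-whitespace (the capital S complements \s) or not-carriage-return or not-newline. Distributing the outer not (the complementing ^ in the character class) with De Morgan's law, this is equivalent to "whitespace but not carriage return or newline." Including both \r and \n in the pattern correctly handles all of the Unix (LF), classic Mac OS (CR), and DOS-ish (CR LF) newline conventions.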

No need to take my word for it:

#! /usr/bin/env perl

use strict;
use warnings;

use 5.005; # for qr//

my $ws_not_crlf = qr/[^\S\r\n]/;

for (' ', '\f', '\t', '\r', '\n') {
  my $qq = qq["$_"];
  printf "%-4s => %s\n", $qq,
    (eval $qq) =~ $ws_not_crlf ? "match" : "no match";
}

Output:

" "  => match
"\f" => match
"\t" => match
"\r" => no match
"\n" => no match

Note the exclusion of vertical tab, but this is addressed in v5.18.
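
The De Morgan argument above can also be checked exhaustively rather than on five samples. The following is a minimal sketch, not part of the original answer: the helper name de_morgan_mismatches is hypothetical, and the check is limited to the ASCII range. It compares the double-negative class against the spelled-out condition "matches \s and is neither \r nor \n" for every code point.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the ASCII code points where the double-negative
# class and the spelled-out "whitespace and not CR/LF" test disagree.
sub de_morgan_mismatches {
    my @mismatches;
    for my $cp (0 .. 127) {
        my $ch = chr $cp;
        my $double_negative = $ch =~ /[^\S\r\n]/ ? 1 : 0;
        my $spelled_out     = ($ch =~ /\s/ && $ch !~ /[\r\n]/) ? 1 : 0;
        push @mismatches, $cp if $double_negative != $spelled_out;
    }
    return @mismatches;
}

my @mismatches = de_morgan_mismatches();
print @mismatches ? "mismatches at: @mismatches\n" : "the two forms agree\n";
```

Because both sides of the comparison use the same \s, the two forms should agree on any Perl version, whether or not \s includes the vertical tab there.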

Before objecting too harshly, the Perl documentation uses the same technique. A footnote in the “Whitespace” section of perlrecharclass reads

Prior to Perl v5.18, \s did not match the vertical tab. [^\S\cK] (obscurely) matches what \s traditionally did.

The same section of perlrecharclass also suggests other approaches that won’t offend language teachers’ opposition to double-negatives.

Outside locale and Unicode rules or when the /a switch is in effect, “\s matches [\t\n\f\r ] and, starting in Perl v5.18, the vertical tab, \cK.” Discard \r and \n to leave /[\t\f\cK ]/ for matching whitespace but not newline.
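
To see how the explicit list compares with the double negative, the sketch below collects the ASCII code points each pattern matches under the /a switch. It is only an illustration: the helper name ascii_matches is hypothetical, and /a assumes Perl v5.14 or later. On v5.18 and later the two lists are expected to be identical; on earlier versions the double negative would lack the vertical tab (0x0b).

```perl
use strict;
use warnings;

# Hypothetical helper: ASCII code points whose character matches the pattern.
sub ascii_matches {
    my ($re) = @_;
    return grep { chr($_) =~ $re } 0 .. 127;
}

my @double_negative = ascii_matches(qr/[^\S\r\n]/a);
my @explicit_list   = ascii_matches(qr/[\t\f\cK ]/a);

# Print each set as hex code points for easy comparison.
printf "double negative: %s\n",
    join ' ', map { sprintf '0x%02x', $_ } @double_negative;
printf "explicit list:   %s\n",
    join ' ', map { sprintf '0x%02x', $_ } @explicit_list;
```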

If your text is Unicode, use code similar to the sub below to construct a pattern from the table in the aforementioned documentation section.

sub ws_not_nl {
  local($_) = <<'EOTable';
0x0009 CHARACTER TABULATION h s
0x000a LINE FEED (LF) vs
0x000b LINE TABULATION vs [1]
0x000c FORM FEED (FF) vs
0x000d CARRIAGE RETURN (CR) vs
0x0020 SPACE h s
0x0085 NEXT LINE (NEL) vs [2]
0x00a0 NO-BREAK SPACE h s [2]
0x1680 OGHAM SPACE MARK h s
0x2000 EN QUAD h s
0x2001 EM QUAD h s
0x2002 EN SPACE h s
0x2003 EM SPACE h s
0x2004 THREE-PER-EM SPACE h s
0x2005 FOUR-PER-EM SPACE h s
0x2006 SIX-PER-EM SPACE h s
0x2007 FIGURE SPACE h s
0x2008 PUNCTUATION SPACE h s
0x2009 THIN SPACE h s
0x200a HAIR SPACE h s
0x2028 LINE SEPARATOR vs
0x2029 PARAGRAPH SEPARATOR vs
0x202f NARROW NO-BREAK SPACE h s
0x205f MEDIUM MATHEMATICAL SPACE h s
0x3000 IDEOGRAPHIC SPACE h s
EOTable

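  my $class;
  # Capture the code point and the name, including any parenthesized
  # abbreviation such as (LF) or (CR), so the line-break rows can be skipped.
  while (/^0x([0-9a-f]{4})\s+([A-Z\s()-]+)/mg) {
    my($hex,$name) = ($1,$2);
    next if $name =~ /\b(?:CR|LF|NEL|SEPARATOR)\b/;
    $class .= "\\N{U+$hex}";
  }

  qr/[$class]/u;
}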
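
For comparison, the double negative from the start of this answer can be stretched to Unicode text without building a table. This is a sketch under stated assumptions, not something the original answer proposes: it treats CR, LF, NEL, LINE SEPARATOR and PARAGRAPH SEPARATOR as the line breaks to exclude, it relies on the /u switch (Perl v5.14 or later) so that \S follows Unicode rules, and it still lets vertical tab and form feed match.

```perl
use strict;
use warnings;

# Assumption: CR, LF, NEL, LINE SEPARATOR and PARAGRAPH SEPARATOR are the
# characters to treat as line breaks; vertical tab and form feed still match.
my $ws_not_nl = qr/[^\S\r\n\x{85}\x{2028}\x{2029}]/u;

# Four spaces (SPACE, NO-BREAK SPACE, EM SPACE, IDEOGRAPHIC SPACE) followed by
# four line breaks (LF, CR, NEL, LINE SEPARATOR).
for my $cp (0x20, 0xa0, 0x2003, 0x3000, 0x0a, 0x0d, 0x85, 0x2028) {
    printf "U+%04X => %s\n", $cp,
        chr($cp) =~ $ws_not_nl ? "match" : "no match";
}
```

The table-driven sub is more explicit about which characters are included. This form is shorter but depends on what \s means in the running Perl.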

Other applications

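The double-negative trick is also handy for matching alphabetic characters. Remember that \w matches "word characters": alphabetic characters, digits, and underscore. We ugly Americans sometimes want to write it as, say,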

if (/[A-Za-z]+/) { ... }

but a double-negative character class can respect the locale:

if (/[^\W\d_]+/) { ... }

Expressing "a word character but not a digit or underscore" this way is a bit opaque. A POSIX character class communicates the intent more directly

if (/[[:alpha:]]+/) { ... }

or with a Unicode property, as szbalint suggested

if (/\p{Letter}+/) { ... }
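
The difference between the four spellings shows up as soon as the text contains non-ASCII letters. The sketch below is an illustration only: it uses a Unicode string with Greek letters instead of a locale, and the helper name first_match_length is hypothetical. It reports how many characters each pattern matches at its first match.

```perl
use strict;
use warnings;

# Sample text: ASCII letters, Greek alpha and beta, then an underscore and digits.
my $text = "abc\x{3b1}\x{3b2}_42";

my @patterns = (
    [ 'ASCII range'      => qr/[A-Za-z]+/    ],
    [ 'double negative'  => qr/[^\W\d_]+/    ],
    [ 'POSIX class'      => qr/[[:alpha:]]+/ ],
    [ 'Unicode property' => qr/\p{Letter}+/  ],
);

# Hypothetical helper: length of the first match, or 0 if nothing matched.
sub first_match_length {
    my ($re, $string) = @_;
    return $string =~ /($re)/ ? length $1 : 0;
}

for my $entry (@patterns) {
    my ($label, $re) = @$entry;
    printf "%-16s matched %d characters\n",
        $label, first_match_length($re, $text);
}
```

The ASCII range stops after the three letters "abc", while the other three patterns carry on through the Greek letters and stop at the underscore, for five characters each.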

Regarding "regex - Match whitespace but not newlines", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/6342080/
