
javascript - Dart support for using script property values in regular expressions

Reposted. Author: 行者123. Updated: 2023-11-30 19:19:56

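The Unicode regular expression documentation describes complex matching of text. Specifically, I would like to know how to match various scripts in a string of text based on the Script property value of each code point.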

The Unicode documentation on Using Script Property Values in Regular Expressions refers to this possibility:

The script property is useful in regular expression syntax for easy specification of spans of text that consist of a single script or mixture of scripts. In general, regular expressions should use specific Script property values only in conjunction with both Common and Inherited. For example, to distinguish a sequence of characters appropriate for Greek text, one might use

((Greek | Common) (Inherited | Me | Mn)*)*

The preceding expression matches all characters that have a Script property value of Greek or Common and which are optionally followed by characters with a Script property value of Inherited. For completeness, the regular expression also allows any nonspacing or enclosing mark.

Some languages commonly use multiple scripts, so, for example, to distinguish a sequence of characters appropriate for Japanese text one might use:

((Hiragana | Katakana | Han | Latin | Common) (Inherited | Me | Mn)*)*

Is this implemented in Dart? I do not see it described for either Dart's RegExp or the JavaScript ECMAScript regex specs on which Dart regular expressions are based.

Best answer

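Dart added support for Unicode property escapes in version 2.4, back in mid-2019 (see https://github.com/dart-lang/sdk/issues/34935). There is a catch, however: for this to work, you need to pass the optional parameter `unicode: true` to the RegExp() constructor so that your pattern is interpreted in "unicode mode". I have tested the following (matching {L} letters, {N} numbers, and {M} marks) and it works with the latest Dart SDK: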

RegExp(r'[\p{L}\p{N}\p{M}]', unicode: true)
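As a minimal sketch of how that expression might be used, the snippet below wraps it in a helper that collects runs of letters, numbers, and marks. The function name `wordRuns`, the added `+` quantifier, and the sample string are illustrative assumptions, not part of the original answer; the `!` operator assumes a null-safe Dart SDK.

```dart
// Hypothetical helper (not from the original answer): returns every run of
// letters, numbers, and marks found in [text].
List<String> wordRuns(String text) {
  // unicode: true is needed so that \p{...} is parsed as a property escape.
  final pattern = RegExp(r'[\p{L}\p{N}\p{M}]+', unicode: true);
  return pattern.allMatches(text).map((m) => m.group(0)!).toList();
}

void main() {
  // Punctuation and spaces fall outside L, N, and M, so they split the runs.
  print(wordRuns('Γειά σου, κόσμε! 123'));
}
```

Without `unicode: true`, the same pattern is not read as a set of property escapes, so it is worth keeping the flag next to the pattern wherever the expression is constructed.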

Following @daxim's example, to match Greek characters:

RegExp exp = RegExp(r'(\p{Script=Greek})', unicode: true);
Iterable<RegExpMatch> matches;
matches = exp.allMatches('ΓβγΔδΕεζηΘθ');
for (Match m in matches) {
  print('${m.group(1)}');
}
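The Greek pattern quoted from the Unicode documentation in the question could be approximated in Dart along the following lines. This is a sketch under assumptions: the name `isGreekText`, the `^...$` anchoring, and the explicit repetition operators are choices made here for illustration, and it relies on the SDK accepting `Script=Common`, `Script=Inherited`, and the general categories `Me` and `Mn` as property escapes.

```dart
// Sketch only: one base character that is Greek or Common, optionally
// followed by Inherited characters or enclosing/nonspacing marks, repeated
// over the whole string.
final RegExp greekText = RegExp(
  r'^(?:[\p{Script=Greek}\p{Script=Common}]'
  r'[\p{Script=Inherited}\p{Me}\p{Mn}]*)*$',
  unicode: true,
);

// Hypothetical helper: true when every character in [text] fits the pattern.
// Note that an empty string, or one made only of Common characters such as
// digits and punctuation, also passes.
bool isGreekText(String text) => greekText.hasMatch(text);

void main() {
  print(isGreekText('Γειά σου, κόσμε!'));
  print(isGreekText('hello'));
}
```

Allowing Common alongside Greek is what lets spaces and punctuation through; dropping it would make the check reject ordinary Greek sentences.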

Regarding "javascript - Dart support for using script property values in regular expressions", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57583651/
