
java - Scalable regex for English numerals

Reposted. Author: 搜寻专家. Updated: 2023-10-31 08:30:49

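I am trying to create a regular expression that recognizes English numerals, such as one, nineteen, twenty, one hundred twenty-two, and so on, all the way up to the millions. I want to reuse some parts of the regular expression, so it is built from parts, like this: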

// replace <TAG> with the content of the variable
ONE_DIGIT = (?:one|two|three|four|five|six|seven|eight|nine)
TEEN = (?:ten|eleven|twelve|(?:thir|four|fif|six|seven|eigh|nine)teen)
TWO_DIGITS = (?:(?:twen|thir|for|fif|six|seven|eigh|nine)ty(?:\s+<ONE_DIGIT>)?|<TEEN>)
// HUNDREDS, et cetera
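
Since the question targets the Java regex engine, here is a minimal sketch of how the parts could be assembled with plain string concatenation instead of `<TAG>` substitution. The class name `NumeralParts`, the `UNDER_HUNDRED` pattern, and the `isNumeral` helper are hypothetical names introduced for illustration; only the three part names come from the question. Note that the teen stems need `four` (for `fourteen`), while the `for` stem is only correct before `ty` (for `forty`).

```java
import java.util.regex.Pattern;

public class NumeralParts {
    // Part names mirror the question; everything else here is an assumption.
    static final String ONE_DIGIT =
        "(?:one|two|three|four|five|six|seven|eight|nine)";
    // "four" (not "for") so that "fourteen" is matched
    static final String TEEN =
        "(?:ten|eleven|twelve|(?:thir|four|fif|six|seven|eigh|nine)teen)";
    static final String TWO_DIGITS =
        "(?:(?:twen|thir|for|fif|six|seven|eigh|nine)ty(?:\\s+" + ONE_DIGIT + ")?|"
        + TEEN + ")";

    // Hypothetical top-level pattern: any numeral below one hundred.
    // TWO_DIGITS comes first so "sixteen" is not cut short at "six".
    static final Pattern UNDER_HUNDRED = Pattern.compile(
        "\\b(?:" + TWO_DIGITS + "|" + ONE_DIGIT + ")\\b",
        Pattern.CASE_INSENSITIVE);

    // True if the whole string is a numeral below one hundred
    static boolean isNumeral(String s) {
        return UNDER_HUNDRED.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"seven", "fourteen", "forty two", "forteen"}) {
            System.out.println(s + " -> " + isNumeral(s));
        }
    }
}
```

Keeping each part as a non-capturing group makes it safe to concatenate without changing group numbering. Larger parts such as HUNDREDS could be built the same way, but they are left out here because the question does not show them.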

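I am wondering whether anyone has already done the same thing (and would be willing to share), because these regexes are quite long and they may match something they should not, or miss something they should. I also want them to be as efficient as possible, so I am looking forward to any optimization tips. I am using the Java regex engine, but any regex flavour is acceptable.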

Best answer

See Perl's Lingua::EN::Words2Nums and Lingua::EN::FindNumber.

In particular, the source code for Lingua::EN::FindNumber contains:

# This is from Lingua::EN::Words2Nums, after being thrown through
# Regex::PreSuf
my $numbers =
qr/((?:b(?:akers?dozen|illi(?:ard|on))|centillion|d(?:ecilli(?:ard|on)|ozen|u(?:o(?:decilli(?:ard|on)|vigintillion)|vigintillion))|e(?:ight(?:een|ieth|[yh])?|leven(?:ty(?:first|one))?|s)|f(?:i(?:ft(?:een|ieth|[yh])|rst|ve)|o(?:rt(?:ieth|y)|ur(?:t(?:ieth|[yh]))?))|g(?:oogol(?:plex)?|ross)|hundred|mi(?:l(?:ion|li(?:ard|on))|nus)|n(?:aught|egative|in(?:et(?:ieth|y)|t(?:een|[yh])|e)|o(?:nilli(?:ard|on)|ught|vem(?:dec|vigint)illion))|o(?:ct(?:illi(?:ard|on)|o(?:dec|vigint)illion)|ne)|qu(?:a(?:drilli(?:ard|on)|ttuor(?:decilli(?:ard|on)|vigintillion))|in(?:decilli(?:ard|on)|tilli(?:ard|on)|vigintillion))|s(?:core|e(?:cond|pt(?:en(?:dec|vigint)illion|illi(?:ard|on))|ven(?:t(?:ieth|y))?|x(?:decillion|tilli(?:ard|on)|vigintillion))|ix(?:t(?:ieth|y))?)|t(?:ee?n|h(?:ir(?:t(?:een|ieth|y)|d)|ousand|ree)|r(?:e(?:decilli(?:ard|on)|vigintillion)|i(?:gintillion|lli(?:ard|on)))|w(?:e(?:l(?:fth|ve)|nt(?:ieth|y))|o)|h)|un(?:decilli(?:ard|on)|vigintillion)|vigintillion|zero|s))/i;

That code is covered by Perl's Artistic License.

You can use Regex::PreSuf to factor out common prefixes and suffixes automatically:

#!/usr/bin/perl

use strict;
use warnings;

use Regex::PreSuf;

my %singledigit = (
    one   => 1,
    two   => 2,
    three => 3,
    four  => 4,
    five  => 5,
    six   => 6,
    seven => 7,
    eight => 8,
    nine  => 9,
);

my $singledigit = presuf(keys %singledigit);

print $singledigit, "\n";

my $text = "one two three four five six seven eight nine";

$text =~ s/($singledigit)/$singledigit{$1}/g;

print $text, "\n";

Output:

C:\Temp> cvb
(?:eight|f(?:ive|our)|nine|one|s(?:even|ix)|t(?:hree|wo))
1 2 3 4 5 6 7 8 9
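
For readers working in Java, as the question does, the same idea can be sketched without Regex::PreSuf. The sketch below is not a port of that module: it factors common prefixes only, using a small trie, and does not attempt suffix factoring. The names `DigitWords`, `factor`, `render`, and `replaceDigitWords` are hypothetical, and the code assumes the words contain only letters, so no regex escaping is done.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

public class DigitWords {
    // Word-to-value table, same content as %singledigit in the Perl script
    static final Map<String, Integer> SINGLE_DIGIT = new LinkedHashMap<>();
    static {
        String[] words = {"one", "two", "three", "four", "five",
                          "six", "seven", "eight", "nine"};
        for (int i = 0; i < words.length; i++) {
            SINGLE_DIGIT.put(words[i], i + 1);
        }
    }

    // Trie node; TreeMap keeps children sorted so the output is deterministic
    static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean terminal; // true if some word ends at this node
    }

    // Builds a trie from the words and renders it as a prefix-factored regex.
    // Assumption: words are plain letters, so nothing needs escaping.
    static String factor(Iterable<String> words) {
        Node root = new Node();
        for (String w : words) {
            Node n = root;
            for (char c : w.toCharArray()) {
                n = n.children.computeIfAbsent(c, k -> new Node());
            }
            n.terminal = true;
        }
        return render(root);
    }

    // Renders the regex for everything that may follow the given node
    private static String render(Node n) {
        if (n.children.isEmpty()) {
            return "";
        }
        List<String> alts = new ArrayList<>();
        for (Map.Entry<Character, Node> e : n.children.entrySet()) {
            alts.add(e.getKey() + render(e.getValue()));
        }
        if (alts.size() == 1 && !n.terminal) {
            return alts.get(0); // single continuation: no group needed
        }
        String group = "(?:" + String.join("|", alts) + ")";
        return n.terminal ? group + "?" : group; // optional if a word ends here
    }

    // Replaces each digit word with its value; needs Java 9+ for replaceAll(Function)
    static String replaceDigitWords(String text) {
        Pattern p = Pattern.compile(factor(SINGLE_DIGIT.keySet()));
        return p.matcher(text)
                .replaceAll(m -> String.valueOf(SINGLE_DIGIT.get(m.group())));
    }

    public static void main(String[] args) {
        System.out.println(factor(SINGLE_DIGIT.keySet()));
        System.out.println(replaceDigitWords("one two three four five six seven eight nine"));
    }
}
```

For these nine words, prefix factoring alone happens to yield the same pattern as the Perl output above. For word lists with shared suffixes (for example the `-illion` family), Regex::PreSuf would compress further than this sketch does.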

I am afraid it gets harder after that ;-)

Regarding "java - Scalable regex for English numerals", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/1269838/
