
perl - Where can I find arrays of (un)assigned Unicode code points for a particular block?

Reposted by 行者123, updated 2023-12-02 02:33:27

Currently, I am writing these arrays by hand.

For example, the Miscellaneous Mathematical Symbols-A block has an entry in a hash like this:

my %symbols = (
    ...
    miscellaneous_mathematical_symbols_a => [(0x27C0..0x27CA), 0x27CC,
        (0x27D0..0x27EF)],
    ...
);

The simpler "contiguous" array

miscellaneous_mathematical_symbols_a => [0x27C0..0x27EF]

doesn't work, because Unicode blocks have holes. For example, there is nothing at 0x27CB. Take a look at the code chart [PDF].
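You can find such holes without reading the PDF by testing each code point in the block's range against Perl's built-in \p{Assigned} property. This is a minimal sketch, and `unassigned_in_range` is a hypothetical helper name. The result depends on the Unicode version bundled with your perl, so the holes it reports may differ from the ones in whichever code chart you are looking at.

```perl
use strict;
use warnings;

# Hypothetical helper: return the code points in [$start, $stop] that are
# NOT assigned according to the Unicode data bundled with this perl.
sub unassigned_in_range {
    my ( $start, $stop ) = @_;
    return grep { chr($_) !~ /\p{Assigned}/ } $start .. $stop;
}

my @holes = unassigned_in_range( 0x27C0, 0x27EF );
printf "%04X\n", $_ for @holes;
```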

Writing these arrays by hand is tedious and error-prone, but also a bit of fun. I get the feeling someone has already solved this in Perl!

Best answer

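Perhaps you want Unicode::UCD? Its charblock routine gives you the range of any named block, and if you want the block names themselves, charblocks returns all of them along with their ranges.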
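Here is a short sketch of both routines. It gets the block name by calling charblock on a code point inside the block, which avoids guessing the exact spelling of the name. It then passes that name back to charblock to get the block's range set.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charblock charblocks);

# With a code point, charblock() returns the name of the block it is in...
my $name = charblock(0x27C0);

# ...and with a block name it returns a range set: a reference to an
# array of [ $first, $last, ... ] ranges.
my $ranges = charblock($name);
my ( $first, $last ) = @{ $ranges->[0] };
printf "%s: %04X..%04X\n", $name, $first, $last;

# charblocks() returns a hash ref mapping every block name to its range set.
my $blocks = charblocks();
my @math = sort { $a cmp $b } grep { /Math/ } keys %$blocks;
print "$_\n" for @math;
```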

This module is really just an interface to the Unicode database that ships with Perl. If you need to do something fancier, look at lib/5.x.y/unicore/UnicodeData.txt or the other files in the same directory.

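Here's what I came up with to create your %symbols. I go through all of the blocks, although in this example I skip the ones without "Math" in their names. For each block I get the start and end code points and check which of them are assigned. From that I create a custom property, which I can use to check whether a character is both in the range and assigned.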

use strict;
use warnings;

digest_blocks();

my $property = 'My::InMiscellaneousMathematicalSymbolsA';

foreach ( 0x27BA .. 0x27F3 )
{
    # Does this code point match the custom property built below?
    my $in = chr($_) =~ m/\p{$property}/;

    printf "%X is %sin $property\n",
        $_, $in ? '' : 'not ';
}


sub digest_blocks {
    use Unicode::UCD qw(charblocks);

    my $blocks = charblocks();

    foreach my $block ( keys %$blocks )
    {
        next unless $block =~ /Math/; # just to make the output small

        my( $start, $stop ) = @{ $blocks->{$block}[0] };

        $blocks->{$block} = {
            assigned   => [ grep { chr($_) =~ /\A\p{Assigned}\z/ } $start .. $stop ],
            unassigned => [ grep { chr($_) !~ /\A\p{Assigned}\z/ } $start .. $stop ],
            start      => $start,
            stop       => $stop,
            name       => $block,
        };

        define_my_property( $blocks->{$block} );
    }
}

sub define_my_property {
    my $block = shift;

    (my $subname = $block->{name}) =~ s/\W//g;
    $block->{my_property} = "My::In$subname"; # needs In or Is

    no strict 'refs';
    my $string = join "\n", # can do ranges here too
        map { sprintf "%X", $_ }
        @{ $block->{assigned} };

    *{"My::In$subname"} = sub { $string };
}
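If what you actually want is a plain %symbols hash like the one in the question, rather than custom properties, the same loop can fill it directly. This is a minimal sketch. The snake_case key format is my own choice, picked to mimic the question's keys.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charblocks);

# Build a hash shaped like the question's %symbols:
#   snake_case_block_name => [ assigned code points in that block ]
my %symbols;
my $blocks = charblocks();

foreach my $block ( keys %$blocks ) {
    next unless $block =~ /Math/;    # keep the example small

    # Each block's range set holds a [ $start, $stop, ... ] range
    my ( $start, $stop ) = @{ $blocks->{$block}[0] };

    # e.g. "Miscellaneous Mathematical Symbols-A"
    #   -> "miscellaneous_mathematical_symbols_a"
    ( my $key = lc $block ) =~ s/\W+/_/g;

    $symbols{$key} = [ grep { chr($_) =~ /\p{Assigned}/ } $start .. $stop ];
}

printf "%s: %d assigned\n", $_, scalar @{ $symbols{$_} } for sort keys %symbols;
```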

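If I were going to do this often, I'd use the same code to generate a Perl source file with the custom properties already defined, so I could use them right away in any of my work. None of that data should change until you update your Unicode data.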

sub define_my_property {
    my $block = shift;

    (my $subname = $block->{name}) =~ s/\W//g;
    $block->{my_property} = "My::In$subname"; # needs In or Is

    my $string = num2range( @{ $block->{assigned} } );

    # Print the source of a named sub instead of installing a closure.
    # The here-doc bodies and terminators must stay at column 0.
    print <<"HERE";
sub My::In$subname {
return <<'CODEPOINTS';
$string
CODEPOINTS
}

HERE
}

# http://www.perlmonks.org/?node_id=87538
sub num2range {
    local $_ = join ',' => sort { $a <=> $b } @_;
    s/(?<!\d)(\d+)(?:,((??{$++1})))+(?!\d)/$1\t$+/g;   # collapse runs to "start\tend"
    s/(\d+)/ sprintf "%X", $1/eg;                       # decimal -> hex
    s/,/\n/g;                                            # one entry per line
    return $_;
}
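The regex in num2range is clever but hard to read. A plain loop produces the same format: a "START\tEND" line for each run of consecutive values, or a bare "VALUE" line for a lone value, all in uppercase hex. This is a sketch, and `num2range_loop` is a hypothetical name. Like the regex version, it assumes the input contains no duplicates.

```perl
use strict;
use warnings;

# Collapse integers into user-defined-property lines, one per line:
# a run of consecutive values -> "START\tEND", a lone value -> "VALUE".
# Assumes the input has no duplicate values.
sub num2range_loop {
    my @nums = sort { $a <=> $b } @_;
    my @lines;
    while (@nums) {
        my $start = my $end = shift @nums;
        # Extend the current run while the next number is consecutive
        $end = shift @nums while @nums && $nums[0] == $end + 1;
        push @lines, $start == $end
            ? sprintf( "%X", $start )
            : sprintf( "%X\t%X", $start, $end );
    }
    return join "\n", @lines;
}

print num2range_loop( 0x27C0 .. 0x27CA, 0x27CC, 0x27D0 .. 0x27EF ), "\n";
```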

This gives me output suitable for a Perl library:

sub My::InMiscellaneousMathematicalSymbolsA {
return <<'CODEPOINTS';
27C0 27CA
27CC
27D0 27EF
CODEPOINTS
}

sub My::InSupplementalMathematicalOperators {
return <<'CODEPOINTS';
2A00 2AFF
CODEPOINTS
}

sub My::InMathematicalAlphanumericSymbols {
return <<'CODEPOINTS';
1D400 1D454
1D456 1D49C
1D49E 1D49F
1D4A2
1D4A5 1D4A6
1D4A9 1D4AC
1D4AE 1D4B9
1D4BB
1D4BD 1D4C3
1D4C5 1D505
1D507 1D50A
1D50D 1D514
1D516 1D51C
1D51E 1D539
1D53B 1D53E
1D540 1D544
1D546
1D54A 1D550
1D552 1D6A5
1D6A8 1D7CB
1D7CE 1D7FF
CODEPOINTS
}

sub My::InMiscellaneousMathematicalSymbolsB {
return <<'CODEPOINTS';
2980 29FF
CODEPOINTS
}

sub My::InMathematicalOperators {
return <<'CODEPOINTS';
2200 22FF
CODEPOINTS
}
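To use the generated subs, paste them into your program, or `require` the file you wrote them to, before the code that matches against them. The sketch below pastes in the first generated sub and tests a few code points I picked arbitrarily. Because the code point list is fixed in the pasted sub, 0x27CB stays outside the property here regardless of your perl's Unicode version.

```perl
use strict;
use warnings;

# One of the generated subs, pasted in unchanged. The range lines may use
# a space or a tab between start and end.
sub My::InMiscellaneousMathematicalSymbolsA {
    return <<'CODEPOINTS';
27C0 27CA
27CC
27D0 27EF
CODEPOINTS
}

# Outside package My, the property name must be fully qualified in \p{}.
for my $cp ( 0x27C0, 0x27CB, 0x27CC, 0x41 ) {
    my $in = chr($cp) =~ /\p{My::InMiscellaneousMathematicalSymbolsA}/ ? 'in' : 'not in';
    printf "U+%04X is %s the property\n", $cp, $in;
}
```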

A similar question, "perl - Where can I find arrays of (un)assigned Unicode code points for a particular block?", can be found on Stack Overflow: https://stackoverflow.com/questions/2888319/
