
Perl: specifying a Unicode character by name without writing the name in all caps

Reposted. Author: 行者123. Last updated: 2023-12-04 08:37:08

This is a cosmetic point, but is there a simple way to insert a Unicode character into a Perl string by name, with the name written in "normal" case?

Perl has Unicode literals that look up a code point by name, like this:

"\N{GREEK SMALL LETTER ALPHA}"

I find something like the following easier to read:

  "\N{Greek Small Letter Alpha}",

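As far as I know, Unicode character names contain no minimal pairs with respect to case, that is, no two names that differ only in capitalization. Is there a concise way to name a character that still triggers a compile error very early in the script's execution if the character does not exist?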

Here is the compile error produced by a deliberately misspelled character name. This is a check I don't want to give up.

$ echo '%[a]' | ./unicodify 
Unknown charname 'GREK SMALL LETTER ALPHA' at ./unicodify line 10, within string

Execution of ./unicodify aborted due to compilation errors.

I'm trying to write a small utility that makes it easier to enter Unicode characters in a text file through mnemonic names delimited by %[].

Here is a very stripped-down example that only substitutes %[a] and %[b]:

#! /usr/bin/env perl

use strict;
use warnings;

use utf8;
use open ':std' => ':utf8';

my %abbrevs = (
    'a' => "\N{GREEK SMALL LETTER ALPHA}",
    'b' => "\N{GREEK SMALL LETTER BETA}",
);

while (<>) {
    chomp;
    my $line = $_;
    $line =~ s/(\%\[(.*?)\])/$abbrevs{$2}/g;
    print "${line}\n";
}

Best answer

Quoting the charnames documentation:

Starting in Perl v5.16, any occurrence of \N{CHARNAME} sequences in a double-quotish string automatically loads this module with arguments :full and :short (described below) if it hasn't already been loaded with different arguments

One of those "different arguments" requests loose matching.

$ perl -CSD -e'
use charnames ":loose";
CORE::say "\N{Greek Small Letter Alpha}";
'
α
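Applied to the script in the question, this could look roughly like the sketch below. The helper name expand_abbrevs and the choice to leave unknown keys untouched are assumptions made for illustration and are not part of the original script. The \N{...} names are still resolved at compile time, so a misspelling such as "Grek Small Letter Alpha" should still abort compilation with "Unknown charname".

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use open ':std' => ':utf8';

# :loose lets \N{...} ignore letter case in character names.
use charnames ':loose';

# The names are resolved when this file is compiled.
my %abbrevs = (
    a => "\N{Greek Small Letter Alpha}",
    b => "\N{Greek Small Letter Beta}",
);

# Hypothetical helper: replace each %[key] with its character and
# leave unknown keys as they were (an assumption, not original behaviour).
sub expand_abbrevs {
    my ($line) = @_;
    $line =~ s/%\[(.*?)\]/exists $abbrevs{$1} ? $abbrevs{$1} : "%[$1]"/ge;
    return $line;
}

print expand_abbrevs("%[a] and %[b]"), "\n";
```

Unlike the original substitution, this variant does not emit an "uninitialized value" warning when the input contains a key that is missing from %abbrevs.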

LOOSE MATCHES

By specifying :loose, Unicode's loose character name matching rules are selected instead of the strict exact match used otherwise. That means that CHARNAME doesn't have to be so precisely specified. Upper/lower case doesn't matter (except with scripts as mentioned above), nor do any underscores, and the only hyphens that matter are those at the beginning or end of a word in the name (with one exception: the hyphen in U+1180 HANGUL JUNGSEONG O-E does matter). Also, blanks not adjacent to hyphens don't matter. The official Unicode names are quite variable as to where they use hyphens versus spaces to separate word-like units, and this option allows you to not have to care as much. The reason non-medial hyphens matter is because of cases like U+0F60 TIBETAN LETTER -A versus U+0F68 TIBETAN LETTER A. The hyphen here is significant, as is the space before it, and so both must be included.

:loose slows down look-ups by a factor of 2 to 3 versus :full, but the trade-off may be worth it to you. Each individual look-up takes very little time, and the results are cached, so the speed difference would become a factor only in programs that do look-ups of many different spellings, and probably only when those look-ups are through vianame() and string_vianame(), since \N{...} look-ups are done at compile time.
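The run-time functions mentioned in the quote can be used when the name is only known while the program runs, for example if the utility were to accept full names inside %[]. The following is a minimal sketch under that assumption. char_by_name is a hypothetical wrapper, and it relies on charnames::string_vianame returning undef for a name it cannot resolve, so the compile-time check is traded for a run-time one.

```perl
use strict;
use warnings;
use open ':std' => ':utf8';

# string_vianame honours the options of the enclosing "use charnames".
use charnames ':loose';

# Hypothetical helper: resolve a character name at run time.
# Returns the character as a string, or undef if the name is unknown.
sub char_by_name {
    my ($name) = @_;
    return charnames::string_vianame($name);
}

for my $name ('Greek Small Letter Alpha', 'Grek Small Letter Alpha') {
    my $char = char_by_name($name);
    print defined $char ? "$name: $char\n" : "$name: unknown name\n";
}
```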

The module also provides a way to create custom aliases.
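As a rough illustration of the alias mechanism, the sketch below defines two short names with the :alias option of charnames. The alias names alpha and beta are made up for this example. Each alias maps to an official character name, and the alias can then be used inside \N{...}.

```perl
use strict;
use warnings;
use open ':std' => ':utf8';

# Define custom aliases (hypothetical names) on top of the full official names.
use charnames ':full', ':alias' => {
    alpha => 'GREEK SMALL LETTER ALPHA',
    beta  => 'GREEK SMALL LETTER BETA',
};

# Aliases are resolved at compile time, like any other \N{...} name.
my $text = "\N{alpha}\N{beta}";
print "$text\n";
```

Aliases might suit the mnemonic table in the question, since the short names live in one place and are still checked when the script is compiled.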

This post on specifying a Unicode character by name in Perl without writing the name in all caps is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/53910434/
