objective-c - How to convert to "combining diacritical marks" on iOS

Reposted. Author: 搜寻专家. Updated: 2023-10-30 19:47:26

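In my app, I have some characters followed by their "modifier diacritics" (e.g. "oˆ", where "ˆ" is Unicode 0x02c6) that I want to convert into fully precomposed characters (e.g. "ô", Unicode 0x00f4). I tried the NSString method precomposedStringWithCanonicalMapping, but after hours of banging my head against the wall trying to work out why it wasn't working, I discovered that it only converts "Combining Diacritical Marks" (http://www.unicode.org/charts/PDF/U0300.pdf) into precomposed characters. OK, so all I need to do is convert all my "modifier diacritics" into "combining diacritics" and then call precomposedStringWithCanonicalMapping on the resulting string, and I'm done. This does indeed work, but I'm wondering whether there is a simpler, less error-prone way of doing it. Here is my NSString category method, which seems to fix most characters: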

- (instancetype)combineDiacritics
{
    // Maps the unichar of a spacing diacritic to the unichar of its combining counterpart.
    static NSDictionary<NSNumber *, NSNumber *> *sDiacriticalSubstDict;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        // Built from the cross-references in http://www.unicode.org/charts/PDF/U0300.pdf
        sDiacriticalSubstDict = @{
            @(0x02cb) : @(0x0300), @(0x00b4) : @(0x0301), @(0x02c6) : @(0x0302), @(0x02dc) : @(0x0303), @(0x02c9) : @(0x0304), // Grave, Acute, Circumflex, Tilde, Macron
            @(0x00af) : @(0x0305), @(0x02d8) : @(0x0306), @(0x02d9) : @(0x0307), @(0x00a8) : @(0x0308), @(0x02c0) : @(0x0309), // Overline, Breve, Dot above, Diaeresis, Hook above
            // U+02DA is RING ABOVE and U+02DD is DOUBLE ACUTE ACCENT, so they map to U+030A and U+030B respectively.
            @(0x00b0) : @(0x030a), @(0x02da) : @(0x030a), @(0x02dd) : @(0x030b), @(0x02c7) : @(0x030c), @(0x02c8) : @(0x030d), @(0x02bb) : @(0x0312), // Ring above (x2), Double acute, Caron, Vertical line above, Cedilla above
            @(0x02bc) : @(0x0313), @(0x02bd) : @(0x0314), @(0x02b2) : @(0x0321), @(0x02d4) : @(0x0323), @(0x02b1) : @(0x0324), // Comma above, Reversed comma above, Palatalized hook below, Dot below, Diaeresis below
            @(0x00b8) : @(0x0327), @(0x02db) : @(0x0328), @(0x02cc) : @(0x0329), @(0x02b7) : @(0x032b), @(0x02cd) : @(0x0331), // Cedilla, Ogonek, Vertical line below, Inverted double arch below, Macron below
        };
    });

    NSMutableString *buffer = [NSMutableString stringWithCapacity:self.length];
    [self enumerateSubstringsInRange:NSMakeRange(0, self.length)
                             options:NSStringEnumerationByComposedCharacterSequences
                          usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        NSString *newString = nil;
        if (substring.length == 1) // The diacritics are all in the Unicode BMP.
        {
            unichar uniChar = [substring characterAtIndex:0];
            // A missing key yields nil, and messaging nil returns 0.
            unichar newUniChar = [sDiacriticalSubstDict[@(uniChar)] unsignedShortValue];
            if (newUniChar != 0)
            {
                NSLog(@"Unichar %04x => %04x", uniChar, newUniChar);
                newString = [NSString stringWithCharacters:&newUniChar length:1];
            }
        }
        // Append the combining replacement if there is one, otherwise the original sequence.
        [buffer appendString:(newString ?: substring)];
    }];

    // Compose base letters with the combining marks that now follow them (NFC).
    return [buffer precomposedStringWithCanonicalMapping];
}

Does anyone know of a more built-in way to do this conversion?

Best answer

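There is no built-in way to do this conversion, because the characters in the Spacing Modifier Letters block (U+02B0..U+02FF) are not intended to be used as diacritics. From section 7.8 of the Unicode Standard: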

They are not formally combining marks (gc=Mn or gc=Mc) and do not graphically combine with the base letter that they modify. They are base characters in their own right.

Spacing Clones of Diacritics. Some corporate standards explicitly specify spacing and nonspacing forms of combining diacritical marks, and the Unicode Standard provides matching codes for these interpretations when practical.

If you want to convert them to combining forms, you will need to build a table from the cross-references in the Spacing Modifier Letters code chart (as you are already doing).
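
As a rough illustration of that table-driven approach, here is a minimal sketch. It is not a Foundation API: the function name GRCombineSpacingDiacritics and its three-entry table are hypothetical and only illustrative, and a real table would have to be filled in from the code chart cross-references. The sketch swaps each listed spacing character for its combining counterpart, then lets precomposedStringWithCanonicalMapping do the composition.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (an assumption for illustration, not part of Foundation):
// replaces a few spacing diacritics with combining marks, then applies NFC.
static NSString *GRCombineSpacingDiacritics(NSString *input) {
    // Illustrative subset only; extend it from the code chart cross-references.
    static const unichar kTable[][2] = {
        {0x02CB, 0x0300}, // modifier letter grave accent      -> combining grave accent
        {0x02C6, 0x0302}, // modifier letter circumflex accent -> combining circumflex accent
        {0x02DC, 0x0303}, // small tilde                       -> combining tilde
    };
    const size_t tableCount = sizeof(kTable) / sizeof(kTable[0]);

    NSUInteger length = input.length;
    if (length == 0) {
        return input;
    }

    // Work on a copy of the UTF-16 code units. Every table key is a BMP
    // character outside the surrogate range, so surrogate pairs are untouched.
    unichar *chars = malloc(length * sizeof(unichar));
    [input getCharacters:chars range:NSMakeRange(0, length)];
    for (NSUInteger i = 0; i < length; i++) {
        for (size_t j = 0; j < tableCount; j++) {
            if (chars[i] == kTable[j][0]) {
                chars[i] = kTable[j][1];
                break;
            }
        }
    }
    NSString *substituted = [NSString stringWithCharacters:chars length:length];
    free(chars);

    // Canonical composition merges each base letter with the combining mark after it.
    return substituted.precomposedStringWithCanonicalMapping;
}
```

With this sketch, GRCombineSpacingDiacritics(@"o\u02C6") should come back as the single precomposed character U+00F4, whereas calling precomposedStringWithCanonicalMapping directly on the same input leaves both characters as they are, which matches the behavior described in the question.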

Regarding objective-c - How to convert to "combining diacritical marks" on iOS, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35952216/
