
java - How to convert accented characters in Java

Reposted. Author: 行者123. Updated: 2023-12-01 10:03:35

I'm using Java 1.5 and I need to normalize strings (like this: àèìòù ---> aeiou). I can't use Normalizer because it is only available in Java 1.6 and later. Any ideas?

I have already tried this:

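public String normalizeText(String text) {
    text = normalizer(text);
    // Strip the combining marks that decomposition leaves behind.
    // (The original pattern ended in a stray "]", which was matched as a literal character.)
    text = text.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    return text;
}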

// Calls java.text.Normalizer through reflection so that the class compiles on Java 1.5.
// Needs: import java.lang.reflect.Method;
public static String normalizer(String word) {
    try {
        int i;
        Class<?> normalizerClass = Class.forName("java.text.Normalizer");
        // Find the nested enum Normalizer.Form.
        Class<?> normalizerFormClass = null;
        Class<?>[] nestedClasses = normalizerClass.getDeclaredClasses();
        for (i = 0; i < nestedClasses.length; i++) {
            Class<?> nestedClass = nestedClasses[i];
            if (nestedClass.getName().equals("java.text.Normalizer$Form")) {
                normalizerFormClass = nestedClass;
            }
        }
        assert normalizerFormClass.isEnum();
        Method methodNormalize = normalizerClass.getDeclaredMethod(
                "normalize",
                CharSequence.class,
                normalizerFormClass);
        // NFD (decomposition) is needed here: NFC recomposes accented letters,
        // which leaves no combining marks for the caller to strip.
        Object nfdNormalization = null;
        Object[] constants = normalizerFormClass.getEnumConstants();
        for (i = 0; i < constants.length; i++) {
            Object constant = constants[i];
            if (constant.toString().equals("NFD")) {
                nfdNormalization = constant;
            }
        }
        return (String) methodNormalize.invoke(null, word, nfdNormalization);
    } catch (Exception ex) {
        // On Java 1.5 the class does not exist, so this branch is taken.
        return null;
    }
}

Best answer

Roll your own method

If you can't use Normalizer, a good alternative is a Map that maps every accented variant of a letter to its base form.

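// Needs: import java.util.HashMap; import java.util.Map;
// Explicit type arguments are used because the diamond operator (<>) requires Java 7.
Map<Character, Character> rep = new HashMap<Character, Character>();
// Keys and values are chars, so they take single quotes, not double quotes.
rep.put('à', 'a');
rep.put('è', 'e');
rep.put('ì', 'i');
rep.put('ò', 'o');
rep.put('ù', 'u');
// etc...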

This gets long and ugly, so loading the mappings from a text file would be better.
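Loading the table from a file and applying it to a string could look roughly like the sketch below. The class name AccentMap, the methods load and apply, and the one-mapping-per-line "X=y" file format are all assumptions made for illustration; none of them come from the answer. The code avoids anything newer than Java 1.5.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class AccentMap {
    // Reads lines of the assumed form "<accented char>=<replacement char>".
    // Blank or malformed lines are skipped.
    public static Map<Character, Character> load(Reader in) throws IOException {
        Map<Character, Character> rep = new HashMap<Character, Character>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.length() == 3 && line.charAt(1) == '=') {
                rep.put(Character.valueOf(line.charAt(0)),
                        Character.valueOf(line.charAt(2)));
            }
        }
        return rep;
    }

    // Replaces every mapped character and leaves all others untouched.
    public static String apply(Map<Character, Character> rep, String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            Character mapped = rep.get(Character.valueOf(c));
            out.append(mapped != null ? mapped.charValue() : c);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the file here. A real file would be
        // wrapped in an InputStreamReader with an explicit charset such as UTF-8.
        Reader in = new StringReader(
                "\u00e0=a\n\u00e8=e\n\u00ec=i\n\u00f2=o\n\u00f9=u\n");
        Map<Character, Character> rep = load(in);
        System.out.println(apply(rep, "\u00e0\u00e8\u00ec\u00f2\u00f9"));
    }
}
```

Keeping the mappings in a data file means new characters can be added without recompiling, at the cost of having to get the file's character encoding right when reading it.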


Existing answer

On this page I found the following answer. It works; I have tested it:

Mirror of the Unicode table from 00c0 to 017f, without diacritics.

private static final String tab00c0 = "AAAAAAACEEEEIIII" +
        "DNOOOOO\u00d7\u00d8UUUUYI\u00df" +
        "aaaaaaaceeeeiiii" +
        "\u00f0nooooo\u00f7\u00f8uuuuy\u00fey" +
        "AaAaAaCcCcCcCcDd" +
        "DdEeEeEeEeEeGgGg" +
        "GgGgHhHhIiIiIiIi" +
        "IiJjJjKkkLlLlLlL" +
        "lLlNnNnNnnNnOoOo" +
        "OoOoRrRrRrSsSsSs" +
        "SsTtTtTtUuUuUuUu" +
        "UuUuWwYyYZzZzZzF";

Returns the string without diacritics - a 7-bit approximation.

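public static String removeDiacritic(String source) {
    char[] vysl = new char[source.length()];
    char one;
    for (int i = 0; i < source.length(); i++) {
        one = source.charAt(i);
        // Only characters in U+00C0..U+017F are covered by the table;
        // everything else is copied through unchanged.
        if (one >= '\u00c0' && one <= '\u017f') {
            // The table index is the offset from U+00C0.
            one = tab00c0.charAt((int) one - '\u00c0');
        }
        vysl[i] = one;
    }
    return new String(vysl);
}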
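For comparison, once Java 6 or later is available, what the question attempts through reflection can be written directly. This is only a sketch and not part of the answer above; the class name NormalizerDemo and the method name stripAccents are illustrative. Its results can differ from the table approach: letters such as Ø have no canonical decomposition, so this version leaves them unchanged.

```java
import java.text.Normalizer;

public class NormalizerDemo {
    // Requires Java 6 or later, so it is not an option on Java 1.5.
    public static String stripAccents(String text) {
        // NFD splits each accented letter into a base letter plus combining marks.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // Delete the combining marks (U+0300 to U+036F), keeping the base letters.
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("\u00e0\u00e8\u00ec\u00f2\u00f9"));
    }
}
```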

Regarding "java - How to convert accented characters in Java", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36626579/
