
java - Regex - Cannot match Cyrillic letters with \w

Reposted. Author: 行者123. Updated: 2023-11-30 05:27:41

Task:

The task must be solved using regular expressions without using container classes.

Input: text (may contain Latin and Cyrillic characters). Output: the source text, with the case of the first character inverted in every word that consists of three or more characters.

A word is a sequence containing only letters (all other characters are not part of the word). Create a static convert method that converts the input to the output.

Example Input data

When I was younger
I never needed
Прощай, со всех вокзалов поезда
уходят в Дальние Края

Example Output

when I Was Younger
I Never Needed
прощай, со Всех Вокзалов Поезда
Уходят в дальние края

My attempt:

public static String convert(String input) {
    StringBuilder sb = new StringBuilder(input);
    Pattern p = Pattern.compile("[\\W&&[\\d]]?[\\w&&[\\D]]+");
    Matcher m = p.matcher(input);
    while (m.find()) {
        if (m.group().length() >= 3) {
            if (Character.isUpperCase(sb.charAt(m.start()))) {
                sb.setCharAt(m.start(), Character.toLowerCase(sb.charAt(m.start())));
            } else {
                sb.setCharAt(m.start(), Character.toUpperCase(sb.charAt(m.start())));
            }
        }
    }
    return sb.toString();
}

The output I need:

when I Was Younger
I Never Needed
прощай, со Всех Вокзалов Поезда
Уходят в дальние края

But I get:

when I Was Younger
I Never Needed
Прощай, со всех вокзалов поезда
уходят в Дальние Края

Best answer

Debugging the problem

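\w does not match Cyrillic characters: by default, Java's \w covers only [a-zA-Z_0-9]. I found this out by printing the matched group inside the while loop: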

System.out.println(m.group());

This prints:

When
I
was
younger
I
never
needed

No other words are matched.
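
The mismatch can be reproduced in isolation. The following is a minimal sketch, not part of the original answer: the class name WordClassDemo and the helper fullMatch are made up for illustration, and the Unicode escapes spell the word "всех" so the example does not depend on source-file encoding. It relies on the documented default that \w is ASCII-only unless Pattern.UNICODE_CHARACTER_CLASS is set.

```java
import java.util.regex.Pattern;

// Hypothetical demo class (not from the original answer).
final class WordClassDemo {

    // Returns true if the whole text matches the regex compiled with the given flags.
    static boolean fullMatch(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).matches();
    }

    public static void main(String[] args) {
        // Cyrillic sample word, written with Unicode escapes to stay ASCII-only.
        String cyrillic = "\u0432\u0441\u0435\u0445";

        // Default \w is ASCII-only, so Latin matches and Cyrillic does not.
        System.out.println(fullMatch("\\w+", 0, "younger"));  // true
        System.out.println(fullMatch("\\w+", 0, cyrillic));   // false

        // With UNICODE_CHARACTER_CLASS, \w follows the Unicode definition.
        System.out.println(fullMatch("\\w+", Pattern.UNICODE_CHARACTER_CLASS, cyrillic)); // true

        // \p{L} matches any Unicode letter without extra flags.
        System.out.println(fullMatch("\\p{L}+", 0, cyrillic)); // true
    }
}
```

Either the flag or \p{L} fixes the match; \p{L} has the advantage of excluding digits and underscores, which the task does not count as part of a word.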

Solution 1

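To match Cyrillic characters you can use \p{L}, which matches any Unicode letter. If you match exactly three letters with {3}, you can drop the length check in the loop. \b matches a word boundary (a position, not a character). Note that the way \b treats non-ASCII letters has differed between JDK releases; if Cyrillic words are not matched on your JDK, compiling the pattern with Pattern.UNICODE_CHARACTER_CLASS may help. Putting it all together: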

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String convert(String input) {
    StringBuilder sb = new StringBuilder(input);
    // A word boundary followed by three letters: the start of a word of length >= 3.
    Pattern p = Pattern.compile("\\b\\p{L}{3}");
    Matcher m = p.matcher(input);
    while (m.find()) {
        // Invert the case of the first matched character in place.
        char firstChar = sb.charAt(m.start());
        if (Character.isUpperCase(firstChar)) {
            sb.setCharAt(m.start(), Character.toLowerCase(firstChar));
        } else {
            sb.setCharAt(m.start(), Character.toUpperCase(firstChar));
        }
    }
    return sb.toString();
}

Output:

when I Was Younger
I Never Needed
прощай, со Всех Вокзалов Поезда
Уходят в дальние края
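
The task defines a word as a run of letters only, whereas \b also treats digits and underscores as word characters, so in a string such as "x1abc" the letters "abc" are not preceded by a boundary. One possible way to follow the task's definition more literally is to replace \b with a negative lookbehind on \p{L}. The sketch below illustrates that assumption and is not the answer author's code; the class name LetterWordInverter is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant (not from the original answer): lookarounds instead of \b.
final class LetterWordInverter {

    // A letter that is not preceded by a letter and is followed by at least
    // two more letters, i.e. the first character of a letters-only word of length >= 3.
    private static final Pattern FIRST_LETTER =
            Pattern.compile("(?<!\\p{L})\\p{L}(?=\\p{L}{2})");

    static String convert(String input) {
        StringBuilder sb = new StringBuilder(input);
        Matcher m = FIRST_LETTER.matcher(input);
        while (m.find()) {
            // Each match is a single letter; flip its case in place.
            char c = sb.charAt(m.start());
            sb.setCharAt(m.start(),
                    Character.isUpperCase(c) ? Character.toLowerCase(c) : Character.toUpperCase(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("When I was younger")); // when I Was Younger
        System.out.println(convert("x1abc"));              // x1Abc
    }
}
```

Because the pattern uses only \p{L}, it does not depend on how \w or \b are defined on a given JDK.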

Solution 2

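Alternatively, if you want to be really fancy, use a positive lookahead (which consumes no characters) together with the Matcher replaceAll method that takes a lambda (available since Java 9):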

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String convert(String input) {
    // Match only the first letter; the lookahead requires two more letters after it.
    Pattern p = Pattern.compile("\\b(\\p{L})(?=\\p{L}{2})");
    Matcher m = p.matcher(input);
    return m.replaceAll(match -> {
        char ch = match.group().charAt(0);
        if (Character.isUpperCase(ch)) {
            return "" + Character.toLowerCase(ch);
        }
        return "" + Character.toUpperCase(ch);
    });
}

This also produces:

when I Was Younger
I Never Needed
прощай, со Всех Вокзалов Поезда
Уходят в дальние края
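
On Java 8, where the lambda-based replaceAll is not available, the same idea can be expressed with appendReplacement and appendTail. This is a sketch under stated assumptions rather than the answer author's code: the class name AppendReplacementInverter is hypothetical, and the pattern is compiled with Pattern.UNICODE_CHARACTER_CLASS so that \b is Unicode-aware regardless of the JDK's default.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java 8-compatible variant of Solution 2 (not from the original answer).
final class AppendReplacementInverter {

    // UNICODE_CHARACTER_CLASS makes \b (and \w) follow the Unicode definitions.
    private static final Pattern FIRST_LETTER = Pattern.compile(
            "\\b(\\p{L})(?=\\p{L}{2})", Pattern.UNICODE_CHARACTER_CLASS);

    static String convert(String input) {
        Matcher m = FIRST_LETTER.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char ch = m.group(1).charAt(0);
            char flipped = Character.isUpperCase(ch)
                    ? Character.toLowerCase(ch)
                    : Character.toUpperCase(ch);
            // quoteReplacement keeps the replacement text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(flipped)));
        }
        // Copy whatever follows the last match.
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("I never needed")); // I Never Needed
    }
}
```

StringBuffer is used because the StringBuilder overloads of appendReplacement and appendTail were only added in Java 9.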

Regarding java - Regex - Cannot match Cyrillic letters with \w, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58226125/
