
java - C# equivalent of Java's BreakIterator

Reposted. Author: 行者123. Updated: 2023-12-01 13:39:25

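I am working on a Java-to-C# conversion project. Is there a C# equivalent of BreakIterator? I tried IEnumerator, but I could not find a counterpart to iterator.setText() as used below. Can anyone suggest equivalent C# code for the following lines: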

import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Locale;

String finalResult = ""; // the text to split into sentences
ArrayList<String> resultList = new ArrayList<String>();
// currentLocale is assumed to be a java.util.Locale defined elsewhere
BreakIterator iterator = BreakIterator.getSentenceInstance(currentLocale);
//int counter = 0;
iterator.setText(finalResult);
int lastIndex = iterator.first();
while (lastIndex != BreakIterator.DONE)
{
    int firstIndex = lastIndex;
    lastIndex = iterator.next();
    if (lastIndex != BreakIterator.DONE)
    {
        // each sentence spans the range [firstIndex, lastIndex)
        String sentence = finalResult.substring(firstIndex, lastIndex);
        resultList.add(sentence);
        System.out.println("sentence = " + sentence);
        //counter++;
    }
}

Best answer

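BreakIterator is a mechanism that supports locale-aware boundary analysis of arbitrary Unicode text strings. I suspect the Java class is heavily based on (and may even depend directly on, though I am speculating) the ICU (International Components for Unicode) project: http://site.icu-project.org/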
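
As a point of reference for the port, here is a minimal, self-contained Java sketch of the sentence-splitting loop from the question. The class name `SentenceSplitter`, the method name `split`, and the sample text are illustrative assumptions, not part of the original code.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceSplitter {
    /** Splits text into sentences using the locale's boundary rules. */
    static List<String> split(String text, Locale locale) {
        List<String> sentences = new ArrayList<>();
        BreakIterator iterator = BreakIterator.getSentenceInstance(locale);
        iterator.setText(text);

        // Walk consecutive boundaries; each pair delimits one sentence.
        int start = iterator.first();
        for (int end = iterator.next();
             end != BreakIterator.DONE;
             start = end, end = iterator.next()) {
            sentences.add(text.substring(start, end));
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Sample input chosen only for illustration.
        for (String s : split("Hello world. How are you? Fine.", Locale.US)) {
            System.out.println("sentence = " + s);
        }
    }
}
```

Consecutive boundaries partition the text, so joining the returned pieces reproduces the input, including the whitespace between sentences. Any C# replacement should be checked against that same behavior.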

To quote the ICU docs:

Text boundary analysis is the process of locating linguistic boundaries while formatting and handling text. Examples of this process include:

  1. Locating appropriate points to word-wrap text to fit within specific margins while displaying or printing.
  2. Locating the beginning of a word that the user has selected.
  3. Counting characters, words, sentences, or paragraphs.
  4. Determining how far to move the text cursor when the user hits an arrow key (Some characters require more than one position in the text store and some characters in the text store do not display at all).
  5. Making a list of the unique words in a document.
  6. Figuring out if a given range of text contains only whole words.
  7. Capitalizing the first letter of each word.
  8. Locating a particular unit of the text (For example, finding the third word in the document).
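
To make items 3 and 8 of the quoted list concrete, the following is a rough Java sketch that counts words and picks out the third one with `BreakIterator.getWordInstance`. The class name `WordFinder`, the `words` helper, and the sample sentence are hypothetical. Filtering segments with `Character.isLetterOrDigit` is one simple heuristic, not the only way to tell words from punctuation.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordFinder {
    /** Returns the words in text, skipping whitespace and punctuation segments. */
    static List<String> words(String text, Locale locale) {
        List<String> words = new ArrayList<>();
        BreakIterator iterator = BreakIterator.getWordInstance(locale);
        iterator.setText(text);

        int start = iterator.first();
        for (int end = iterator.next();
             end != BreakIterator.DONE;
             start = end, end = iterator.next()) {
            // A word iterator also reports spaces and punctuation as segments;
            // keep only segments that begin with a letter or digit.
            if (Character.isLetterOrDigit(text.codePointAt(start))) {
                words.add(text.substring(start, end));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Sample input chosen only for illustration.
        List<String> found = words("The quick brown fox.", Locale.US);
        System.out.println("word count = " + found.size());
        System.out.println("third word = " + found.get(2));
    }
}
```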


ICU provides C language bindings, aptly named ICU4C. The ICU FAQ describes ICU4C:

The C and C++ languages and many operating system environments do not provide full support for Unicode and standards-compliant text handling services. Even though some platforms do provide good Unicode text handling services, portable application code can not make use of them. The ICU4C libraries fills in this gap. ICU4C provides an open, flexible, portable foundation for applications to use for their software globalization requirements. ICU4C closely tracks industry standards, including Unicode and CLDR (Common Locale Data Repository).



SIL International provides C# language bindings, through a project named icu-dotnet, that let you use ICU4C in C# applications.

You can find the official icu-dotnet repository on GitHub:
https://github.com/sillsdev/icu-dotnet

Alternatively, install it via NuGet:
https://www.nuget.org/packages/icu.net/

Source: the corresponding question on Stack Overflow, "java - C# equivalent of Java's BreakIterator": https://stackoverflow.com/questions/44244081/
