java - Breaking a paragraph into string tokens

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 03:43:15

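I am able to break a paragraph of text into substrings based on a given character limit n. The problem is that my algorithm does exactly that, and so it breaks words apart. This is where I am stuck. If the character limit falls in the middle of a word, how can I backtrack to a space so that all my substrings contain only whole words?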

This is the algorithm I am using:

int arrayLength = (int) Math.ceil(mText.length() / (double) charLimit);

String[] result = new String[arrayLength];
int j = 0;
int lastIndex = result.length - 1;
for (int i = 0; i < lastIndex; i++) {
    result[i] = mText.substring(j, j + charLimit);
    j += charLimit;
}

result[lastIndex] = mText.substring(j);

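I set the charLimit variable to an arbitrary integer value n, and mText is a string holding a paragraph of text. Any suggestions on how to improve this? Thank you in advance.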

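I have received some good responses. Just so you know what I did: to figure out whether I had landed on a space, I used this while loop. I just don't know how to correct the string from that point on.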

while (!strTemp.substring(strTemp.length() - 1).equalsIgnoreCase(" ")) {
// somehow refine string before added to array
}

Best answer

I am not sure I understood you correctly, but here is an answer based on my interpretation:

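You can use lastIndexOf to find the last space before the character limit, and then check whether that space is close enough to the limit (to handle text without spaces), i.e.: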

int arrayLength = (int) Math.ceil(mText.length() / (double) charLimit);

String[] result = new String[arrayLength];
int j = 0;            // start index of the current token
int tolerance = 10;   // how far before the limit a space may lie (example value)
int splitpoint;
int lastIndex = result.length - 1;
for (int i = 0; i < lastIndex; i++) {
    // search backward for a space, starting at the character limit
    splitpoint = mText.lastIndexOf(' ', j + charLimit);
    // use the space only if it is close enough; otherwise cut at the limit
    splitpoint = splitpoint > j + charLimit - tolerance ? splitpoint : j + charLimit;
    result[i] = mText.substring(j, splitpoint).trim();
    j = splitpoint;
}

result[lastIndex] = mText.substring(j).trim();

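This searches for the last space at or before position j + charLimit and splits the string there if that space is less than tolerance characters away from the limit (10 is just an example value); otherwise it splits at the character limit itself.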
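To make the split-point rule easier to see in isolation, here is a minimal sketch that pulls it into a helper. The class name SplitPointDemo and the method findSplitPoint are hypothetical names introduced only for this illustration; they are not part of the answer's code.

```java
public class SplitPointDemo {
    // Hypothetical helper: returns the index at which to cut the token that begins at 'start'.
    static int findSplitPoint(String text, int start, int charLimit, int tolerance) {
        int hardLimit = start + charLimit;
        // lastIndexOf(ch, fromIndex) searches backward, beginning at fromIndex (inclusive),
        // and returns -1 when no match is found.
        int lastSpace = text.lastIndexOf(' ', hardLimit);
        // Accept the space only if it lies within 'tolerance' characters of the hard limit.
        return lastSpace > hardLimit - tolerance ? lastSpace : hardLimit;
    }

    public static void main(String[] args) {
        // The space at index 9 is within tolerance of the limit 12: prints 9
        System.out.println(findSplitPoint("The quick brown fox jumps over the lazy dog", 0, 12, 10));
        // No space at all, so the cut falls back to the hard limit: prints 12
        System.out.println(findSplitPoint("abcdefghijklmnopqrstuvwxyz", 0, 12, 10));
    }
}
```

Because lastIndexOf returns -1 when there is no space, the same comparison covers both "no space found" and "space too far back", so no separate check is needed.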

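The only problem with this solution is that the last string token may be longer than charLimit: every split at a space produces a token shorter than charLimit, and the leftover characters accumulate in the final element. You may therefore need to adjust arrayLength and loop while (mText.length() - j > charLimit) instead.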
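One possible way to apply that adjustment, as a sketch rather than a definitive implementation, is to collect the tokens in a List so that their number does not have to be known in advance. The class WordWrapSketch and the method split are hypothetical names. The sketch also assumes charLimit is positive, and it skips tokens that are empty after trimming, which the original code does not do.

```java
import java.util.ArrayList;
import java.util.List;

public class WordWrapSketch {
    // Hypothetical helper: splits text into tokens of at most charLimit characters,
    // preferring to cut at a space that lies within 'tolerance' characters of the limit.
    static List<String> split(String text, int charLimit, int tolerance) {
        if (charLimit <= 0) {
            throw new IllegalArgumentException("charLimit must be positive");
        }
        List<String> result = new ArrayList<>();
        int j = 0;
        // Keep cutting while the remainder is still longer than the limit.
        while (text.length() - j > charLimit) {
            int hardLimit = j + charLimit;
            int splitPoint = text.lastIndexOf(' ', hardLimit);
            // Fall back to the hard limit if the space is too far back.
            // Math.max(j, ...) guarantees the cut always moves forward.
            if (splitPoint <= Math.max(j, hardLimit - tolerance)) {
                splitPoint = hardLimit;
            }
            String token = text.substring(j, splitPoint).trim();
            if (!token.isEmpty()) {
                result.add(token);
            }
            j = splitPoint;
        }
        // Whatever remains now fits within the limit.
        String tail = text.substring(j).trim();
        if (!tail.isEmpty()) {
            result.add(tail);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [aaa bbb, ccc ddd, eee]
        System.out.println(split("aaa bbb ccc ddd eee", 8, 5));
    }
}
```

Since the loop only stops once the remainder is no longer than charLimit, the final token cannot exceed the limit, and no array size has to be estimated up front.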


Edit

Runnable example code:

public class Main {
    public static void main(String[] args) {
        String mText = "I am able to break up paragraphs of text into substrings based upon nth given character limit. The conflict I have is that my algorithm is doing exactly this, and is breaking up words. This is where I am stuck. If the character limit occurs in the middle of a word, how can I back track to a space so that all my substrings have entire words?";

        int charLimit = 40;
        int arrayLength = (int) Math.ceil(mText.length() / (double) charLimit);

        String[] result = new String[arrayLength];
        int j = 0;
        int tolerance = 10;
        int splitpoint;
        int lastIndex = result.length - 1;
        for (int i = 0; i < lastIndex; i++) {
            // last space at or before the limit, if it is close enough; else the limit itself
            splitpoint = mText.lastIndexOf(' ', j + charLimit);
            splitpoint = splitpoint > j + charLimit - tolerance ? splitpoint : j + charLimit;
            result[i] = mText.substring(j, splitpoint);
            j = splitpoint;
        }

        result[lastIndex] = mText.substring(j);

        for (int i = 0; i < arrayLength; i++) {
            System.out.println(result[i]);
        }
    }
}

Output:

I am able to break up paragraphs of text
into substrings based upon nth given
character limit. The conflict I have is
that my algorithm is doing exactly
this, and is breaking up words. This is
where I am stuck. If the character
limit occurs in the middle of a word,
how can I back track to a space so that
all my substrings have entire words?

Additional edit: added trim() as suggested by curiosu. It removes the whitespace surrounding the string tokens.

Regarding "java - Breaking a paragraph into string tokens", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25411319/
