
java - Double-check whether a string in a list contains part of a string from a different list, and return a unique list after processing the list

Reposted. Author: 行者123. Updated: 2023-12-02 10:51:09

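The way processTest works is that if the list contains the first three words, the last three words, or the middle text of a title, that title is removed from the modifiable list. Note the condition count > 1L: the list has to contain the similar words more than once. I expect my final list to match the test's expected result of 3 elements, but the result I get contains 4 elements. Any help with fixing the problem or improving the code quality is appreciated. In my sample test data: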

  1. not same words test 1 xyz not same words -> shouldn't be in list, matches first three words, removed

  2. not difference same words test 1 xyz not not same words -> shouldn't be in list, matches last three words, removed

  3. first threes words test 1 xyz not same words -> should be in list

  4. first three words test 2 xyz last three words -> shouldn't be in list, matches last three/first words

  5. first three words test 3 xyz last three words -> shouldn't be in list, matches last three/first words

  6. first three words Test 4 xyz last three words -> should be in list

  7. different words Test 5 xyz last different words -> should be in list

@Test
public void processDataTest() {
    List<String> modifiableList = new ArrayList<>();
    modifiableList.add("not same words test 1 xyz not same words");
    modifiableList.add("not not same words test 1 xyz not not same words");
    modifiableList.add("not same words test 1 xyz not same words");
    modifiableList.add("first three words test 2 xyz last three words");
    modifiableList.add("first three words test 3 xyz last three words");
    modifiableList.add("first three words Test 4 xyz last three words");
    modifiableList.add("different words Test 5 xyz last different words");

    List<String> filteredList =
        new ArrayList<>(modifiableList)
            .stream()
            .filter(StringUtils::isNotEmpty)
            .filter(title -> !TextUtility.isThisUnicode(title, DEVANAGARI))
            .filter(title -> !isStringDuplicateOrSimilar(modifiableList, title))
            .collect(toList());
    Assert.assertEquals(3, filteredList.size());
    Assert.assertArrayEquals(
        filteredList.toArray(),
        new String[] {
            "first threes words test 1 xyz not same words",
            "first three words Test 4 xyz last three words",
            "different words Test 5 xyz last different words"
        });
}


private boolean isStringDuplicateOrSimilar(List<String> list, String title) {
    String[] splitStr = title.split(StringUtils.SPACE);
    String titleSubString = extractMiddleText(title);
    System.out.println(titleSubString);
    long count = list.stream().filter(containsSimilarWords(splitStr, titleSubString)).count();
    System.out.println(count);
    return list.removeIf(t -> t.equals(title) && count > 1L);
}

// Check whether the title contains the middle text, the first three words, or the last three words of the title

private static Predicate<String> containsSimilarWords(String[] splitStr, String titleSubString) {
    return title ->
        title.contains(titleSubString)
            || containsFirstThreeWords(title, splitStr)
            || containsLastThreeWords(title, splitStr);
}

public static boolean containsFirstThreeWords(String text, String[] words) {
    return words.length > 5
        && text.contains(words[0])
        && text.contains(words[1])
        && text.contains(words[2]);
}

public static boolean containsLastThreeWords(String text, String[] words) {
    int length = words.length;
    return words.length > 5
        && text.contains(words[length - 1])
        && text.contains(words[length - 2])
        && text.contains(words[length - 3]);
}

public static String extractMiddleText(String text) {
    int mid = text.length() / 2;
    String[] parts = {text.substring(0, mid), text.substring(mid)};
    int indexOfMidOfText2 = (parts[1].length() / 2) + parts[0].length();
    return text.substring(mid / 2, indexOfMidOfText2);
}

Best answer

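Once the indexes are fixed and the updated examples are used (the text in the code has not been updated), I get only 2 passes, and example #6 is rejected. This is because the contains logic matches the word "three" in example #6 against the text "first threes words..." in example #3. You can test this quickly by changing the text "threes" to "yam".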
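To make the substring behaviour concrete, here is a minimal sketch. The class name ContainsPitfall and the helper substringMatch are made up for illustration; only String.contains comes from the question's code. It shows that contains tests character sequences, not whole words:

```java
public class ContainsPitfall {

    // Hypothetical helper: the same plain substring test that the
    // question's containsFirstThreeWords applies to each word.
    static boolean substringMatch(String text, String word) {
        return text.contains(word);
    }

    public static void main(String[] args) {
        String example3 = "first threes words test 1 xyz not same words";

        // "three" is found inside "threes", so this prints true.
        System.out.println(substringMatch(example3, "three"));

        // With "threes" swapped for an unrelated word, it prints false.
        String changed = example3.replace("threes", "yam");
        System.out.println(substringMatch(changed, "three"));
    }
}
```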

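If this is not desirable, you can use a regular expression with word boundaries, or simply split the strings and use a Set to find the matching words.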
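Both suggestions can be sketched roughly as follows. The class WholeWordMatch and the methods containsWord and containsAllWords are hypothetical names, not part of the question's code, and the sketch assumes words are separated by whitespace and made up of ordinary word characters (which \b relies on):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

public class WholeWordMatch {

    // Regex variant: \b anchors the match at word boundaries, and
    // Pattern.quote keeps any regex metacharacters in the word literal.
    static boolean containsWord(String text, String word) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b")
                .matcher(text)
                .find();
    }

    // Set variant: split the text on whitespace once, then check
    // that every requested word is present as a whole token.
    static boolean containsAllWords(String text, String... words) {
        Set<String> tokens = new HashSet<>(Arrays.asList(text.trim().split("\\s+")));
        return tokens.containsAll(Arrays.asList(words));
    }

    public static void main(String[] args) {
        String example3 = "first threes words test 1 xyz not same words";
        String example6 = "first three words Test 4 xyz last three words";

        System.out.println(containsWord(example3, "three")); // false
        System.out.println(containsWord(example6, "three")); // true
        System.out.println(containsAllWords(example3, "first", "three", "words")); // false
        System.out.println(containsAllWords(example6, "first", "three", "words")); // true
    }
}
```

Either helper could stand in for the text.contains(words[i]) calls in containsFirstThreeWords and containsLastThreeWords. The Set variant avoids compiling a pattern per word, but like the original it ignores word order and repetition.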

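Regarding "java - Double-check whether a string in a list contains part of a string from a different list, and return a unique list after processing the list", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52171311/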
