
java - How to check for a repeating pattern in a string without knowing the actual pattern?

Reposted. Author: 行者123. Updated: 2023-11-30 08:31:39

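For example, I have the string "fbrtfuifigfbrt". I want to find out whether some sequence of characters occurs more than once in the string, but I don't know what that sequence is. In this case it is fbrt.

I considered breaking the string into a bunch of individual words and then checking whether any of those words are the same, but this quickly becomes inefficient when parsing longer strings.

For now I have implemented the idea above, but there must be a better one.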

import java.util.ArrayList;

String s = "fbrtfuifigfbrt";
ArrayList<String> words = new ArrayList<String>(s.length() * s.length());

// Collect every non-empty substring s[outerLoop, nestedLoop)
for (int outerLoop = 0; outerLoop < s.length(); outerLoop++) {
    for (int nestedLoop = outerLoop + 1; nestedLoop <= s.length(); nestedLoop++) {
        words.add(s.substring(outerLoop, nestedLoop));
    }
}
// I could dump the ArrayList in a HashSet and check if they are the same size,
// then find those elements, etc.
// but that goes along with the above code, and I would prefer to use a more efficient method
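The HashSet idea mentioned in the comments above could look roughly like the following. This is only a sketch of the brute-force approach, not a more efficient method: it still enumerates every substring. The class name `RepeatedSubstrings` and the method `findRepeated` are made up for illustration, and it counts overlapping occurrences as repeats.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

class RepeatedSubstrings {
    // Returns every substring that starts at more than one position in s.
    // Overlapping occurrences count, so "aa" is reported for "aaa".
    static Set<String> findRepeated(String s) {
        Set<String> seen = new HashSet<>();
        Set<String> repeated = new LinkedHashSet<>();
        for (int start = 0; start < s.length(); start++) {
            for (int end = start + 1; end <= s.length(); end++) {
                String sub = s.substring(start, end);
                // add() returns false when the substring was already seen
                if (!seen.add(sub)) {
                    repeated.add(sub);
                }
            }
        }
        return repeated;
    }

    public static void main(String[] args) {
        System.out.println(findRepeated("fbrtfuifigfbrt"));
    }
}
```

Because `Set.add` reports whether the element was new, there is no need to compare the sizes of a list and a set afterwards.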

Best answer

A working solution in Java:

import java.util.ArrayList;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        String test1 = "fbrtfuifigfbrt";
        String test2 = "abcdabcd";
        String test3 = "fbrtxibrjkfbrt";
        System.out.println(findRepetitions(test1));
        System.out.println(findRepetitions(test2));
        System.out.println(findRepetitions(test3));
    }

    private static List<String> findRepetitions(String string) {
        List<String> patternsList = new ArrayList<>();
        int length = string.length();
        for (int i = 0; i < length; i++) { // try every start index
            int limit = (length - i) / 2; // candidates can't be longer than half the remaining length
            for (int j = 1; j <= limit; j++) {
                int candidateEndIndex = i + j;
                String candidate = string.substring(i, candidateEndIndex);
                // the candidate repeats if it shows up again after its own end
                if (string.substring(candidateEndIndex).contains(candidate)) {
                    patternsList.add(candidate);
                }
            }
        }
        return patternsList;
    }
}

Output:

[f, fb, fbr, fbrt, b, br, brt, r, rt, t, f, i, f]
[a, ab, abc, abcd, b, bc, bcd, c, cd, d]
[f, fb, fbr, fbrt, b, br, brt, r, rt, t, b, br, r]
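Note that the lists above contain duplicates (for example, f appears three times in the first line), because the same candidate is found again from a later start index. If that is unwanted, one possible variation is to collect candidates in a `LinkedHashSet`, which drops duplicates while keeping discovery order. The class and method names below are invented for this sketch, and `indexOf` with a start offset is used in place of `substring(...).contains(...)` as an equivalent check.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class UniqueRepetitions {
    static List<String> findUniqueRepetitions(String string) {
        // LinkedHashSet ignores duplicates but preserves insertion order
        Set<String> patterns = new LinkedHashSet<>();
        int length = string.length();
        for (int i = 0; i < length; i++) {
            int limit = (length - i) / 2;
            for (int j = 1; j <= limit; j++) {
                String candidate = string.substring(i, i + j);
                // search for the candidate again, starting right after its end
                if (string.indexOf(candidate, i + j) >= 0) {
                    patterns.add(candidate);
                }
            }
        }
        return new ArrayList<>(patterns);
    }

    public static void main(String[] args) {
        // prints [f, fb, fbr, fbrt, b, br, brt, r, rt, t, i]
        System.out.println(findUniqueRepetitions("fbrtfuifigfbrt"));
    }
}
```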

As others have said, if you don't know the length of the pattern or any other applicable constraint, there is no simple optimization.
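To illustrate why such a constraint matters: if the pattern length did happen to be known, a single pass over the string would be enough. The following is a sketch under that assumption only. The method `findRepetitionsOfLength` and its parameter `k` are hypothetical and not part of the original answer.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class FixedLengthRepetitions {
    // Finds substrings of exactly k characters that occur at least twice
    // without overlapping. Assumes the caller already knows k.
    static Set<String> findRepetitionsOfLength(String s, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        Map<String, Integer> firstSeenAt = new HashMap<>();
        Set<String> repeated = new LinkedHashSet<>();
        for (int start = 0; start + k <= s.length(); start++) {
            String window = s.substring(start, start + k);
            // putIfAbsent returns the earlier index, or null if the window is new
            Integer first = firstSeenAt.putIfAbsent(window, start);
            if (first != null && start >= first + k) {
                repeated.add(window);
            }
        }
        return repeated;
    }

    public static void main(String[] args) {
        // prints [fbrt]
        System.out.println(findRepetitionsOfLength("fbrtfuifigfbrt", 4));
    }
}
```

Only one window length is examined here, so the nested loop over candidate lengths disappears. Without knowing `k`, every length has to be tried, which is what the general solution does.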

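If you want to naively discard sub-patterns such as f, fb, and fbr, which are counted only because they are substrings of the longest pattern fbrt, you can make the inner for loop count down from limit to 1, so that longer patterns are found first, and then check whether each new candidate is a substring of an already-found pattern before adding it to the list. Like this: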

import java.util.ArrayList;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        String test1 = "fbrtfuifigfbrt";
        String test2 = "abcdabcd";
        String test3 = "fbrtxibrjkfbrt"; // "br" is a pattern but this version won't find it
        System.out.println(findRepetitions(test1));
        System.out.println(findRepetitions(test2));
        System.out.println(findRepetitions(test3));
    }

    private static List<String> findRepetitions(String string) {
        List<String> patternsList = new ArrayList<>();
        int length = string.length();
        for (int i = 0; i < length; i++) { // try every start index
            int limit = (length - i) / 2; // candidates can't be longer than half the remaining length
            for (int j = limit; j >= 1; j--) { // longest candidates first
                int candidateEndIndex = i + j;
                String candidate = string.substring(i, candidateEndIndex);
                if (string.substring(candidateEndIndex).contains(candidate)) {
                    // skip the candidate if an already-found pattern contains it
                    boolean notASubpattern = true;
                    for (String pattern : patternsList) {
                        if (pattern.contains(candidate)) {
                            notASubpattern = false;
                            break;
                        }
                    }
                    if (notASubpattern) {
                        patternsList.add(candidate);
                    }
                }
            }
        }
        return patternsList;
    }
}

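However, this prevents you from finding br in fbrtxibrjkfbrt, as the output shows (and the extra check also adds work for strings that contain many different patterns):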

[fbrt, i]
[abcd]
[fbrt]

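Hence the "naive" part. Of course, you could add more inner loops to make sure that a candidate is not also found "on its own" in the original string before actually discarding it, and so on. It depends on how thorough you want the search to be.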
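One way the "on its own" check could be sketched: before discarding a candidate, mask out every occurrence of the longer patterns that contain it, and keep the candidate if it still appears in what remains. This is just one possible interpretation of the idea, not the original author's code. The helper `occursOnItsOwn` is hypothetical, and the sketch assumes the input never contains the NUL character used as a mask.

```java
import java.util.ArrayList;
import java.util.List;

class StandaloneRepetitions {
    static List<String> findRepetitions(String string) {
        List<String> patterns = new ArrayList<>();
        int length = string.length();
        for (int i = 0; i < length; i++) {
            int limit = (length - i) / 2;
            for (int j = limit; j >= 1; j--) { // longest candidates first
                String candidate = string.substring(i, i + j);
                boolean repeats = string.indexOf(candidate, i + j) >= 0;
                if (repeats && !patterns.contains(candidate)
                        && occursOnItsOwn(string, candidate, patterns)) {
                    patterns.add(candidate);
                }
            }
        }
        return patterns;
    }

    // True if the candidate still appears after every found pattern that
    // contains it has been masked out (assumes the input has no NUL character).
    static boolean occursOnItsOwn(String string, String candidate, List<String> patterns) {
        String masked = string;
        for (String pattern : patterns) {
            if (pattern.contains(candidate)) {
                masked = masked.replace(pattern, "\u0000");
            }
        }
        return masked.contains(candidate);
    }

    public static void main(String[] args) {
        System.out.println(findRepetitions("fbrtfuifigfbrt")); // prints [fbrt, f, i]
        System.out.println(findRepetitions("fbrtxibrjkfbrt")); // prints [fbrt, br]
    }
}
```

With this check, br is kept for fbrtxibrjkfbrt because it also occurs outside fbrt, while fbr, fb, and rt are still dropped. The price is another pass over the string for every candidate, so it is slower than the naive version.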

Regarding "java - How to check for a repeating pattern in a string without knowing the actual pattern?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40539340/
