gpt4 book ai didi

java - 不拆分单词的最长公共(public)子串

转载 作者:塔克拉玛干 更新时间:2023-11-02 19:18:57 24 4
gpt4 key购买 nike

我正在尝试实现最长公共(public)子串的修改版本,我不想拆分单词。

例如:

String s1 = "hi there how are you";
String s2 = "hi there how ar";

所以输出应该是:o/p:你好

我首先计算最长公共(public)子串,然后过滤所有拆分的单词,下面是相同的代码:

private static void longestCommonSubstring(String S1, String S2) {
int Start = 0;
int Max = 0;
for (int i = 0; i < S1.length(); i++) {
for (int j = 0; j < S2.length(); j++) {
int x = 0;
while (S1.charAt(i + x) == S2.charAt(j + x)) {
x++;
if (((i + x) >= S1.length()) || ((j + x) >= S2.length()))
break;
}
if (x > Max) {
Max = x;
Start = i;
}
}
}
System.out.println("S1 " + S1 + " S2 " + S2 + " " + Start + " " + Max);
System.out.println("ans " + S1.substring(Start, (Start+Max)));


if(Start != 0){
if((S1.charAt(Start-1) == ' ')){
if(Start+Max != S1.length()){
if((S1.charAt(Start+Max) == ' ')){
System.out.println(S1.substring(Start, (Start + Max)));
}
}
}
else if((S1.charAt(Start+Max) == ' ')){
int index = S1.indexOf(' ', Start);
System.out.println(S1.substring(index+1, (Start + Max)));
}
else{
int index = S1.indexOf(' ', Start);
int lastIndex = S1.lastIndexOf(' ', (Start+Max));
if(index+1 != lastIndex+1){
System.out.println(S1.substring(index+1, lastIndex));
}
else{
System.out.println(" ");
}
}
}
else if(Start == 0 && Start+Max != S1.length() && (S1.charAt(Start+Max) == ' ')){
System.out.println(S1.substring(Start, (Start + Max)));
}
else if(Start+Max != S1.length()){
int index = S1.lastIndexOf(' ', (Start+Max));
System.out.println(S1.substring(Start, index));
}
else{
System.out.println(S1.substring(Start, (Start + Max)));
}


}

但在以下情况下它会失败:

String s1 = "hi there how are you";
String s2 = "i ere ho ar you";

输出应该是“you”但是是空白的,因为最长的公共(public)子串是“ere ho”。

感谢您的帮助。

最佳答案

基于 karthik,在 bna 评论说任何基于字符的答案都有缺陷之后:

public static ArrayList<String> stringToTokens(String s) {
ArrayList<String> al = new ArrayList<String>();
StringBuilder word = new StringBuilder();
boolean inWord = !s.isEmpty() && Character.isLetter(s.charAt(0));
for (int i=0; i<s.length(); i++) {
char c = s.charAt(i);
if (Character.isLetter(c)) {
word.append(c);
} else if (inWord) {
if (word.length() > 0) {
al.add(word.toString());
word.setLength(0);
}
al.add(""+c);
} else {
al.add(""+c);
}
}
if (word.length() > 0) {
al.add(word.toString());
}
return al;
}

public static String longestCommonWithWords(String sa, String sb) {
ArrayList<String> a = stringToTokens(sa);
ArrayList<String> b = stringToTokens(sb);
int m[][] = new int[a.size()][b.size()];
int bestLength = 0;
HashSet<Integer> bestSet = new HashSet<Integer>();
for (int i=0; i<a.size(); i++) {
for (int j=0; j<b.size(); j++) {
if (a.get(i).equals(b.get(j))) {
m[i][j] = (i==0 || j==0) ? 1 : m[i-1][j-1]+1;
if (m[i][j] > bestLength) {
bestLength = m[i][j];
bestSet.clear();
bestSet.add(i);
} else if (m[i][j] == bestLength) {
bestSet.add(i);
}
} else {
m[i][j] = 0;
}
}
}
// return first result (or empty string if none)
StringBuilder result = new StringBuilder();
if (bestLength > 0) {
int end = bestSet.iterator().next();
int start = end - bestLength;
for (int i=start; i<end; i++) {
result.append(a.get(i+1));
}
}
return result.toString();
}

标记化很简单(StringTokenizer 也可以,但这个版本可以更好地处理奇怪的字符)。 LCS 算法直接来自 https://en.wikipedia.org/wiki/Longest_common_substring_problem#Pseudocode 中的伪代码

关于java - 不拆分单词的最长公共(public)子串,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/31272307/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com