java - Noisy string matching in Java?

Reposted. Author: 行者123. Last updated: 2023-11-30 08:24:14

Consider the following strings:

Arg = "north_carolina_state_university"

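Text = "Hackney attended North Carolina State University before transferring to the University of North Carolina at Chapel Hill, where he earned bachelor's and Juris Doctor degrees. He worked as a prosecutor from 1971-74 before going into private practice. In 1974, he was campaign manager for Congressman Ike Andrews. While an undergraduate at UNC-Chapel Hill, he wrote his Honors Thesis on the history of the North Carolina corrections system."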

I know that some variant of Arg can be found in the text, but not necessarily an identical one, and Arg itself may be noisy.

Another example:

Arg2 = "maurice_blackburn"

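Text2 = "Maurice McCrae Blackburn (19 November 1880 -- 31 March 1944), Australian politician and lawyer, was born in Inglewood, Victoria. He moved to Melbourne with his mother following the death of his father in 1887. He was educated at Melbourne Grammar School matriculating in 1896. After completing school, he attended the University of Melbourne, graduating in arts and law in 1909, and began to practice as a lawyer a year later."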

In the example above, the middle name that appears in Text2 is not used in Arg2.

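Arg3 = "kansas_city_metropolitan_area"

Text3 = "Roach was elected as a Republican to the Sixty-seventh and Sixty-eighth Congresses (March 4, 1921-March 3, 1925). He served as chairman of the Committee on Expenditures in the Department of Justice (Sixty-eighth Congress). He was an unsuccessful candidate for reelection in 1924 to the Sixty-ninth Congress. He moved to St. Louis, Missouri, December 27, 1924, and resumed the practice of law. He died at Kansas City, Missouri, June 29, 1934. He was interred in Roach Cemetery near Roach, Missouri."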

In this example, "Kansas City" appears in Text3, but without "metropolitan area" (which is part of Arg3).

Is there any function/library to discover the occurrence of Arg in the text?

Best answer

I hope this answer at least gives you some ideas. I wrote a method to answer this question:

Any function/library to discover the occurrence of the Arg in the text?

Here is the output I get from my method for the examples above:

Arg = "north_carolina_state_university"

Text = "Hackney attended North Carolina State University before transferring to the University of North Carolina at Chapel Hill, where he earned bachelor's and Juris Doctor degrees. He worked as a prosecutor from 1971-74 before going into private practice. In 1974, he was campaign manager for Congressman Ike Andrews. While an undergraduate at UNC-Chapel Hill, he wrote his Honors Thesis on the history of the North Carolina corrections system."

Output

Match Results

Words:4/4

Letters:28/28


Arg2 = "maurice_blackburn"

Text2 = "Maurice McCrae Blackburn (19 November 1880 -- 31 March 1944), Australian politician and lawyer, was born in Inglewood, Victoria. He moved to Melbourne with his mother following the death of his father in 1887. He was educated at Melbourne Grammar School matriculating in 1896. After completing school, he attended the University of Melbourne, graduating in arts and law in 1909, and began to practice as a lawyer a year later."

Output

Match Results

Words:2/2

Letters:16/16


Arg3 = "kansas_city_metropolitan_area"

Text3 = "Roach was elected as a Republican to the Sixty-seventh and Sixty-eighth Congresses (March 4, 1921-March 3, 1925). He served as chairman of the Committee on Expenditures in the Department of Justice (Sixty-eighth Congress). He was an unsuccessful candidate for reelection in 1924 to the Sixty-ninth Congress. He moved to St. Louis, Missouri, December 27, 1924, and resumed the practice of law. He died at Kansas City, Missouri, June 29, 1934. He was interred in Roach Cemetery near Roach, Missouri."

Output

Match Results

Words:2/4

Letters:13/26

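The method only considers the English alphabet, only compares whole words (separated by spaces; every non-letter character is treated as a space), and does not look for letters that are out of order within a word. If you search for cat and someone typed acat, it is reported as no match, and no letters are counted as matching either. This is intentional, because a dog is not a hotdog. You really have to decide how fuzzy you want your matching to be. This code is by no means the best, but I hope it gives you some ideas, and perhaps you can rewrite it to be tidier and better organized. Either way, it answers the exact question you asked.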
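To make the cat / acat point concrete, here is a minimal sketch of the position-by-position letter comparison that the method below relies on. The class name `PositionalMatchDemo` and the helper `matchingLetters` are hypothetical names chosen for this illustration; they are not part of the original answer.

```java
class PositionalMatchDemo {

    // Counts the positions at which both words hold the same letter,
    // looking only at the first min(a.length(), b.length()) positions.
    static int matchingLetters(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        int count = 0;
        for (int i = 0; i < limit; i++) {
            if (a.charAt(i) == b.charAt(i)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(matchingLetters("cat", "cat"));     // 3: full match
        System.out.println(matchingLetters("cat", "acat"));    // 0: every letter is shifted by one
        System.out.println(matchingLetters("cat", "catalog")); // 3: a prefix counts as a full match
        System.out.println(matchingLetters("dog", "hotdog"));  // 1: only the 'o' lines up
    }
}
```

Because the comparison is anchored at the first letter, a single inserted or deleted character near the start of a word wipes out most of the score. It also means a search word is counted as fully matched whenever a target word merely starts with it, which may or may not be what you want.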

import java.text.MessageFormat;
import java.util.ArrayList;
import java.util.List;

public class NoisyMatch {

    // Lowercases the text and splits it into words made only of the letters a-z;
    // every other character acts as a separator. Empty words are never added.
    private static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : text.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                word.append(c);
            } else if (word.length() > 0) {
                words.add(word.toString());
                word.setLength(0);
            }
        }
        if (word.length() > 0) {
            words.add(word.toString()); // last word, when the text ends in a letter
        }
        return words;
    }

    // Total number of letters across all words.
    private static int letterCount(List<String> words) {
        int total = 0;
        for (String w : words) {
            total += w.length();
        }
        return total;
    }

    public static String search(String search, String target) {
        List<String> searchWords = tokenize(search);
        List<String> targetWords = tokenize(target);
        int mwords = 0;   // search words matched in full
        int mletters = 0; // sum of the best letter match of each search word

        for (String s : searchWords) {
            int best = 0; // best letter match for this search word
            for (String t : targetWords) {
                // Compare position by position over the shorter of the two words.
                int limit = Math.min(s.length(), t.length());
                int lm = 0; // letter match against this target word
                for (int i = 0; i < limit; i++) {
                    if (s.charAt(i) == t.charAt(i)) {
                        lm++;
                    }
                }
                best = Math.max(best, lm);
            }
            if (best == s.length()) {
                mwords++;
            }
            mletters += best;
        }

        return MessageFormat.format(
                "-----\nSearch text:\"{0}\"\nWords:{1}\nLetters:{2}\n-----\nTarget text:\"{3}\"\nWords:{4}\nLetters:{5}\n-----\nMatch Results\nWords:{6}/{1}\nLetters:{7}/{2}",
                searchWords.toString(), searchWords.size(), letterCount(searchWords),
                targetWords.toString(), targetWords.size(), letterCount(targetWords),
                mwords, mletters);
    }
}
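If you need a yes/no answer rather than a report, one option is to turn the Words:m/n line into a coverage ratio and compare it against a threshold. The sketch below is only an illustration under stated assumptions: the class `CoverageDemo`, the methods `wordCoverage` and `occurs`, and the threshold value 0.5 are all hypothetical and not part of the original answer. It uses `startsWith`, since a full word match in the method above happens exactly when a target word begins with the search word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class CoverageDemo {

    // Lowercases the input and splits it on anything that is not a-z.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // Fraction of the words in arg for which some word in text starts with that word.
    static double wordCoverage(String arg, String text) {
        List<String> argWords = tokenize(arg);
        List<String> textWords = tokenize(text);
        if (argWords.isEmpty()) {
            return 0.0;
        }
        int matched = 0;
        for (String a : argWords) {
            for (String t : textWords) {
                if (t.startsWith(a)) {
                    matched++;
                    break; // count each arg word at most once
                }
            }
        }
        return (double) matched / argWords.size();
    }

    // Hypothetical decision rule: arg "occurs" when enough of its words are covered.
    static boolean occurs(String arg, String text, double threshold) {
        return wordCoverage(arg, text) >= threshold;
    }

    public static void main(String[] args) {
        String text2 = "Maurice McCrae Blackburn, Australian politician and lawyer";
        String text3 = "He died at Kansas City, Missouri";
        System.out.println(wordCoverage("maurice_blackburn", text2));             // 1.0
        System.out.println(wordCoverage("kansas_city_metropolitan_area", text3)); // 0.5
        System.out.println(occurs("kansas_city_metropolitan_area", text3, 0.5));  // true
    }
}
```

The shortened texts here are excerpts of Text2 and Text3 from the question. Where to set the threshold depends on how much of Arg you require to be present; with 0.5, "Kansas City" alone is enough to accept Arg3, while a stricter value would reject it.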

Regarding "java - Noisy string matching in Java?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23095751/
