java - Optimizing the Jaro-Winkler algorithm

Reposted. Author: 搜寻专家. Updated: 2023-10-30 20:00:56

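I have code for the Jaro-Winkler algorithm taken from this website. I need to run it 150,000 times to get the distance between differences, and that takes a long time because I am running it on an Android mobile device.

Can it be optimized any further?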

public class Jaro {
    /**
     * Gets the similarity of the two strings using Jaro distance.
     *
     * @param string1 the first input string
     * @param string2 the second input string
     * @return a value between 0 and 1 giving the similarity
     */
    public float getSimilarity(final String string1, final String string2) {

        // half the length of the shorter string, rounded up (the distance used for acceptable transpositions)
        final int halflen = ((Math.min(string1.length(), string2.length())) / 2) + ((Math.min(string1.length(), string2.length())) % 2);

        // get common characters
        final StringBuffer common1 = getCommonCharacters(string1, string2, halflen);
        final StringBuffer common2 = getCommonCharacters(string2, string1, halflen);

        // check for zero in common
        if (common1.length() == 0 || common2.length() == 0) {
            return 0.0f;
        }

        // if the two common strings differ in length, return 0.0f
        if (common1.length() != common2.length()) {
            return 0.0f;
        }

        // get the number of transpositions
        int transpositions = 0;
        int n = common1.length();
        for (int i = 0; i < n; i++) {
            if (common1.charAt(i) != common2.charAt(i))
                transpositions++;
        }
        transpositions /= 2.0f;

        // calculate jaro metric
        return (common1.length() / ((float) string1.length()) +
                common2.length() / ((float) string2.length()) +
                (common1.length() - transpositions) / ((float) common1.length())) / 3.0f;
    }

    /**
     * Returns a string buffer of the characters from string1 that occur in string2
     * within a given distance of their position in string1.
     *
     * @param string1 the string whose characters are looked up
     * @param string2 the string that is searched
     * @param distanceSep the allowed separation from the position in string1
     * @return a string buffer of the characters from string1 that occur in string2
     * within the given distance of their position in string1
     */
    private static StringBuffer getCommonCharacters(final String string1, final String string2, final int distanceSep) {
        // create a return buffer of characters
        final StringBuffer returnCommons = new StringBuffer();
        // create a copy of string2 for processing
        final StringBuffer copy = new StringBuffer(string2);
        // iterate over string1
        int n = string1.length();
        int m = string2.length();
        for (int i = 0; i < n; i++) {
            final char ch = string1.charAt(i);
            // set boolean for quick loop exit if found
            boolean foundIt = false;
            // compare char with range of characters to either side

            for (int j = Math.max(0, i - distanceSep); !foundIt && j < Math.min(i + distanceSep, m - 1); j++) {
                // check if found
                if (copy.charAt(j) == ch) {
                    foundIt = true;
                    // append character found
                    returnCommons.append(ch);
                    // mark the character in the copy of string2 as used
                    copy.setCharAt(j, (char) 0);
                }
            }
        }
        return returnCommons;
    }
}

Note that I create the instance only once for the whole process:

jaro= new Jaro();

If you want to test this and need examples so as not to break the script, you will find them here, in another thread about Python optimization.

Best answer

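Yes, but you are not going to enjoy it. Replace all those newed StringBuffers with char arrays that are allocated once in the constructor and never again, and use integer indices to keep track of what is in them.

This pending Commons-Lang patch will give you some of the flavor.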
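
One way to read this advice, as a minimal sketch and not the Commons-Lang patch itself: the class below keeps three char arrays that are allocated once in the constructor and reused by every call, with plain int counters standing in for the StringBuffer lengths. The names FastJaro, maxLength and fillCommon are hypothetical, and the sketch assumes an upper bound on input length is known in advance. The loop bounds are copied unchanged from the question's code so that results should stay comparable with the original; whether those bounds are the intended Jaro window is not examined here.

```java
// Hypothetical allocation-free variant of the question's Jaro class.
final class FastJaro {
    // Scratch buffers allocated once and reused by every call (not thread-safe).
    private final char[] common1;
    private final char[] common2;
    private final char[] copy;

    // maxLength is an assumed upper bound on the length of any input string;
    // longer inputs will throw an index-out-of-bounds exception.
    FastJaro(final int maxLength) {
        common1 = new char[maxLength];
        common2 = new char[maxLength];
        copy = new char[maxLength];
    }

    float getSimilarity(final String string1, final String string2) {
        final int minLen = Math.min(string1.length(), string2.length());
        final int halflen = minLen / 2 + minLen % 2;

        // n1 and n2 play the role of common1.length() and common2.length()
        final int n1 = fillCommon(string1, string2, halflen, common1);
        final int n2 = fillCommon(string2, string1, halflen, common2);

        // nothing in common, or common sequences of different length
        if (n1 == 0 || n1 != n2) {
            return 0.0f;
        }

        int mismatches = 0;
        for (int i = 0; i < n1; i++) {
            if (common1[i] != common2[i]) {
                mismatches++;
            }
        }
        final int transpositions = mismatches / 2;

        return (n1 / (float) string1.length()
                + n2 / (float) string2.length()
                + (n1 - transpositions) / (float) n1) / 3.0f;
    }

    // Writes the common characters into out and returns how many were written.
    private int fillCommon(final String a, final String b, final int distanceSep, final char[] out) {
        final int n = a.length();
        final int m = b.length();
        b.getChars(0, m, copy, 0); // overwrite the reusable copy of b
        int count = 0;
        for (int i = 0; i < n; i++) {
            final char ch = a.charAt(i);
            // same bounds as the loop in the question
            final int end = Math.min(i + distanceSep, m - 1);
            for (int j = Math.max(0, i - distanceSep); j < end; j++) {
                if (copy[j] == ch) {
                    out[count++] = ch;
                    copy[j] = (char) 0; // mark as used
                    break;
                }
            }
        }
        return count;
    }
}
```

Usage would stay as in the question: construct the object once, for example new FastJaro(64) if no input exceeds 64 characters, and call getSimilarity for each of the 150,000 pairs. Because the buffers are shared between calls, one instance should not be used from several threads at the same time.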

Regarding java - Optimizing the Jaro-Winkler algorithm, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/2848807/
