
sas - Jaro-Winkler string comparison function in SAS

Reposted. Author: 行者123. Updated: 2023-12-02 14:11:08

Is there an implementation of Jaro-Winkler string comparison in SAS?

It looks like Link King has Jaro-Winkler, but I would prefer the flexibility of calling the function myself.

Thanks!

Best answer

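As far as I know, there is no built-in Jaro-Winkler distance function. @Itzy has already referenced the only one I know of. If you like, you can roll your own functions with proc fcmp. I'll even give you a head start with the code below. I just tried to follow the Wikipedia article. It is certainly not a perfect representation of Bill Winkler's strcmp.c file, and it probably has plenty of bugs.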
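
For reference, here is a sketch of the definitions from the Wikipedia article that the code below tries to follow. The symbols are notation chosen here, not names used in the code: $m$ is the number of matching characters, $t$ is half the number of matched characters that appear in a different order, $\ell$ is the length of the common prefix (at most 4), and $p$ is the prefix scale, commonly 0.1.

```latex
% Two characters match if they are equal and at most w positions apart
w = \left\lfloor \frac{\max(|s_1|, |s_2|)}{2} \right\rfloor - 1

% Jaro similarity
sim_j =
\begin{cases}
0 & \text{if } m = 0 \\
\dfrac{1}{3}\left( \dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m} \right) & \text{otherwise}
\end{cases}

% Jaro-Winkler similarity
sim_w = sim_j + \ell \, p \, (1 - sim_j)
```

As a hand check with DIXON and DICKSONX: the window is $w = 3$, the matches are D, I, O and N (X is too far away), so $m = 4$, $t = 0$, $|s_1| = 5$, $|s_2| = 8$ and $sim_j = (4/5 + 4/8 + 4/4)/3 \approx 0.767$. The common prefix is DI, so $\ell = 2$, and with $p = 0.1$ this gives $sim_w \approx 0.767 + 2 \times 0.1 \times 0.233 \approx 0.813$.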

proc fcmp outlib=work.jaro.chars;

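subroutine jaromatch(string1 $, string2 $, matchChars $);
    outargs matchChars;
    /* Returns, in matchChars, the characters of string1 that have a match in string2 (blanks excluded). */
    /* Two characters from string1 and string2 are considered matching if they are equal
       and no farther apart than floor(max(|s1|, |s2|)/2) - 1 positions. */
    /* Known limitation: matched positions in string2 are not flagged as used,
       so a repeated character in string1 can match the same character in string2 more than once. */

    str1_len = length(strip(string1));
    str2_len = length(strip(string2));

    /* clamp at 0 so that one-character strings can still match */
    allowedDist = max(0, floor(max(str1_len, str2_len)/2) - 1);

    matchChars = "";

    /* walk through string1 and match characters to string2 */
    do i = 1 to str1_len;
        x = substr(string1, i, 1);
        /* first occurrence of x in string2 at or after the left edge of the window */
        position = findc(string2, x, max(1, i - allowedDist));
        if position > 0 then do;
            /* accept the hit only if it is also inside the right edge of the window */
            if position - i <= allowedDist then do;
                y = substr(string2, position, 1);
                /* build list of matched characters */
                matchChars = cats(matchChars, y);
            end;
        end;
    end;
    matchChars = strip(matchChars);
endsub;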


function jarotrans(string1 $, string2 $);
    /* Returns half the number of positions at which the two matched-character strings differ */
    ntrans = 0;
    ubnd = min(length(strip(string1)), length(strip(string2)));
    do i = 1 to ubnd;
        if substr(string1, i, 1) ne substr(string2, i, 1) then do;
            ntrans = ntrans + 1;
        end;
    end;
    return(ntrans/2);
endsub;

function getPrefixlen(string1 $, string2 $, maxprelen);
    /* Returns the length of the common prefix, capped at maxprelen (0 if the first characters differ) */
    n = min(maxprelen, length(string1), length(string2));
    do i = 1 to n;
        if substr(string1, i, 1) ne substr(string2, i, 1)
            then return(i - 1);
    end;
    /* no mismatch within the first n characters */
    return(n);
endsub;

function jarodist(string1 $, string2 $);
    /* buffers for the matched characters; 200 is an assumed upper bound on string length */
    length m1 m2 $ 200;

    /* get matched characters in each direction */
    call jaromatch(string1, string2, m1);
    m1_len = lengthn(m1);   /* lengthn returns 0 for a blank string; length would return 1 */
    if m1_len = 0 then return(0);
    call jaromatch(string2, string1, m2);
    m2_len = lengthn(m2);
    if m2_len = 0 then return(0);

    /* get number of transposed characters */
    ntrans = jarotrans(m1, m2);
    /* debugging aid: put m1_len= m2_len= ntrans= ; */
    j_dist = (m1_len/length(string1)
            + m2_len/length(string2)
            + (m1_len - ntrans)/m1_len) / 3;
    return(j_dist);
endsub;

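function jarowink(string1 $, string2 $, prefixscale);
    /* Jaro score boosted by the length of the common prefix (up to 4 characters) */
    /* jd is a local variable, named so that it does not clash with the jarodist function */
    jd = jarodist(string1, string2);
    prelen = getPrefixlen(string1, string2, 4);
    if prelen = 0 then return(jd);
    else return(jd + prelen * prefixscale * (1 - jd));
endsub;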

run;quit;

/* tell SAS where to find the functions we just wrote */
option cmplib=work.jaro;

/* Now let's try it out! */
data _null_;
string1='DIXON';
string2='DICKSONX';
x=jarodist(string1, string2);
y=jarowink(string1, string2, 0.1);
put x= y=;
run;
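
To score many pairs at once, the same functions can be called from an ordinary DATA step. The following is a minimal sketch, assuming the proc fcmp step above has already been run in the current session. The dataset name name_pairs, the variable names, the sample names and the 0.85 cutoff are all made up for illustration.

```sas
/* Assumes the functions were compiled to work.jaro by the PROC FCMP step above */
options cmplib=work.jaro;

/* Hypothetical test pairs; the 0.85 cutoff is illustrative, not a recommendation */
data name_pairs;
    length name1 name2 $ 20;
    input name1 $ name2 $;
    jaro         = jarodist(name1, name2);
    jaro_wink    = jarowink(name1, name2, 0.1);
    /* 1 if the Jaro-Winkler score reaches the cutoff, 0 otherwise */
    likely_match = (jaro_wink >= 0.85);
    datalines;
MARTHA MARHTA
DWAYNE DUANE
DIXON DICKSONX
;
run;

proc print data=name_pairs noobs;
run;
```

Any cutoff needs tuning on your own data, and given the caveats about this implementation it is worth checking a few scores by hand before relying on them.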

Regarding sas - Jaro-Winkler string comparison function in SAS, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/6865019/
