gpt4 book ai didi

string - 如何通过模糊匹配查找字符串中子字符串的位置

转载 作者:行者123 更新时间:2023-12-03 14:47:56 24 4
gpt4 key购买 nike

我遇到了匹配 OCR 识别文本中的字符串并找到它的位置的问题,考虑到可能存在对错误、丢失或额外字符的任意容忍。结果应该是最佳匹配位置,可能(不一定)具有匹配子字符串的长度。

例如:

String: 9912, 1.What is your name?
Substring: 1. What is your name?
Tolerance: 1
Result: match on character 7

String: Where is our caat if any?
Substring: your cat
Tolerance: 2
Result: match on character 10

String: Tolerance is t0o h1gh.
Substring: Tolerance is too high;
Tolerance: 1
Result: no match

我尝试过调整 Levenstein 算法,但它不能正确处理子字符串并且不返回位置。

Delphi 中的算法是首选,但任何实现或伪逻辑都可以。

最佳答案

这是一个有效的递归实现,但可能不够快。最坏的情况是找不到匹配项,并且“What”中除最后一个字符之外的所有字符在“Where”中的每个索引处都匹配。在这种情况下,算法将对 Where 中的每个字符进行 Length(What)-1 + Tolerance 比较,并对每个 Tolerance 进行一次递归调用。由于 Tolerance 和 What 的长度都是常量,所以我认为该算法是 O(n)。它的性能将随着“What”和“Where”的长度线性下降。

function BrouteFindFirst(What, Where:string; Tolerance:Integer; out AtIndex, OfLength:Integer):Boolean;
var i:Integer;
aLen:Integer;
WhatLen, WhereLen:Integer;

function BrouteCompare(wherePos, whatPos, Tolerance:Integer; out Len:Integer):Boolean;
var aLen:Integer;
aRecursiveLen:Integer;
begin
// Skip perfect match characters
aLen := 0;
while (whatPos <= WhatLen) and (wherePos <= WhereLen) and (What[whatPos] = Where[wherePos]) do
begin
Inc(aLen);
Inc(wherePos);
Inc(whatPos);
end;
// Did we find a match?
if (whatPos > WhatLen) then
begin
Result := True;
Len := aLen;
end
else if Tolerance = 0 then
Result := False // No match and no more "wild cards"
else
begin
// We'll make an recursive call to BrouteCompare, allowing for some tolerance in the string
// matching algorithm.
Dec(Tolerance); // use up one "wildcard"
Inc(whatPos); // consider the current char matched
if BrouteCompare(wherePos, whatPos, Tolerance, aRecursiveLen) then
begin
Len := aLen + aRecursiveLen;
Result := True;
end
else if BrouteCompare(wherePos + 1, whatPos, Tolerance, aRecursiveLen) then
begin
Len := aLen + aRecursiveLen;
Result := True;
end
else
Result := False; // no luck!
end;
end;

begin

WhatLen := Length(What);
WhereLen := Length(Where);

for i:=1 to Length(Where) do
begin
if BrouteCompare(i, 1, Tolerance, aLen) then
begin
AtIndex := i;
OfLength := aLen;
Result := True;
Exit;
end;
end;

// No match found!
Result := False;

end;

我使用以下代码来测试该功能:

procedure TForm18.Button1Click(Sender: TObject);
var AtIndex, OfLength:Integer;
begin
if BrouteFindFirst(Edit2.Text, Edit1.Text, ComboBox1.ItemIndex, AtIndex, OfLength) then
Label3.Caption := 'Found @' + IntToStr(AtIndex) + ', of length ' + IntToStr(OfLength)
else
Label3.Caption := 'Not found';
end;

对于案例:

String: Where is our caat if any?
Substring: your cat
Tolerance: 2
Result: match on character 10

它显示了字符 9 的匹配,长度为 6。对于其他两个示例,它给出了预期结果。

关于string - 如何通过模糊匹配查找字符串中子字符串的位置,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/4375603/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com