gpt4 book ai didi

c# - 了解 Boyer Moor

转载 作者:塔克拉玛干 更新时间:2023-11-03 06:29:08 26 4
gpt4 key购买 nike

我正在尝试在大量文本中实现精确的文本搜索。为此,我找到了一些针对 c# 的 Boyer Moore 实现示例,但现在我无法理解它是如何工作的。

例如,如果我有字符串 this is sample text to search for 并且想要 to search for它有效,但如果我将我的搜索模式更改为搜索和文本,它仍然返回值而不是-1。为什么会这样?在我的搜索文本中没有要搜索的带模式的字符串和文本。

下面是我通过 Stackoverflow 找到的一个实现

public class BoyerMooreStringSearching
{
readonly Dictionary<char, LastIndexTable> _lastIndexTable = new Dictionary<char, LastIndexTable>();
public string PatternToSearch;

public List<int> GetStartingIndexsOfPatternInText(string textToSearchIn, string patternToSearch)
{
var list = new List<int>();
PatternToSearch = patternToSearch;
if (patternToSearch != null && !string.IsNullOrEmpty(textToSearchIn))
{
UpdateLastIndexTable(patternToSearch);
PatternToSearch = patternToSearch;

var j = patternToSearch.Length - 1;

// Main loop to iterate over whole text
while (j <= textToSearchIn.Length - 1)
{
var lastCharOfPattern = patternToSearch[patternToSearch.Length - 1];

if (textToSearchIn[j] != lastCharOfPattern)
{
// Heuristic 1
// If Last Char is not matched with the Last char in pattern and char is not present in the pattern
// Then advance pointer 'j' to the length of the pattern in textToSearch.
if (!_lastIndexTable.ContainsKey(textToSearchIn[j]))
{
j += patternToSearch.Length - 1;
}

// Heuristic 2
// Consult the lastIndex table to get the last index of current char in textToSearch
// and advance pointer 'j' to the last index in textToSearch.
if (j <= textToSearchIn.Length - 1 && _lastIndexTable.ContainsKey(textToSearchIn[j]))
{
var tempObj = _lastIndexTable[textToSearchIn[j]];
if (tempObj != null) j += tempObj.LastIndex;
}
}

int k = patternToSearch.Length - 1;
int u = j;
if (j <= textToSearchIn.Length - 1)
{
while (k >= 0)
{
// Heuristic (3a)
// If Last Char is matched with the Last char in pattern then back track in the text and pattern till
// either you got a complete match or a mismatched charecter.
// Once you got the mismatched char and mismatched char is not present in the pattern then
// advance j to the index of mismatched charecter in the pattern
if (textToSearchIn[u] == patternToSearch[k])
{
if (k == 0 && textToSearchIn[u] == patternToSearch[k])
{
list.Add(u);
j += patternToSearch.Length - 1;
}
u--;
k--;
continue;
}

if (!_lastIndexTable.ContainsKey(textToSearchIn[u]))
{
// Heuristic (3b)
// If Last Char is matched with the Last char in pattern then back track in the text till
// either you got a complete match or a mismatched charecter.
// Once you got the mismatched char and mismatched char is not present in the pattern then
// advance j to the index of mismatched charecter in the pattern plus the number to char which matched.
j += k + (j - u);
break;
}

k--;
}
}

j++;
}

}
if (!list.Any())
list.Add(-1);

return list;
}

private void UpdateLastIndexTable(string patternToSearch)
{
_lastIndexTable.Clear();
var i = patternToSearch.Length - 1;
foreach (var charToSeach in patternToSearch)
{
if (_lastIndexTable.ContainsKey(charToSeach))
{
_lastIndexTable[charToSeach].LastIndex = i;
}
else
{
_lastIndexTable.Add(charToSeach, new LastIndexTable
{
CharSearched = charToSeach,
LastIndex = i
});
}
i--;
}
}

private class LastIndexTable
{
public char CharSearched { get; set; }

public int LastIndex { get; set; }
}
}

这是它的示例用法。

var each = "this is sample text to search for";
var result = new BoyerMooreStringSearching().GetStartingIndexsOfPatternInText(each, "to search for and text");

最佳答案

当回溯(启发式 3a)时,您会一直在搜索字符串中查找字符,直到结束。您需要为此进行额外检查:

if (k == 0 && textToSearchIn[u] == patternToSearch[k])
{
if (u + patternToSearch.Length <= textToSearchIn.Length)
list.Add(u);
j += patternToSearch.Length - 1;
}

关于c# - 了解 Boyer Moor,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/20097536/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com