
vb.net + mysql - Search a table for the 5 rows most similar to an input value

Reposted · Author: 行者123 · Updated: 2023-11-29 10:51:24

I have a database table with many columns, one of which contains names.

My VB.NET program acts as a Telegram server and waits for users to send their full name.

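Names in the database can be spelled in different ways: for example, "Marco Dell'Orso" could be written as "Marco Dellorso", "Marco Dell Orso", "Dell Orso Marco" or any other form. Users may also misspell their name by swapping two letters, e.g. "MaCRo Dell'Orso".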

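I need a way to return the 5 rows most similar to the words used in the query. What is the best way to do this? I was thinking of splitting the name on whitespace and then using LIKE on each individual word in the query, but that won't work for mistyped words.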
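As a rough sketch of the word-splitting idea (not code from the question), the hypothetical helpers `Tokenize` and `BuildPrefilterSql` below lowercase the input, split it on whitespace and apostrophes, and build a parameterized `SELECT` with one `LIKE` test per word, joined by `OR`. The table and column names (`people`, `name`) are placeholders, and they are concatenated into the SQL, so they must come from trusted code, never from user input.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module NamePrefilter

    ' Lowercases a full name and splits it on whitespace and apostrophe-like
    ' characters, dropping empty entries (e.g. from double spaces).
    Public Function Tokenize(fullName As String) As String()
        Return Regex.Split(fullName.ToLowerInvariant(), "[\s'`´]+") _
            .Where(Function(w) w.Length > 0) _
            .ToArray()
    End Function

    ' Builds "SELECT * FROM <table> WHERE <col> LIKE @w0 OR <col> LIKE @w1 ...".
    ' The words themselves are NOT inlined: each becomes a named parameter
    ' (@w0, @w1, ...) that the caller binds to "%" & word & "%".
    Public Function BuildPrefilterSql(tableName As String, columnName As String,
                                      words As String()) As String
        If words.Length = 0 Then
            Throw New ArgumentException("At least one word is required.", NameOf(words))
        End If
        Dim conditions As IEnumerable(Of String) =
            words.Select(Function(w, i) $"{columnName} LIKE @w{i}")
        Return $"SELECT * FROM {tableName} WHERE " & String.Join(" OR ", conditions)
    End Function

    Sub Main()
        Dim words As String() = Tokenize("Marco  Dell'Orso")
        Console.WriteLine(String.Join("|", words))
        ' prints: marco|dell|orso
        Console.WriteLine(BuildPrefilterSql("people", "name", words))
        ' prints: SELECT * FROM people WHERE name LIKE @w0 OR name LIKE @w1 OR name LIKE @w2
    End Sub

End Module
```

Binding the values as parameters rather than splicing them into the string keeps user input out of the SQL text. As the question points out, a typo such as "macro" will not match `%marco%`, so this step can only narrow the candidate rows; it cannot rank misspellings.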

Edit:

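My current plan: if the database does not contain exactly one row with the exact name (i.e. zero rows or several), split the input into single words and return all rows whose name contains any of the input words. This should reduce the number of rows to analyze from 42,000 to a few hundred. Once I have those few hundred rows, I can run a Levenshtein function on them and return the 5 best matches.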

Is this a good idea?
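The second half of that plan (rank the few hundred candidate rows and keep the best 5) could look roughly like the sketch below. It uses a plain Levenshtein distance written for this example, not the Damerau-Levenshtein function from the answer, and `BestMatches` and the sample names are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module TopMatches

    ' Plain Levenshtein distance (insert, delete, substitute; cost 1 each),
    ' keeping only two rows of the dynamic-programming matrix.
    Public Function Levenshtein(a As String, b As String) As Integer
        Dim prev(b.Length) As Integer
        Dim curr(b.Length) As Integer
        For j As Integer = 0 To b.Length
            prev(j) = j
        Next
        For i As Integer = 1 To a.Length
            curr(0) = i
            For j As Integer = 1 To b.Length
                Dim cost As Integer = If(a(i - 1) = b(j - 1), 0, 1)
                curr(j) = Math.Min(Math.Min(prev(j) + 1, curr(j - 1) + 1), prev(j - 1) + cost)
            Next
            ' The row just computed becomes the previous row
            Dim tmp As Integer() = prev
            prev = curr
            curr = tmp
        Next
        Return prev(b.Length)
    End Function

    ' Returns at most `count` candidates closest to the input (case-insensitive).
    ' OrderBy is a stable sort, so ties keep the candidates' original order.
    Public Function BestMatches(input As String, candidates As IEnumerable(Of String),
                                Optional count As Integer = 5) As List(Of String)
        Dim needle As String = input.ToLowerInvariant()
        Return candidates _
            .OrderBy(Function(c) Levenshtein(needle, c.ToLowerInvariant())) _
            .Take(count) _
            .ToList()
    End Function

    Sub Main()
        ' Hypothetical rows returned by the LIKE pre-filter
        Dim rows As New List(Of String) From {
            "Marco Dell'Orso", "Marco Dellorso", "Mario Rossi", "Dell Orso Marco"}
        For Each name As String In BestMatches("MaCRo Dell'Orso", rows, 2)
            Console.WriteLine(name)
        Next
        ' prints: Marco Dell'Orso
        '         Marco Dellorso
    End Sub

End Module
```

Plain Levenshtein counts the swapped letters in "MaCRo" as two substitutions, whereas a Damerau-Levenshtein distance counts them as a single transposition. Comparing whole strings also ranks a reordered name like "Dell Orso Marco" far down the list, which is one reason to compare the names word by word.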

Best answer

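I solved this by combining my own custom function with the ready-made (Damerau-)Levenshtein function from this link: How to calculate distance similarity measure of given 2 strings?. I assign a score to each word that appears in the other full name, then add a further score for every pair of words based on their edit distance. It works well: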

```vbnet
Public Class Form1

    ' Recompute the similarity score whenever either text box changes
    Private Sub TextBox1_KeyUp(sender As Object, e As KeyEventArgs) Handles TextBox1.KeyUp
        calc()
    End Sub

    Private Sub TextBox2_KeyUp(sender As Object, e As KeyEventArgs) Handles TextBox2.KeyUp
        calc()
    End Sub

    Private Sub calc()
        Label1.Text = compare(TextBox1.Text, TextBox2.Text).ToString()
    End Sub

    ' Scores how similar two full names are (higher = more similar).
    ' Each word earns 1 point if it appears verbatim in the other name, plus
    ' 1 / (distance + 1) for every word of the other name (1 for an identical word).
    ' Returns Double: the score is fractional and would be truncated as an Integer.
    Public Function compare(source As String, target As String) As Double
        Dim separators As Char() = {" "c, "'"c, "`"c, "´"c}

        ' Lowercase so that "MaCRo" and "Marco" differ only by a transposition
        source = source.ToLowerInvariant()
        target = target.ToLowerInvariant()

        ' RemoveEmptyEntries: "Dell' Orso" would otherwise yield an empty word,
        ' and Contains("") is always True
        Dim sourceWords As String() = source.Split(separators, StringSplitOptions.RemoveEmptyEntries)
        Dim targetWords As String() = target.Split(separators, StringSplitOptions.RemoveEmptyEntries)

        Dim score As Double = 0

        For Each s In sourceWords
            If target.Contains(s) Then score += 1
            For Each t In targetWords
                ' CDbl avoids Integer overflow when the distance is Integer.MaxValue
                score += 1 / (CDbl(DamerauLevenshteinDistance(s, t, 100)) + 1)
            Next
        Next

        For Each t In targetWords
            If source.Contains(t) Then score += 1
            For Each s In sourceWords
                score += 1 / (CDbl(DamerauLevenshteinDistance(t, s, 100)) + 1)
            Next
        Next

        Return score
    End Function

    ''' <summary>
    ''' Computes the Damerau-Levenshtein distance (optimal string alignment variant)
    ''' between two strings. The threshold sets the maximum allowable distance;
    ''' once it is exceeded the function stops early.
    ''' </summary>
    ''' <param name="source">The first string</param>
    ''' <param name="target">The second string</param>
    ''' <param name="threshold">Maximum allowable distance</param>
    ''' <returns>Integer.MaxValue if the threshold is exceeded; otherwise the Damerau-Levenshtein distance between the strings</returns>
    Public Shared Function DamerauLevenshteinDistance(source As String, target As String, threshold As Integer) As Integer

        Dim length1 As Integer = source.Length
        Dim length2 As Integer = target.Length

        ' Trivial case: the difference in string lengths already exceeds the threshold
        If Math.Abs(length1 - length2) > threshold Then
            Return Integer.MaxValue
        End If

        ' Ensure the arrays indexed by i (sized by length1) use the shorter string
        If length1 > length2 Then
            Swap(target, source)
            Swap(length1, length2)
        End If

        Dim maxi As Integer = length1
        Dim maxj As Integer = length2

        ' Three rolling rows of the DP matrix; the oldest one is needed
        ' for the transposition check
        Dim dCurrent As Integer() = New Integer(maxi) {}
        Dim dMinus1 As Integer() = New Integer(maxi) {}
        Dim dMinus2 As Integer() = New Integer(maxi) {}
        Dim dSwap As Integer()

        For i As Integer = 0 To maxi
            dCurrent(i) = i
        Next

        Dim jm1 As Integer = 0, im1 As Integer = 0, im2 As Integer = -1

        For j As Integer = 1 To maxj

            ' Rotate the rows
            dSwap = dMinus2
            dMinus2 = dMinus1
            dMinus1 = dCurrent
            dCurrent = dSwap

            ' Initialize
            Dim minDistance As Integer = Integer.MaxValue
            dCurrent(0) = j
            im1 = 0
            im2 = -1

            For i As Integer = 1 To maxi

                Dim cost As Integer = If(source(im1) = target(jm1), 0, 1)

                Dim del As Integer = dCurrent(im1) + 1
                Dim ins As Integer = dMinus1(i) + 1
                Dim [sub] As Integer = dMinus1(im1) + cost

                ' Minimum of the three costs without calling Math.Min
                Dim min As Integer = If(del > ins, If(ins > [sub], [sub], ins), If(del > [sub], [sub], del))

                ' Transposition of two adjacent characters
                If i > 1 AndAlso j > 1 AndAlso source(im2) = target(jm1) AndAlso source(im1) = target(j - 2) Then
                    min = Math.Min(min, dMinus2(im2) + cost)
                End If

                dCurrent(i) = min
                If min < minDistance Then
                    minDistance = min
                End If
                im1 += 1
                im2 += 1
            Next
            jm1 += 1

            ' Early exit: every value in this row already exceeds the threshold
            If minDistance > threshold Then
                Return Integer.MaxValue
            End If
        Next

        Dim result As Integer = dCurrent(maxi)
        Return If(result > threshold, Integer.MaxValue, result)
    End Function

    ' Swaps two values in place (used to make source the shorter string)
    Private Shared Sub Swap(Of T)(ByRef arg1 As T, ByRef arg2 As T)
        Dim temp As T = arg1
        arg1 = arg2
        arg2 = temp
    End Sub

End Class
```
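Reading `compare` above, the score it returns can be summarized as follows. The notation is mine, not the answer's: $S$ and $T$ are the word lists of the two names after splitting on spaces and apostrophe-like characters, $d$ is `DamerauLevenshteinDistance`, and $[\cdot]$ is 1 when the condition holds and 0 otherwise.

$$
\text{score} = \sum_{s \in S} [\, s \text{ occurs in target} \,] + \sum_{t \in T} [\, t \text{ occurs in source} \,] + 2 \sum_{s \in S} \sum_{t \in T} \frac{1}{d(s,t) + 1}
$$

The factor 2 appears because the two nested loops visit every pair $(s, t)$ once in each direction and the distance is symmetric. An identical pair of words therefore adds $2 \cdot 1 = 2$ to the pair sum, while a single transposition such as "macro"/"marco" ($d = 1$) adds $2 \cdot \tfrac{1}{2} = 1$. The score is not normalized, so a candidate with more words can score higher simply by contributing more terms; it is best used to rank candidates against the same input.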

A similar question about "vb.net + mysql - search a table for the 5 rows most similar to an input value" can be found on Stack Overflow: https://stackoverflow.com/questions/43673316/
