
python - How to group similar-sounding words together

Reposted. Author: IT老高. Updated: 2023-10-28 20:52:55

I am trying to get all the similar-sounding words from a list.

I tried using cosine similarity to find them, but it does not serve my purpose.

from sklearn.metrics.pairwise import cosine_similarity
dataList = ['two','fourth','forth','dessert','to','desert']
cosine_similarity(dataList)

I know this is not the right approach, and I can't seem to get a result like the following:

result = ['xx', 'xx', 'yy', 'yy', 'zz', 'zz'] 

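Here, identical labels mark words that sound alike.

Accepted answer

First, you need the right method for finding similar-sounding words, namely string similarity based on a phonetic encoding rather than cosine similarity. I would suggest:

Using jellyfish: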

from jellyfish import soundex

print(soundex("two"))
print(soundex("to"))

Output:

T000
T000

Now, perhaps, create a function that processes the list, then sort the result so that identical codes end up next to each other:

def getSoundexList(dList):
    res = [soundex(x) for x in dList]  # iterate over each elem in the dataList
    # print(res)  # ['T000', 'F630', 'F630', 'D263', 'T000', 'D263']
    return res

dataList = ['two','fourth','forth','dessert','to','desert']
print(sorted(getSoundexList(dataList)))

Output:

['D263', 'D263', 'F630', 'F630', 'T000', 'T000']

Edit:

Another way could be:

Using fuzzy:

import fuzzy
soundex = fuzzy.Soundex(4)

print(soundex("to"))
print(soundex("two"))

Output:

T000
T000

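Edit 2:

If you want them grouped, you can use groupby (it only merges adjacent equal items, which is why the list is sorted first):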

from itertools import groupby

def getSoundexList(dList):
    return sorted([soundex(x) for x in dList])

dataList = ['two','fourth','forth','dessert','to','desert']
print([list(g) for _, g in groupby(getSoundexList(dataList), lambda x: x)])

Output:

[['D263', 'D263'], ['F630', 'F630'], ['T000', 'T000']]

Edit 3:

This one is from @Eric Duminil, assuming you want the names along with their respective values:

Using a dict together with itemgetter:

from itertools import groupby
from operator import itemgetter

def getSoundexDict(dList):
    # map each name to its Soundex value, then sort the (name, val) pairs by val
    dict_ = dict(zip(dList, (soundex(x) for x in dList)))
    return sorted(dict_.items(), key=itemgetter(1))

dataList = ['two','fourth','forth','dessert','to','desert']

print([list(g) for _, g in groupby(getSoundexDict(dataList), lambda x: x[1])])

Output:

[[('dessert', 'D263'), ('desert', 'D263')], [('fourth', 'F630'), ('forth', 'F630')], [('two', 'T000'), ('to', 'T000')]]
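If only the words themselves are needed, a dictionary keyed by the code avoids the sort that groupby requires. The following is a minimal sketch, not part of the original answer: `group_words` and `known_codes` are hypothetical names, and the codes are copied from the Soundex output shown earlier so that the snippet runs without any third-party package.

```python
from collections import defaultdict

def group_words(words, encode):
    """Group words by encode(word); groups and members keep first-seen order."""
    groups = defaultdict(list)
    for word in words:
        groups[encode(word)].append(word)
    return list(groups.values())

# Codes copied from the Soundex output shown in this answer (an assumption made
# so the sketch is self-contained). With jellyfish installed, jellyfish.soundex
# could be passed as `encode` instead of known_codes.get.
known_codes = {
    'two': 'T000', 'fourth': 'F630', 'forth': 'F630',
    'dessert': 'D263', 'to': 'T000', 'desert': 'D263',
}

dataList = ['two', 'fourth', 'forth', 'dessert', 'to', 'desert']
print(group_words(dataList, known_codes.get))
# [['two', 'to'], ['fourth', 'forth'], ['dessert', 'desert']]
```

Unlike the groupby version, the groups here come out in the order in which each code first appears in the input rather than in sorted code order.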

Edit 4 (for the OP):

Soundex:

Soundex is a system whereby values are assigned to names in such a manner that similar-sounding names get the same value. These values are known as soundex encodings. A search application based on soundex will not search for a name directly but rather will search for the soundex encoding. By doing so, it will obtain all names that sound like the name being sought.
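To make that description concrete, here is a simplified pure-Python sketch of American Soundex together with the kind of lookup the quote describes. It is an illustration only: `simple_soundex`, `build_index` and `sounds_like` are hypothetical names, the input is assumed to be plain ASCII words, and the result may differ from jellyfish or fuzzy in edge cases.

```python
# Consonant groups of American Soundex; vowels, h, w and y carry no digit.
_CODES = {}
for digit, letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for ch in letters:
        _CODES[ch] = str(digit)

def simple_soundex(word):
    """Return a 4-character code: first letter plus up to three digits."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")  # the first letter's digit also suppresses repeats
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        if ch not in "hw":  # h and w do not separate letters with the same digit
            prev = code
    return result.ljust(4, "0")  # pad short codes with zeros

def build_index(names):
    """Map each code to the list of names that share it."""
    index = {}
    for name in names:
        index.setdefault(simple_soundex(name), []).append(name)
    return index

def sounds_like(query, index):
    """Search by code instead of by the name itself."""
    return index.get(simple_soundex(query), [])

dataList = ['two', 'fourth', 'forth', 'dessert', 'to', 'desert']
index = build_index(dataList)
print(sounds_like('too', index))   # ['two', 'to']
print(sounds_like('fort', index))  # ['fourth', 'forth']
```

The query words 'too' and 'fort' are not in the list, yet they retrieve the stored words with the same code, which is the behaviour a Soundex-based search relies on.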


Regarding python - how to group similar-sounding words together, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55331723/
