
python - Identify strings in list items that cannot be spelled

Reposted. Author: 太空宇宙. Updated: 2023-11-04 01:21:21

I have a list:

['mPXSz0qd6j0 youtube ', 'lBz5XJRLHQM youtube ', 'search OpHQOO-DwlQ ', 
'sachin 47427243 ', 'alex smith ', 'birthday JEaM8Lg9oK4 ',
'nebula 8x41n9thAU8 ', 'chuck norris ',
'searcher O6tUtqPcHDw ', 'graham wXqsg59z7m0 ', 'queries K70QnTfGjoM ']

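Is there a way to identify the strings in the list items that cannot be spelled (that is, are not real words) and remove them?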

Best answer

You can use, for example, PyEnchant for basic dictionary checking and NLTK to tolerate small spelling mistakes, like this:

import enchant
import nltk

spell_dict = enchant.Dict('en_US')  # or any other supported language

def get_distance_limit(w):
    '''
    The word is considered good if it is strictly
    closer to a known word than this limit.
    '''
    # Just an example: integer division allows roughly 1 typo per 5 chars.
    return len(w) // 5 + 2

def check_word(word):
    if spell_dict.check(word):
        return True  # a known dictionary word

    # Otherwise, try similar words suggested by the dictionary.
    max_dist = get_distance_limit(word)
    for suggestion in spell_dict.suggest(word):
        if nltk.edit_distance(suggestion, word) < max_dist:
            return True

    return False
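The acceptance rule above depends on `nltk.edit_distance`, which with its default arguments computes the Levenshtein distance (single-character insertions, deletions and substitutions). The sketch below is a stdlib-only reimplementation of that distance plus the example threshold, so the rule can be tried without installing NLTK or PyEnchant. `edit_distance` and `close_enough` are hypothetical helper names, not part of either library, and the limit uses integer division. For example, for the typo "birthdy" (7 characters) the limit is 7 // 5 + 2 = 3, so a suggestion at distance 1 or 2 is accepted.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if chars match)
            ))
        prev = curr
    return prev[-1]


def get_distance_limit(w: str) -> int:
    """Example limit from the answer: roughly 1 typo per 5 characters."""
    return len(w) // 5 + 2


def close_enough(word: str, suggestion: str) -> bool:
    """Hypothetical helper: the same strict '<' test used in check_word."""
    return edit_distance(suggestion, word) < get_distance_limit(word)


print(edit_distance("birthdy", "birthday"), get_distance_limit("birthdy"))
# prints: 1 3
print(close_enough("birthdy", "birthday"))
# prints: True
```

Because the limit grows with word length, a long random ID such as 'mPXSz0qd6j0' gets a fairly generous limit (11 // 5 + 2 = 4), which is one reason an extra filter helps.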

Add case normalization and a filter for tokens containing digits, and you will get a pretty good heuristic.
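A minimal sketch of that heuristic applied to the list from the question, assuming the items are whitespace-separated tokens and that any token containing a digit is a non-word. `TOY_VOCAB` and `toy_is_word` are hypothetical stand-ins for a real dictionary; in practice you would pass the answer's `check_word` (or `spell_dict.check`) as `is_word`.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical toy vocabulary (an assumption for this sketch), standing in for
# a real dictionary such as enchant.Dict('en_US') or the answer's check_word.
TOY_VOCAB = {
    "youtube", "search", "searcher", "birthday", "nebula", "queries",
    "sachin", "alex", "smith", "chuck", "norris", "graham",
}

HAS_DIGIT = re.compile(r"\d")


def toy_is_word(word: str) -> bool:
    """Exact lookup in the toy vocabulary (no typo tolerance)."""
    return word in TOY_VOCAB


def clean_item(item: str, is_word: Callable[[str], bool] = toy_is_word) -> str:
    """Keep only the whitespace-separated tokens of item that look like words."""
    kept = []
    for token in item.split():
        # Digit filter: IDs such as 'mPXSz0qd6j0' or '47427243' are dropped.
        if HAS_DIGIT.search(token):
            continue
        # Case normalization: look the token up in lowercase, keep original form.
        if is_word(token.lower()):
            kept.append(token)
    return " ".join(kept)


def clean_items(items: Iterable[str],
                is_word: Callable[[str], bool] = toy_is_word) -> List[str]:
    """Apply clean_item to every list item."""
    return [clean_item(item, is_word) for item in items]


items = ['mPXSz0qd6j0 youtube ', 'lBz5XJRLHQM youtube ', 'search OpHQOO-DwlQ ',
         'sachin 47427243 ', 'alex smith ', 'birthday JEaM8Lg9oK4 ',
         'nebula 8x41n9thAU8 ', 'chuck norris ',
         'searcher O6tUtqPcHDw ', 'graham wXqsg59z7m0 ', 'queries K70QnTfGjoM ']
print(clean_items(items))
# prints: ['youtube', 'youtube', 'search', 'sachin', 'alex smith', 'birthday',
#          'nebula', 'chuck norris', 'searcher', 'graham', 'queries']
```

Note that 'OpHQOO-DwlQ' contains no digits, so it is removed only by the dictionary lookup; the digit filter alone would not catch it, which is why the two checks are combined.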

A similar question, "python - Identify strings in list items that cannot be spelled", can be found on Stack Overflow: https://stackoverflow.com/questions/20969843/
