python - 如何在python中生成一组相似的字符串-6ren

python - 如何在python中生成一组相似的字符串

转载作者：太空宇宙更新时间：2023-11-04 05:20:55

我想知道如何根据 Levenshtein 距离(字符串编辑距离)生成一组相似的字符串。理想情况下，我喜欢传入一个源字符串(即用于生成与其相似的其他字符串的字符串)，需要生成的字符串数量和一个阈值作为参数，即字符串之间的相似性生成集应该大于阈值。我想知道我应该使用什么 Python 包来实现它？或者有什么想法可以实现吗？

最佳答案

我觉得你可以换个角度想问题(反过来)。

给定一个字符串，说它是sittin。
给定一个阈值(编辑距离)，假设它是k。
然后您在 k 步中应用不同“编辑”的组合。

例如，假设 k = 2。并假设允许的 edit modes你有:

删除一个字符
添加一个字符
将一个字符替换为另一个字符。

那么逻辑是这样的:

input = 'sittin'
for num in 1 ... n:  # suppose you want to have n strings generated
  my_input_ = input
  # suppose the edit distance should be smaller or equal to k;
  # but greater or equal to one
  for i in in 1 ... randint(k): 
    pick a random edit mode from (delete, add, substitute)
    do it! and update my_input_

如果您需要坚持使用预定义的字典，这会增加一些复杂性，但它仍然可行。在这种情况下，编辑必须有效。

关于python - 如何在python中生成一组相似的字符串，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/40358855/