
python - Norvig's spell checker: how does he implement the conditional probability?

Reposted. Author: 行者123. Updated: 2023-11-30 23:24:16

When defining the conditional probability, he takes a shortcut:

So I took a shortcut: I defined a trivial model that says all known words of edit distance 1 are infinitely more probable than known words of edit distance 2, and infinitely less probable than a known word of edit distance 0. By "known word" I mean a word that we have seen in the language model training data -- a word in the dictionary. We can implement this strategy as follows:

def known(words): return set(w for w in words if w in NWORDS)
def correct(word):
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=NWORDS.get)

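I don't understand how this code implements his strategy. To me, the final return line just returns the word with the highest count (the prior), not anything that follows the priority list in his model.

Also, in the definition of his word-count dictionary: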

import collections

def train(features):
    # Every word starts at a count of 1, including words never seen.
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model

Why doesn't he start from 0? I mean, shouldn't the default_factory be (lambda: 0) or (int)?

Can anyone explain this? The full article is here: http://norvig.com/spell-correct.html

Thanks

Best answer

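The priority list is implemented by the or operator. A chain of or expressions returns its first truthy operand, and an empty set is falsy, so max only ever chooses among the candidates of a single tier. If known([word]) is a non-empty set, its value is the value of the whole expression. If it is empty, the right-hand side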

known(edits1(word)) or known_edits2(word) or [word]

is evaluated. For example:

>>> [1, 2, 3] or [4, 5, 6]
[1, 2, 3]
>>> [] or [4, 5, 6]
[4, 5, 6]
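
To see the same short-circuit behaviour in the spell checker itself, here is a minimal sketch. The NWORDS counts are made-up toy values, and edits1 and known_edits2 are written from the article's description rather than copied from it, so treat them as assumptions.

```python
import string

# Toy word counts, invented purely for illustration.
NWORDS = {"the": 100, "then": 30, "they": 20, "thew": 1}

def known(words):
    # Keep only the words that appear in the toy dictionary.
    return set(w for w in words if w in NWORDS)

def edits1(word):
    # All strings one delete, transpose, replace, or insert away from word.
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def known_edits2(word):
    # Known words that are exactly two applications of edits1 away.
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def correct(word):
    # "or" returns the first non-empty tier; max never sees the later tiers.
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=NWORDS.get)

print(correct("thew"))   # known word: tier 0 wins even though "the" is far more frequent
print(correct("thez"))   # unknown word: the most frequent known word in tier 1 is chosen
print(correct("qqqqq"))  # nothing within two edits: the input comes back unchanged
```

In this sketch "thew" is returned as-is although "the" is one edit away with a count of 100, which is the "infinitely more probable" ordering from the quoted passage: the count only breaks ties within a tier.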

Why didn't he start from 0?

That is Laplace smoothing. It is actually explained in the article.
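
As a rough illustration of what the default of 1 changes, the sketch below compares the two starting values. The start parameter and the sample word list are hypothetical additions for this comparison and are not part of the article's code.

```python
import collections

def train(features, start=1):
    # start=1 mirrors defaultdict(lambda: 1) from the article;
    # start=0 is the alternative proposed in the question.
    model = collections.defaultdict(lambda: start)
    for f in features:
        model[f] += 1
    return model

words = ["the", "the", "cat"]
smoothed = train(words)        # add-one (Laplace) style counts
raw = train(words, start=0)    # plain counts

print(smoothed["the"], raw["the"])  # seen words differ by exactly one
print(smoothed["dog"], raw["dog"])  # an unseen word gets 1 instead of 0
```

With a start of 0, a word missing from the training text would get a count, and therefore an estimated probability, of zero. Starting at 1 treats every unseen word as if it had been seen once. Note that only indexing (model[w]) triggers the default; "w in model" and model.get(w) do not create or return it.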

A similar question about python - Norvig's spell checker: how does he implement the conditional probability? can be found on Stack Overflow: https://stackoverflow.com/questions/23545901/
