
python - Bigram probability

Reposted. Author: 行者123. Updated: 2023-12-04 09:30:40

I have a Moby Dick corpus and I need to calculate the probability of the bigram "ivory leg".
I know that this command gives me a list of all the bigrams:

bigrams = [w1+" "+w2 for w1,w2 in zip(words[:-1], words[1:])]
But how do I get the probability of these two words?

Best answer

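You can count all the bigrams and count the specific bigram you are looking for. The probability of the bigram occurring, P(bigram), is simply the quotient of those two counts. The conditional probability of w[1] given w[0], P(w[1] | w[0]), is the number of occurrences of the bigram divided by the count of w[0]. For example, look at the bigram ('some', 'text'):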

s = 'this is some text about some text but not some other stuff'.split()

bigrams = [(s1, s2) for s1, s2 in zip(s, s[1:])]

# [('this', 'is'),
# ('is', 'some'),
# ('some', 'text'),
# ('text', 'about'),
# ...

number_of_bigrams = len(bigrams)
# 11

# how many times 'some' occurs
some_count = s.count('some')
# 3

# how many times bigram occurs
bg_count = bigrams.count(('some', 'text'))
# 2

# probability of 'text' given 'some': P(text | some)
# i.e. given that you found `some`, what's the probability that it starts this bigram:
bg_count/some_count
# 0.666

# probability of the bigram in the text: P(some text)
# i.e. pick a bigram at random; what's the probability that it's your bigram:
bg_count/number_of_bigrams
# 0.181818
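For a large corpus such as Moby Dick, calling `list.count` once per query rescans the whole list each time. A minimal sketch, assuming the corpus is already a plain string, counts every bigram once with `collections.Counter` and then reads off both quantities: P(w0 w1) = c(w0 w1) / N, where N is the number of bigrams, and P(w1 | w0) = c(w0 w1) / c(w0). The helper names `tokenize` and `bigram_probabilities` and the regex tokenizer are illustrative choices, not part of the original answer. One deliberate difference: here c(w0) counts w0 only where it begins a bigram (every position except the last), which equals `words.count(w0)` unless w0 is the final token.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, then keep runs of letters
    # (allowing internal apostrophes such as "ahab's"); punctuation is dropped.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def bigram_probabilities(words, w0, w1):
    """Return (P(w0 w1), P(w1 | w0)) estimated from raw counts."""
    # Count every adjacent pair once instead of calling list.count per query.
    bigram_counts = Counter(zip(words, words[1:]))
    # Count w0 only where it can start a bigram, i.e. all positions but the last.
    first_word_counts = Counter(words[:-1])

    n_bigrams = max(len(words) - 1, 0)
    bg_count = bigram_counts[(w0, w1)]

    # Guard against empty input or an unseen w0 to avoid division by zero.
    p_joint = bg_count / n_bigrams if n_bigrams else 0.0
    p_cond = bg_count / first_word_counts[w0] if first_word_counts[w0] else 0.0
    return p_joint, p_cond


s = 'this is some text about some text but not some other stuff'.split()
# Same values as the example above: 2/11 and 2/3.
print(bigram_probabilities(s, 'some', 'text'))
```

For the question itself, something like `bigram_probabilities(tokenize(text), 'ivory', 'leg')` should work, where `text` holds the contents of your Moby Dick file. The tokenizer matters: without lowercasing and punctuation stripping, "Ivory" and "leg," would be counted as different words from "ivory" and "leg".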

A similar question about python - bigram probability can be found on Stack Overflow: https://stackoverflow.com/questions/62867820/
