
python - BLEU score in Python from scratch


After watching Andrew Ng's video on the BLEU score, I wanted to implement it from scratch in Python. I wrote the whole thing in Python using only numpy. Here is the complete code:

import numpy as np

def n_gram_generator(sentence, n=2, n_gram=False):
    '''
    N-gram generator for a sentence
    n is the n-gram order
    The n_gram parameter removes repeated n-grams
    '''
    sentence = sentence.lower()            # converting to lower case
    sent_arr = np.array(sentence.split())  # split to string arrays
    length = len(sent_arr)

    word_list = []
    for i in range(length + 1):
        if i < n:
            continue
        word_range = list(range(i - n, i))
        s_list = sent_arr[word_range]
        string = ' '.join(s_list)   # converting list to strings
        word_list.append(string)    # append to word_list
    if n_gram:
        word_list = list(set(word_list))
    return word_list

def bleu_score(original, machine_translated):
    '''
    BLEU score function given an original and a machine-translated sentence
    '''
    mt_length = len(machine_translated.split())
    o_length = len(original.split())

    # Brevity Penalty
    if mt_length > o_length:
        BP = 1
    else:
        penalty = 1 - (mt_length / o_length)
        BP = np.exp(penalty)

    # calculating precision
    precision_score = []
    for i in range(mt_length):
        original_n_gram = n_gram_generator(original, i)
        machine_n_gram = n_gram_generator(machine_translated, i)
        n_gram_list = list(set(machine_n_gram))  # removes repeated strings

        # counting number of occurrences
        machine_score = 0
        original_score = 0
        for j in n_gram_list:
            machine_count = machine_n_gram.count(j)
            original_count = original_n_gram.count(j)
            machine_score = machine_score + machine_count
            original_score = original_score + original_count

        precision = original_score / machine_score
        precision_score.append(precision)
    precisions_sum = np.array(precision_score).sum()
    avg_precisions_sum = precisions_sum / mt_length
    bleu = BP * np.exp(avg_precisions_sum)
    return bleu

if __name__ == "__main__":
    original = "this is a test"
    bs = bleu_score(original, original)
    print("Bleu Score Original", bs)

I then tried to check my score against NLTK's:

from nltk.translate.bleu_score import sentence_bleu
reference = [['this', 'is', 'a', 'test']]
candidate = ['this', 'is', 'a', 'test']
score = sentence_bleu(reference, candidate)
print(score)

The problem is that my BLEU score comes out around 2.718281 while NLTK's is 1. What am I doing wrong?

Here are some possible causes:

1) I compute the n-grams based on the length of the machine-translated sentence, here from 1 to 4

2) The n_gram_generator function is something I wrote myself, and I'm not sure it is correct

3) Maybe I'm using the wrong function somewhere or computing the BLEU score incorrectly

Can someone look over my code and tell me where I went wrong?

Best answer

Your BLEU score calculation is wrong. The problems:

  • You have to use clipped precision (a small illustration follows after this list)
  • nltk applies a weight to each n-gram order
  • nltk uses n-grams for n = 1, 2, 3, 4
  • Your formula exponentiates the average precision itself instead of summing the weighted log-precisions, so for two identical sentences every precision is 1 and you get exp(1) ≈ 2.718

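As a minimal sketch of what clipped precision means (the function name and sentences below are illustrative, not from the original post): every candidate n-gram is credited at most as many times as it occurs in the reference, so a degenerate candidate cannot inflate its precision by repeating a common word.

from collections import Counter

def clipped_unigram_precision(reference, candidate):
    # count unigrams on both sides
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # each candidate word counts at most as often as it appears in the reference
    clipped = {w: min(c, ref_counts[w]) for w, c in cand_counts.items()}
    return sum(clipped.values()) / sum(cand_counts.values())

# "the" occurs twice in the reference, so only 2 of the 4 candidate words count: 2/4 = 0.5
print(clipped_unigram_precision("the cat is on the mat", "the the the the"))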
Corrected code:

from collections import Counter
import math

# reuses n_gram_generator and numpy (np) defined in the question above

def bleu_score(original, machine_translated):
    '''
    BLEU score function given an original and a machine-translated sentence
    '''
    mt_length = len(machine_translated.split())
    o_length = len(original.split())

    # Brevity Penalty: exp(1 - reference_length / candidate_length) when the candidate is not longer
    if mt_length > o_length:
        BP = 1
    else:
        penalty = 1 - (o_length / mt_length)
        BP = np.exp(penalty)

    # Clipped precision for n = 1..4
    clipped_precision_score = []
    for i in range(1, 5):
        original_n_gram = Counter(n_gram_generator(original, i))
        machine_n_gram = Counter(n_gram_generator(machine_translated, i))

        c = sum(machine_n_gram.values())
        # clip each candidate n-gram count at its count in the reference
        for j in machine_n_gram:
            if j in original_n_gram:
                if machine_n_gram[j] > original_n_gram[j]:
                    machine_n_gram[j] = original_n_gram[j]
            else:
                machine_n_gram[j] = 0

        # print(sum(machine_n_gram.values()), c)
        clipped_precision_score.append(sum(machine_n_gram.values()) / c)

    # print(clipped_precision_score)

    # uniform weights of 0.25 for n = 1..4, as nltk uses by default
    weights = [0.25] * 4

    # note: math.log raises a ValueError if any clipped precision is 0
    s = (w_i * math.log(p_i) for w_i, p_i in zip(weights, clipped_precision_score))
    s = BP * math.exp(math.fsum(s))
    return s

original = "It is a guide to action which ensures that the military alwasy obeys the command of the party"
machine_translated = "It is the guiding principle which guarantees the military forces alwasy being under the command of the party"

print(bleu_score(original, machine_translated))
print(sentence_bleu([original.split()], machine_translated.split()))

Output:

0.27098211583470044
0.27098211583470044
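As a quick sanity check (a sketch that assumes the corrected bleu_score above together with the question's n_gram_generator, numpy and nltk imports), the question's original identity test now agrees with nltk as well: every clipped precision is 1 and the brevity penalty is 1, so both report 1.0.

original = "this is a test"
print(bleu_score(original, original))                       # 1.0
print(sentence_bleu([original.split()], original.split()))  # 1.0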

Regarding python - BLEU score in Python from scratch, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56968434/
