
python - How do I calculate tagging precision and recall for a POS tagger?

Reposted. Author: 太空狗. Updated: 2023-10-30 00:00:32

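I am using several rule-based and statistical POS taggers to tag a corpus (about 5000 sentences) with parts of speech (POS). Below is a snippet of my test corpus, in which each word is separated from its POS tag by a "/".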

No/RB ,/, it/PRP was/VBD n't/RB Black/NNP Monday/NNP ./.
But/CC while/IN the/DT New/NNP York/NNP Stock/NNP Exchange/NNP did/VBD n't/RB fall/VB apart/RB Friday/NNP as/IN the/DT Dow/NNP Jones/NNP Industrial/NNP Average/NNP plunged/VBD 190.58/CD points/NNS --/: most/JJS of/IN it/PRP in/IN the/DT final/JJ hour/NN --/: it/PRP barely/RB managed/VBD *-2/-NONE- to/TO stay/VB this/DT side/NN of/IN chaos/NN ./.
Some/DT ``/`` circuit/NN breakers/NNS ''/'' installed/VBN */-NONE- after/IN the/DT October/NNP 1987/CD crash/NN failed/VBD their/PRP$ first/JJ test/NN ,/, traders/NNS say/VBP 0/-NONE- *T*-1/-NONE- ,/, *-2/-NONE- unable/JJ *-3/-NONE- to/TO cool/VB the/DT selling/NN panic/NN in/IN both/DT stocks/NNS and/CC futures/NNS ./.

After tagging, the corpus looks like this:

No/DT ,/, it/PRP was/VBD n't/RB Black/NNP Monday/NNP ./. 
But/CC while/IN the/DT New/NNP York/NNP Stock/NNP Exchange/NNP did/VBD n't/RB fall/VB apart/RB Friday/VB as/IN the/DT Dow/NNP Jones/NNP Industrial/NNP Average/JJ plunged/VBN 190.58/CD points/NNS --/: most/RBS of/IN it/PRP in/IN the/DT final/JJ hour/NN --/: it/PRP barely/RB managed/VBD *-2/-NONE- to/TO stay/VB this/DT side/NN of/IN chaos/NNS ./.
Some/DT ``/`` circuit/NN breakers/NNS ''/'' installed/VBN */-NONE- after/IN the/DT October/NNP 1987/CD crash/NN failed/VBD their/PRP$ first/JJ test/NN ,/, traders/NNS say/VB 0/-NONE- *T*-1/-NONE- ,/, *-2/-NONE- unable/JJ *-3/-NONE- to/TO cool/VB the/DT selling/VBG panic/NN in/IN both/DT stocks/NNS and/CC futures/NNS ./.

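I need to calculate the tagging accuracy (tag-wise recall and precision), so I need to find the errors (if any) in each word/tag pair.

The approach I have in mind is to loop through the two text files, store their contents in lists, and then compare the two lists element by element.

This approach seems crude to me, so I would appreciate suggestions for a better way to solve this problem.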
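The list-based comparison described in the question can be sketched roughly as follows. This is only an illustration: the function names parse_tagged_sentence and find_tagging_errors are hypothetical, and the sketch assumes every token has the form word/TAG and that both inputs hold the same tokens in the same order.

```python
def parse_tagged_sentence(line):
    """Split 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    # rsplit on the last '/' so that a word containing '/' stays intact.
    return [tuple(token.rsplit('/', 1)) for token in line.split()]


def find_tagging_errors(gold_line, tagged_line):
    """Return (word, gold_tag, guessed_tag) for every mismatched tag."""
    gold = parse_tagged_sentence(gold_line)
    guessed = parse_tagged_sentence(tagged_line)
    if len(gold) != len(guessed):
        raise ValueError('sentences have different numbers of tokens')
    errors = []
    for (gold_word, gold_tag), (guessed_word, guessed_tag) in zip(gold, guessed):
        if gold_word != guessed_word:
            raise ValueError('token mismatch: %r vs %r' % (gold_word, guessed_word))
        if gold_tag != guessed_tag:
            errors.append((gold_word, gold_tag, guessed_tag))
    return errors


# First sentence of the sample: gold standard vs. tagger output.
gold_line = "No/RB ,/, it/PRP was/VBD n't/RB Black/NNP Monday/NNP ./."
tagged_line = "No/DT ,/, it/PRP was/VBD n't/RB Black/NNP Monday/NNP ./."
print(find_tagging_errors(gold_line, tagged_line))
# prints [('No', 'RB', 'DT')]
```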

From the Wikipedia page:

In a classification task, the precision for a class is the number of true positives (i.e. the number of items correctly labeled as belonging to the positive class) divided by the total number of elements labeled as belonging to the positive class (i.e. the sum of true positives and false positives, which are items incorrectly labeled as belonging to the class). Recall in this context is defined as the number of true positives divided by the total number of elements that actually belong to the positive class (i.e. the sum of true positives and false negatives, which are items which were not labeled as belonging to the positive class but should have been).
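Applied to a single tag t, the two definitions in the excerpt can be written compactly. The symbols TP_t, FP_t and FN_t are shorthand introduced here (they do not appear in the excerpt) for the true-positive, false-positive and false-negative counts of tag t:

```latex
\[
\mathrm{Precision}_t = \frac{TP_t}{TP_t + FP_t},
\qquad
\mathrm{Recall}_t = \frac{TP_t}{TP_t + FN_t}
\]
```

For instance, in the first sample sentence the gold standard has two RB tokens (No and n't), while the tagger labels only n't as RB. For RB that gives TP = 1, FP = 0 and FN = 1, so precision is 1/1 = 1 and recall is 1/2 = 0.5.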

Best answer

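Note that since every word has exactly one tag, overall recall and precision scores are meaningless for this task (they will both equal the accuracy measure). It does make sense, however, to ask for recall and precision per tag; for example, you could find the recall and precision for the DT tag.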
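A small check of this claim, as a sketch rather than a proof: because every wrong guess adds exactly one false negative (for the true tag) and one false positive (for the guessed tag), the counts summed over all tags give the same value for precision, recall and accuracy. The two tag lists below are taken from the first sample sentence; the variable names are illustrative only.

```python
from collections import Counter

# Tags of the first sample sentence: gold standard vs. tagger output.
gold_tags = ['RB', ',', 'PRP', 'VBD', 'RB', 'NNP', 'NNP', '.']
guessed_tags = ['DT', ',', 'PRP', 'VBD', 'RB', 'NNP', 'NNP', '.']

true_positives = Counter()
false_positives = Counter()
false_negatives = Counter()
for gold, guessed in zip(gold_tags, guessed_tags):
    if gold == guessed:
        true_positives[gold] += 1
    else:
        false_negatives[gold] += 1     # the true tag was missed
        false_positives[guessed] += 1  # the guessed tag was wrongly assigned

# Sum the per-tag counts over all tags.
tp = sum(true_positives.values())
fp = sum(false_positives.values())
fn = sum(false_negatives.values())

accuracy = tp / len(gold_tags)
overall_precision = tp / (tp + fp)
overall_recall = tp / (tp + fn)
print(accuracy, overall_precision, overall_recall)
# prints 0.875 0.875 0.875
```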

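The most efficient way to do this for all tags at once is similar to what you suggested, but you can save one pass over the data by skipping the list-building stage. Read one line from each file, compare the two lines word by word, and repeat until you reach the end of the files. For each word comparison, you probably want to check that the words are equal rather than assuming the two files are in sync. For each tag, keep three running totals: true positives, false positives and false negatives. If the two tags for the current word match, increment the true-positive total for that tag. If they do not match, increment the false-negative total for the true tag and the false-positive total for the tag the machine chose by mistake. At the end, you can compute the recall and precision scores for each tag using the formulas in the Wikipedia excerpt.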

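I have not tested this code and my Python is a bit rusty, but it should give you the idea. I am assuming that the files are open and that the totals data structure is a dictionary of dictionaries: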

    from collections import defaultdict

    # totals[tag] is a dictionary holding the three running counts for that tag
    totals = defaultdict(lambda: {'truePositives': 0,
                                  'falsePositives': 0,
                                  'falseNegatives': 0})

    # testFile (gold standard) and taggedFile (tagger output) are assumed to be open
    for trueLine in testFile:
        trueLine = trueLine.split()  # tokenise by whitespace
        taggedLine = taggedFile.readline().split()
        if len(trueLine) != len(taggedLine):
            raise ValueError('Error: files are out of sync.')
        for trueToken, taggedToken in zip(trueLine, taggedLine):
            # split on the last '/' so that words containing '/' stay intact
            trueWord, trueTag = trueToken.rsplit('/', 1)
            taggedWord, guessedTag = taggedToken.rsplit('/', 1)
            if trueWord != taggedWord:  # the words should match
                raise ValueError('Error: files are out of sync.')
            if trueTag == guessedTag:
                totals[trueTag]['truePositives'] += 1
            else:
                totals[trueTag]['falseNegatives'] += 1
                totals[guessedTag]['falsePositives'] += 1
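The loop above only accumulates counts; the last step, turning them into per-tag scores, might look like the sketch below. The function name precision_recall is hypothetical, the counts in the sample totals dictionary are made up for illustration, and returning None when a denominator is zero is one possible convention, not the only one.

```python
def precision_recall(counts):
    """Return (precision, recall) for one tag's running totals.

    A score is None when its denominator is zero, i.e. the tag was never
    guessed (precision) or never occurs in the gold standard (recall).
    """
    tp = counts['truePositives']
    fp = counts['falsePositives']
    fn = counts['falseNegatives']
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall


# Made-up counts, using the same keys as the counting loop.
totals = {
    'DT': {'truePositives': 8, 'falsePositives': 1, 'falseNegatives': 0},
    'RB': {'truePositives': 3, 'falsePositives': 0, 'falseNegatives': 1},
    'VBG': {'truePositives': 0, 'falsePositives': 1, 'falseNegatives': 0},
}

for tag in sorted(totals):
    precision, recall = precision_recall(totals[tag])
    print(tag, 'precision:', precision, 'recall:', recall)
```

With these sample counts, RB has precision 1.0 and recall 0.75, while VBG has precision 0.0 and an undefined (None) recall because it never appears as a true tag.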

For more on python - How do I calculate tagging precision and recall for a POS tagger?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/5264492/
