python - Difference between IOB Accuracy and Precision

Reposted. Author: 行者123. Updated: 2023-12-05 01:08:20

I'm doing some work in NLTK with named entity recognition and chunkers. I retrained a classifier for this using nltk/chunk/named_entity.py and got the following measures:

ChunkParse score:
IOB Accuracy: 96.5%
Precision: 78.0%
Recall: 91.9%
F-Measure: 84.4%

But I don't understand what exactly the difference between IOB Accuracy and Precision is in this case. In fact, I found the following specific example in the docs (here):

The IOB tag accuracy indicates that more than a third of the words are tagged with O, i.e. not in an NP chunk. However, since our tagger did not find any chunks, its precision, recall, and f-measure are all zero.



So, if IOB accuracy is just the number of O labels, how come in that example we have no chunks and yet the IOB accuracy is not 100%?

Thank you in advance

Best answer

There is a very detailed explanation of the difference between precision and accuracy on Wikipedia (see https://en.wikipedia.org/wiki/Accuracy_and_precision). In brief:

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
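
As a quick illustration of why the two numbers can sit far apart, here is a minimal sketch of both formulas. The counts are made up for the example (they are not the asker's data); they just show what happens when true negatives dominate.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all decisions (positive and negative) that were right."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Fraction of positive predictions that were right."""
    # Return 0.0 when nothing was predicted positive, to avoid dividing by zero.
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical counts: lots of true negatives, few positives.
tp, tn, fp, fn = 5, 90, 4, 1

print("accuracy  =", accuracy(tp, tn, fp, fn))
print("precision =", precision(tp, fp))
```

The true negatives lift accuracy but never enter the precision formula, so a system can look very accurate while almost half of its positive guesses are wrong.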

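Back to NLTK: there is a class called ChunkScore that computes the accuracy, precision and recall of your system. And here is the funny part about how NLTK counts tp, fp, tn and fn: for accuracy and for precision it does so at different granularities.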

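For accuracy, NLTK counts the total number of tokens (NOT CHUNKS!!) whose POS tag and IOB tag are both guessed correctly, then divides by the total number of tokens in the gold sentence.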
accuracy = num_tokens_correct / total_num_tokens_from_gold
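
This is also why the example from the docs does not reach 100%: a tagger that finds no chunks labels every token O, so it is right only on the tokens that are O in the gold data. A minimal sketch of the token-level count follows. It is a simplification that compares IOB tags only (NLTK compares the whole word/POS/IOB triple), and the function name and tag sequence are made up for illustration.

```python
def iob_accuracy(gold_tags, guessed_tags):
    """Token-level accuracy: share of positions where the IOB tags agree."""
    if len(gold_tags) != len(guessed_tags):
        raise ValueError("tag sequences must have the same length")
    correct = sum(1 for g, p in zip(gold_tags, guessed_tags) if g == p)
    return correct / len(gold_tags)


# Gold IOB tags for:
# [The cat] sat on [the mat] [the big dog] chewed .
gold = ["B-NP", "I-NP", "O", "O", "B-NP", "I-NP",
        "B-NP", "I-NP", "I-NP", "O", "O"]

# A chunker that finds no chunks at all tags every token as O.
all_o = ["O"] * len(gold)

print("IOB accuracy of the all-O tagger:", iob_accuracy(gold, all_o))
print("Share of O tags in the gold data:", gold.count("O") / len(gold))
```

Both printed values are the same (4 of the 11 tokens), even though that tagger found zero chunks.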

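For precision and recall, NLTK calculates:
  • True Positives by counting the number of chunks (NOT TOKENS!!!) that are guessed correctly.
  • False Positives by counting the number of chunks (NOT TOKENS!!!) that are guessed but are wrong.
  • False Negatives by counting the number of gold chunks (NOT TOKENS!!!) that the system did not guess.

  • It then calculates precision and recall as:
    precision = tp / (fp + tp)
    recall = tp / (fn + tp)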
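
The chunk-level counting can be sketched without NLTK. The helper names below are hypothetical, chunks are reduced to (start, end) token spans, and only one chunk type is assumed, so treat this as an illustration of the idea rather than NLTK's actual implementation.

```python
def iob_to_chunks(tags):
    """Turn a list of IOB tags into a set of (start, end) spans, end exclusive."""
    chunks = set()
    start = None
    for i, tag in enumerate(tags):
        if tag == "O" or tag.startswith("B-"):
            # Either of these closes any chunk that is currently open.
            if start is not None:
                chunks.add((start, i))
            start = i if tag.startswith("B-") else None
        elif tag.startswith("I-") and start is None:
            # Tolerate an I- tag with no preceding B- by opening a chunk here.
            start = i
    if start is not None:
        chunks.add((start, len(tags)))
    return chunks


def chunk_prf(gold_tags, guessed_tags):
    """Count tp/fp/fn over whole chunks, then derive precision and recall."""
    gold = iob_to_chunks(gold_tags)
    guessed = iob_to_chunks(guessed_tags)
    tp = len(gold & guessed)   # chunks guessed with exactly the gold boundaries
    fp = len(guessed - gold)   # chunks guessed that are not in the gold data
    fn = len(gold - guessed)   # gold chunks the system missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return tp, fp, fn, precision, recall


# [The cat] sat on [the mat] [the big dog] chewed .
gold_tags = ["B-NP", "I-NP", "O", "O", "B-NP", "I-NP",
             "B-NP", "I-NP", "I-NP", "O", "O"]
# Guess: [The cat] sat on [the mat] the big [dog] chewed .
guessed_tags = ["B-NP", "I-NP", "O", "O", "B-NP", "I-NP",
                "O", "O", "B-NP", "O", "O"]

print(chunk_prf(gold_tags, guessed_tags))
print(chunk_prf(gold_tags, ["O"] * len(gold_tags)))
```

In the first call the guessed chunk [dog] overlaps the gold chunk [the big dog] but does not match it exactly, so it counts as one false positive and one false negative. In the second call nothing is guessed, so precision and recall are both zero.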

To verify the points above, try this script (the exact counts depend on the tags that pos_tag assigns):

    from nltk.chunk.regexp import ChunkRule, RegexpChunkParser
    from nltk.chunk.util import ChunkScore, tagstr2tree
    from nltk.tag import pos_tag

    # Let's say we give it a rule that says an optional DT followed by an NN* is an NP.
    chunk_rule = ChunkRule("<DT>?<NN.*>", "DT+NN* or NN* chunk")
    # NLTK 3 calls this keyword chunk_label (older releases used chunk_node).
    chunk_parser = RegexpChunkParser([chunk_rule], chunk_label='NP')

    # Let's say our test sentence is:
    # "The cat sat on the mat the big dog chewed."
    gold = tagstr2tree("[ The/DT cat/NN ] sat/VBD on/IN [ the/DT mat/NN ] [ the/DT big/JJ dog/NN ] chewed/VBD ./.")

    # We POS tag the sentence and then chunk with our rule-based chunker.
    # (pos_tag needs NLTK's POS tagger model, installed via nltk.download().)
    test = pos_tag('The cat sat on the mat the big dog chewed .'.split())
    chunked = chunk_parser.parse(test)

    # Then we calculate the score.
    chunkscore = ChunkScore()
    chunkscore.score(gold, chunked)
    # Private method: refreshes the internal tp/fp/fn counters read below.
    chunkscore._updateMeasures()

    # Our rule-based chunker says these are chunks.
    print(chunkscore.guessed())

    # Total number of tokens from test sentence. i.e.
    # The/DT , cat/NN , sat/VBD , on/IN , the/DT , mat/NN ,
    # the/DT , big/JJ , dog/NN , chewed/VBD , ./.
    total = chunkscore._tags_total
    # Number of tokens that are guessed correctly, i.e.
    # The/DT , cat/NN , on/IN , the/DT , mat/NN , chewed/VBD , ./.
    correct = chunkscore._tags_correct
    print("Is correct/total == accuracy ?", chunkscore.accuracy() == (correct / total))
    print(correct, '/', total, '=', chunkscore.accuracy())
    print("##############")

    print("Correct chunk(s):")  # i.e. True Positive.
    # correct() returns the gold chunks, so intersect them with the guesses.
    correct_chunks = set(chunkscore.correct()).intersection(set(chunkscore.guessed()))
    # print(correct_chunks)
    print("Number of correct chunks = tp = ", len(correct_chunks))
    assert len(correct_chunks) == chunkscore._tp_num
    print()

    print("Missed chunk(s):")  # i.e. False Negative.
    # print(chunkscore.missed())
    print("Number of missed chunks = fn = ", len(chunkscore.missed()))
    assert len(chunkscore.missed()) == chunkscore._fn_num
    print()

    print("Wrongly guessed chunk(s):")  # i.e. False Positive.
    wrong_chunks = set(chunkscore.guessed()).difference(set(chunkscore.correct()))
    # print(wrong_chunks)
    print("Number of wrong chunks = fp =", len(wrong_chunks))
    print(chunkscore._fp_num)
    assert len(wrong_chunks) == chunkscore._fp_num
    print()

    print("Recall = ", "tp/(fn+tp) =", len(correct_chunks), '/', len(correct_chunks) + len(chunkscore.missed()), '=', chunkscore.recall())

    print("Precision =", "tp/(fp+tp) =", len(correct_chunks), '/', len(correct_chunks) + len(wrong_chunks), '=', chunkscore.precision())

Regarding python - Difference between IOB Accuracy and Precision, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/17325554/
