
machine-learning - What does k-fold validation mean in the context of POS tagging?

Reposted. Author: 行者123. Updated: 2023-11-30 09:01:56

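I know that for k-fold cross-validation, I'm supposed to split the corpus into k equal parts. Of these k parts, I use k-1 parts for training and the remaining part for testing. The process is repeated k times, so that each part is used for testing exactly once.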
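The splitting procedure described above can be sketched in plain Python. This is a minimal illustration, not part of the original post: `k_fold_splits` is a hypothetical helper, it assumes the corpus is a Python list of sentences, it uses contiguous blocks without shuffling, and when the corpus size is not divisible by k it lets the last fold absorb the leftover items (one common choice among several).

```python
def k_fold_splits(items, k):
    """Yield k (train, test) pairs; every item appears in exactly one test fold."""
    n = len(items)
    fold_size = n // k
    for i in range(k):
        start = i * fold_size
        # The last fold also takes the n % k leftover items so nothing is dropped.
        end = n if i == k - 1 else start + fold_size
        test = items[start:end]
        train = items[:start] + items[end:]  # the other k-1 parts
        yield train, test


for train, test in k_fold_splits(list(range(10)), 3):
    print(test, len(train))
# [0, 1, 2] 7
# [3, 4, 5] 7
# [6, 7, 8, 9] 6
```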

But I don't understand what exactly "training" means, or what exactly "testing" means.

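My idea is (please correct me if I'm wrong):
1. Training set (k-1 of the k parts): these parts are used to build the tag transition probability table and the emission probability table. Then, using these probability tables, I apply some tagging algorithm (e.g. the Viterbi algorithm).
2. Test set (the remaining part): this part is used to validate the implementation from step 1. That is, this part of the corpus will contain untagged words, and I should run the step 1 implementation on it.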
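Step 1 of the list above (estimate transition and emission probabilities from the training folds, then tag with Viterbi) can be sketched as a bigram HMM tagger in pure Python. This is an illustrative sketch, not the asker's or NLTK's implementation: the names `train_hmm`, `viterbi`, and `START` are hypothetical, and the add-alpha smoothing with an assumed `alpha=1e-3` is just one simple way to keep unseen transitions and unseen words from getting zero probability.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical pseudo-tag for "beginning of sentence"


def train_hmm(tagged_sents, alpha=1e-3):
    """Count transitions/emissions in the training folds and return smoothed
    log-probability functions. alpha is an assumed add-alpha smoothing constant."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] -> count
    emit = defaultdict(Counter)   # emit[tag][word] -> count
    vocab = set()
    for sent in tagged_sents:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    tags = sorted(emit)
    trans_total = {p: sum(c.values()) for p, c in trans.items()}
    emit_total = {t: sum(c.values()) for t, c in emit.items()}
    n_words = len(vocab) + 1  # one extra slot reserves mass for unseen words

    def log_trans(prev, tag):
        count = trans.get(prev, Counter())[tag]
        return math.log((count + alpha) / (trans_total.get(prev, 0) + alpha * len(tags)))

    def log_emit(tag, word):
        return math.log((emit[tag][word] + alpha) / (emit_total[tag] + alpha * n_words))

    return tags, log_trans, log_emit


def viterbi(words, tags, log_trans, log_emit):
    """Return the most probable tag sequence for a list of untagged words."""
    if not words:
        return []
    # best[i][t] = (log prob of the best path ending in tag t at word i, previous tag)
    best = [{t: (log_trans(START, t) + log_emit(t, words[0]), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prev_t, score = max(
                ((p, best[i - 1][p][0] + log_trans(p, t)) for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + log_emit(t, words[i]), prev_t)
        best.append(column)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


train = [[("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
         [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")]]
tags, log_trans, log_emit = train_hmm(train)
print(viterbi(["a", "dog", "sleeps"], tags, log_trans, log_emit))  # ['DT', 'NN', 'VBZ']
```

Working in log space avoids numerical underflow when many small probabilities are multiplied over a long sentence.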

Is my understanding correct? If not, please explain.

Thanks.

Best answer

I hope this helps:

from nltk.corpus import brown  # requires the corpus: nltk.download('brown')
from nltk import UnigramTagger as ut

# Let's just take the first 1000 tagged sentences.
sents = list(brown.tagged_sents()[:1000])
num_sents = len(sents)
k = 10
foldsize = num_sents // k  # integer division, so it can be used as a slice index

fold_accuracies = []

for i in range(k):
    # Locate the test set in the fold.
    test = sents[i*foldsize:i*foldsize+foldsize]
    # Use the rest of the sentences (those not in test) for training.
    train = sents[:i*foldsize] + sents[i*foldsize+foldsize:]
    # Train a unigram tagger on the training data.
    tagger = ut(train)
    # Evaluate the accuracy on the test data.
    # (Older NLTK versions use tagger.evaluate(test), now deprecated.)
    accuracy = tagger.accuracy(test)
    print("Fold", i)
    print("from sent", i*foldsize, "to", i*foldsize+foldsize)
    print("accuracy =", accuracy)
    print()
    fold_accuracies.append(accuracy)

print("average accuracy =", sum(fold_accuracies) / k)

[out]:

Fold 0
from sent 0 to 100
accuracy = 0.785714285714

Fold 1
from sent 100 to 200
accuracy = 0.745431364216

Fold 2
from sent 200 to 300
accuracy = 0.749628896586

Fold 3
from sent 300 to 400
accuracy = 0.743798291989

Fold 4
from sent 400 to 500
accuracy = 0.803448275862

Fold 5
from sent 500 to 600
accuracy = 0.779836277467

Fold 6
from sent 600 to 700
accuracy = 0.772676371781

Fold 7
from sent 700 to 800
accuracy = 0.755679184052

Fold 8
from sent 800 to 900
accuracy = 0.706402915148

Fold 9
from sent 900 to 1000
accuracy = 0.774622079707

average accuracy = 0.761723794252
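To make the "train" and "test" steps in the answer concrete, the sketch below re-implements the idea behind a unigram tagger from scratch. Training memorizes the most frequent tag for each word in the training folds; testing strips the gold tags from the held-out fold, re-tags the bare words, and compares the predictions with the gold tags token by token. In other words, the test fold is tagged data: its tags are hidden from the tagger and used only as the answer key. This is a simplified illustration, not NLTK's code (NLTK's `UnigramTagger` also supports options such as `backoff` and `cutoff`), and `train_unigram`, `tag_sentence`, and `accuracy` are hypothetical names.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sents):
    """Training: remember the most frequent tag for each word in the training folds."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    # most_common(1) breaks ties by the order in which tags were first seen.
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag_sentence(model, words):
    """Tag bare words; words never seen in training get None (always an error)."""
    return [model.get(w) for w in words]


def accuracy(model, gold_sents):
    """Testing: hide the gold tags, re-tag the words, compare token by token."""
    correct = total = 0
    for sent in gold_sents:
        words = [w for w, _ in sent]
        gold_tags = [t for _, t in sent]
        predicted = tag_sentence(model, words)
        correct += sum(p == g for p, g in zip(predicted, gold_tags))
        total += len(gold_tags)
    return correct / total if total else 0.0


train = [[("the", "DT"), ("run", "NN")],
         [("they", "PRP"), ("run", "VBP")],
         [("we", "PRP"), ("run", "VBP")]]
test = [[("they", "PRP"), ("run", "VBP"), ("fast", "RB")]]
model = train_unigram(train)
print(accuracy(model, test))  # 2 of 3 tokens correct: 0.6666666666666666
```

Averaging this per-fold accuracy over all k folds gives the single cross-validated score printed at the end of the answer's output.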

A similar question, "machine-learning - What does k-fold validation mean in the context of POS tagging?", can be found on Stack Overflow: https://stackoverflow.com/questions/25106997/
