
python - spaCy classifier: 'unicode' object has no attribute 'to_array'

Reposted. Author: 行者123. Updated: 2023-12-01 09:26:34

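I am trying to write a minimal text classifier with spaCy. I wrote the following snippet to train only the text categorizer (without training the whole NLP pipeline):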

import spacy
from spacy.pipeline import TextCategorizer
nlp = spacy.load('en')

doc1 = u'This is my first document in the dataset.'
doc2 = u'This is my second document in the dataset.'

gold1 = u'Category1'
gold2 = u'Category2'

textcat = TextCategorizer(nlp.vocab)
textcat.add_label('Category1')
textcat.add_label('Category2')
losses = {}
optimizer = textcat.begin_training()
textcat.update([doc1, doc2], [gold1, gold2], losses=losses, sgd=optimizer)

But when I run it, it raises an error. Here is the traceback I get:

Traceback (most recent call last):
  File "C:\Users\Reuben\Desktop\Classification\Classification\Training.py", line 16, in <module>
    textcat.update([doc1, doc2], [gold1, gold2], losses=losses, sgd=optimizer)
  File "pipeline.pyx", line 838, in spacy.pipeline.TextCategorizer.update
  File "D:\Program Files\Anaconda2\lib\site-packages\thinc\api.py", line 61, in begin_update
    X, inc_layer_grad = layer.begin_update(X, drop=drop)
  File "D:\Program Files\Anaconda2\lib\site-packages\thinc\api.py", line 176, in begin_update
    values = [fwd(X, *a, **k) for fwd in forward]
  File "D:\Program Files\Anaconda2\lib\site-packages\thinc\api.py", line 258, in wrap
    output = func(*args, **kwargs)
  File "D:\Program Files\Anaconda2\lib\site-packages\thinc\api.py", line 61, in begin_update
    X, inc_layer_grad = layer.begin_update(X, drop=drop)
  File "D:\Program Files\Anaconda2\lib\site-packages\spacy\_ml.py", line 95, in _preprocess_doc
    keys = [doc.to_array(LOWER) for doc in docs]
AttributeError: 'unicode' object has no attribute 'to_array'

How can I fix this?

Best answer

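The error itself comes from passing plain unicode strings where textcat.update expects Doc objects: the traceback ends in doc.to_array(LOWER), and a unicode string has no such method, so each text must first be processed with nlp(...). In addition, textcat apparently needs gold values produced by GoldParse rather than plain text labels. A working version looks like this: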

import spacy
from spacy.pipeline import TextCategorizer
from spacy.gold import GoldParse
nlp = spacy.load('en')

doc1 = nlp(u'This is my first document in the dataset.')
doc2 = nlp(u'This is my second document in the dataset.')

gold1 = GoldParse(doc=doc1, cats={'Category1': 1, 'Category2': 0})
gold2 = GoldParse(doc=doc2, cats={'Category1': 0, 'Category2': 1})

textcat = TextCategorizer(nlp.vocab)
textcat.add_label('Category1')
textcat.add_label('Category2')
losses = {}
optimizer = textcat.begin_training()
textcat.update([doc1, doc2], [gold1, gold2], losses=losses, sgd=optimizer)
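
The cats dictionaries passed to GoldParse map every label to 1 or 0. When there are more than two labels, writing them by hand gets error-prone, so here is a minimal sketch of building them from a single gold label. The helper name make_cats is hypothetical (it is not part of spaCy), and the sketch is plain Python so it runs without spaCy installed:

```python
def make_cats(gold_label, all_labels):
    """Build a one-hot cats dict from a single gold label.

    Hypothetical helper, not part of spaCy. The result has the same
    shape as the dicts passed as cats=... to GoldParse in the answer.
    """
    if gold_label not in all_labels:
        raise ValueError("unknown label: %r" % (gold_label,))
    # 1 for the gold label, 0 for every other label.
    return {label: int(label == gold_label) for label in all_labels}


labels = ['Category1', 'Category2']
cats1 = make_cats('Category1', labels)  # {'Category1': 1, 'Category2': 0}
cats2 = make_cats('Category2', labels)  # {'Category1': 0, 'Category2': 1}
```

The resulting dicts could then be passed as GoldParse(doc=doc1, cats=cats1) and GoldParse(doc=doc2, cats=cats2), assuming the same spaCy version as the answer.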

Thanks to @abarnert in the comments for helping me debug this.
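
The traceback shows that to_array is called on every item of the batch, so any item that is still a raw string fails deep inside thinc. A minimal sketch of a guard that surfaces the mistake earlier with a clearer message follows. Both check_docs and FakeDoc are hypothetical names introduced for illustration: check_docs is not part of spaCy, and FakeDoc is only a stand-in for spaCy's Doc so the sketch stays self-contained.

```python
def check_docs(docs):
    """Raise a clear TypeError if any batch item lacks a to_array method.

    Hypothetical helper, not part of spaCy. It mirrors the duck-typing
    requirement visible in the traceback, where doc.to_array(...) is
    called on each item of the batch.
    """
    for i, doc in enumerate(docs):
        if not hasattr(doc, "to_array"):
            raise TypeError(
                "item %d is %s, not a Doc; call nlp(text) first"
                % (i, type(doc).__name__)
            )
    return docs


class FakeDoc(object):
    """Stand-in for spaCy's Doc, used only to keep this sketch runnable."""

    def __init__(self, text):
        self.words = text.split()

    def to_array(self, attr):
        # The real Doc.to_array returns attribute IDs as an array;
        # this stand-in ignores attr and just lowercases the words.
        return [w.lower() for w in self.words]
```

With real spaCy objects, the idea would be to call check_docs([doc1, doc2]) just before textcat.update(...), so that an unprocessed string is reported by its position in the batch.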

Regarding python - spaCy classifier: 'unicode' object has no attribute 'to_array', we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50340611/
