nlp - spaCy lemmatization of nouns and noun chunks

Reposted by: 行者123  Updated: 2023-12-04 13:29:02

I am trying to build a corpus of documents made up of lemmatized nouns and noun chunks. I am using this code:

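import spacy

# spaCy v2.x API: create_pipe() and add_pipe() with a plain function
nlp = spacy.load('en_core_web_sm')

def lemmatizer(doc, allowed_postags=('NOUN',)):
    # Keep only the lemmas of tokens whose POS tag is in allowed_postags
    lemmas = [token.lemma_ for token in doc if token.pos_ in allowed_postags]
    # make_doc() re-tokenizes this string, so any multi-word lemma
    # is split on whitespace again in the returned Doc
    return nlp.make_doc(' '.join(lemmas))


nlp.add_pipe(nlp.create_pipe('merge_noun_chunks'), after='ner')
nlp.add_pipe(lemmatizer, name='lemm', after='merge_noun_chunks')

data = ['the euro area has advanced a long way as a monetary union']
doc_list = []
for text in data:
    doc_list.append(nlp(text))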

For the sentence 'the euro area has advanced a long way as a monetary union', identifying the noun chunks gives ['the euro area', 'advanced', 'long', 'way', 'a monetary union'], and lemmatization then gives ['euro', 'area', 'way', 'monetary', 'union'].
Is there a way to keep the words of each identified noun chunk together, to get output like ['the euro area', 'way', 'a monetary union'] or ['the_euro_area', 'way', 'a_monetary_union']?
Thanks for your help!

Best answer

I don't think your problem is about lemmatization.
This method works for your example:

import spacy

nlp = spacy.load('en_core_web_sm')

# Merge noun phrases and named entities into single tokens
def merge_noun_phrase(doc):
    spans = list(doc.ents) + list(doc.noun_chunks)
    # Drop duplicate/overlapping spans, keeping the longest one
    spans = spacy.util.filter_spans(spans)

    with doc.retokenize() as retokenizer:
        for span in spans:
            retokenizer.merge(span)
    return doc

sentence = "the euro area has advanced a long way as a monetary union"
doc = nlp(sentence)
doc2 = merge_noun_phrase(doc)
for token in doc2:
    print(token)
# Prints one token per line; noun chunks such as 'the euro area'
# and 'a monetary union' now come out as single tokens
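To see what the merge does to the token sequence, and how to get the underscore form from the question, here is a hypothetical pure-Python sketch that needs no spaCy model. The tokens, POS tags and span offsets are hand-written assumptions (not real `en_core_web_sm` output), `merge_spans` is a made-up helper rather than a spaCy function, and it assumes the merged token takes the POS of the span's root. Each span is collapsed into one token joined with `_`, and then only nouns are kept:

```python
from typing import List, Tuple

# Hypothetical annotation: (text, POS) pairs written by hand for illustration.
TOKENS: List[Tuple[str, str]] = [
    ("the", "DET"), ("euro", "NOUN"), ("area", "NOUN"), ("has", "AUX"),
    ("advanced", "VERB"), ("a", "DET"), ("long", "ADJ"), ("way", "NOUN"),
    ("as", "ADP"), ("a", "DET"), ("monetary", "ADJ"), ("union", "NOUN"),
]
# Hypothetical noun-chunk spans as (start, end, root_index), end exclusive.
SPANS = [(0, 3, 2), (9, 12, 11)]


def merge_spans(tokens, spans, sep="_"):
    """Collapse each span into one token that takes the span root's POS."""
    starts = {start: (end, root) for start, end, root in spans}
    merged = []
    i = 0
    while i < len(tokens):
        if i in starts:
            end, root = starts[i]
            # Join the span's words and reuse the root token's POS tag
            text = sep.join(word for word, _ in tokens[i:end])
            merged.append((text, tokens[root][1]))
            i = end
        else:
            merged.append(tokens[i])
            i += 1
    return merged


merged = merge_spans(TOKENS, SPANS)
nouns = [text for text, pos in merged if pos == "NOUN"]
print(nouns)  # ['the_euro_area', 'way', 'a_monetary_union']
```

Which spans get merged is entirely up to the parser in the real pipeline; the sketch only shows that once a chunk is a single token, filtering by POS no longer splits it apart.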

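Note that I am using spaCy 2.3.5. `spacy.util.filter_spans` was added in spaCy v2.1.4 (and is still available in v3), so if you get the error below, your installed version is probably older than that. This answer should help :)
Module 'spacy.util' has no attribute 'filter_spans'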
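`filter_spans` is needed because an entity and a noun chunk can cover the same tokens, and the retokenizer refuses to merge overlapping spans. According to the spaCy docs, when spans overlap the (first) longest span is preferred. As a rough illustration of that rule only, here is a pure-Python sketch over hypothetical `(start, end)` offset pairs; `filter_overlapping` is a name made up for this example and is not spaCy's own code, though the same logic could be applied to `Span` objects through `span.start` and `span.end`:

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # hypothetical (start, end) token offsets, end exclusive


def filter_overlapping(spans: List[Span]) -> List[Span]:
    """Keep the longest spans that don't overlap; ties go to the earlier span."""
    # Longest first; among spans of equal length, the one starting earlier first.
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept: List[Span] = []
    taken: Set[int] = set()
    for start, end in ordered:
        # Skip any span sharing a token with one already kept
        # (this also removes exact duplicates).
        if taken.isdisjoint(range(start, end)):
            kept.append((start, end))
            taken.update(range(start, end))
    # Return the survivors in document order.
    return sorted(kept)


# Hypothetical offsets: an entity "euro area" (1, 3) overlapping the
# noun chunk "the euro area" (0, 3), plus the chunk "a monetary union" (9, 12).
ents = [(1, 3)]
chunks = [(0, 3), (9, 12)]
print(filter_overlapping(ents + chunks))  # [(0, 3), (9, 12)]
```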
Also, if you still want to lemmatize the noun chunks themselves, you can do it like this:

doc = nlp("the euro area has advanced a long way as a monetary union")
for chunk in doc.noun_chunks:
    print(chunk.lemma_)
    # Prints each chunk's lemma on its own line, e.g. 'the euro area', 'a monetary union'

As the answer to "What is the lemma for 'two pets'" puts it, "looking at lemmas at the span level is probably not very useful; it makes more sense to work at the token level."
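Working at the token level also makes the underscore form easy to build: join the lemmas of the tokens inside each chunk. With real spaCy tokens that would be something like `'_'.join(token.lemma_ for token in chunk)`. The sketch below stands in for spaCy with hand-written `(text, lemma)` pairs; the lemmas are assumptions rather than model output, and `chunk_lemma` with its `drop` option is a hypothetical helper:

```python
from typing import Iterable, List, Tuple


def chunk_lemma(chunk: List[Tuple[str, str]], sep: str = "_",
                drop: Iterable[str] = ()) -> str:
    """Join the token lemmas of one noun chunk, skipping words listed in `drop`."""
    skip = {word.lower() for word in drop}
    return sep.join(lemma for text, lemma in chunk if text.lower() not in skip)


# Hypothetical (text, lemma) pairs for the chunk "the monetary unions".
chunk = [("the", "the"), ("monetary", "monetary"), ("unions", "union")]
print(chunk_lemma(chunk))                           # the_monetary_union
print(chunk_lemma(chunk, drop=("the", "a", "an")))  # monetary_union
```

Because each token is lemmatized separately, the plural head noun is reduced ("unions" to "union") while the chunk still comes out as one underscore-joined item.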

For nlp - spaCy lemmatization of nouns and noun chunks, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66332810/