python - AttributeError: 'spacy.tokens.span.Span' object has no attribute 'merge'

Reposted. Author: 行者123. Updated: 2023-12-05 02:01:16

I'm working on an NLP project and trying to follow this tutorial, https://medium.com/@ageitgey/natural-language-processing-is-fun-9a0bff37854e, and when I run this part:

import spacy

# Load the large English NLP model
nlp = spacy.load('en_core_web_lg')

# Replace a token with "REDACTED" if it is a name
def replace_name_with_placeholder(token):
    if token.ent_iob != 0 and token.ent_type_ == "PERSON":
        return "[REDACTED] "
    else:
        return token.string

# Loop through all the entities in a document and check if they are names
def scrub(text):
    doc = nlp(text)
    for ent in doc.ents:
        ent.merge()
    tokens = map(replace_name_with_placeholder, doc)
    return "".join(tokens)

s = """
In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence".
In 1957, Noam Chomsky’s
Syntactic Structures revolutionized Linguistics with 'universal grammar', a rule based system of
syntactic structures.
"""

print(scrub(s))

I get this error:

---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-62-ab1c786c4914> in <module>
4 """
5
----> 6 print(scrub(s))

<ipython-input-60-4742408aa60f> in scrub(text)
3 doc = nlp(text)
4 for ent in doc.ents:
----> 5 ent.merge()
6 tokens = map(replace_name_with_placeholder, doc)
7 return "".join(tokens)

AttributeError: 'spacy.tokens.span.Span' object has no attribute 'merge'

Best answer

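spaCy has removed the span.merge() method since that tutorial was written (it was deprecated in v2.1 and removed in v3.0). The way to do this now is with doc.retokenize(): https://spacy.io/api/doc#retokenize. I've implemented it in your scrub function below: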

# Loop through all the entities in a document and check if they are names
def scrub(text):
    doc = nlp(text)
    with doc.retokenize() as retokenizer:
        for ent in doc.ents:
            retokenizer.merge(ent)
    tokens = map(replace_name_with_placeholder, doc)
    return "".join(tokens)

s = """
In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence".
In 1957, Noam Chomsky’s
Syntactic Structures revolutionized Linguistics with 'universal grammar', a rule based system of
syntactic structures.
"""

print(scrub(s))
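For reference, here is a minimal self-contained sketch of the same retokenize approach that runs without downloading en_core_web_lg. It assumes spaCy v3: a blank English pipeline stands in for the large model, and a hypothetical entity_ruler pattern tags "Alan Turing" as PERSON so there is an entity to merge. It returns token.text_with_ws, which keeps each token's trailing whitespace the way the removed token.string did.

```python
import spacy

# Assumption: spaCy v3. A blank English pipeline replaces en_core_web_lg so the
# sketch runs without a model download; the entity_ruler pattern below is a
# hypothetical stand-in for the large model's named entity recognizer.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "PERSON", "pattern": "Alan Turing"}])


def replace_name_with_placeholder(token):
    # Redact PERSON tokens; otherwise keep the text plus its trailing
    # whitespace (text_with_ws is what the removed token.string returned).
    if token.ent_iob != 0 and token.ent_type_ == "PERSON":
        return "[REDACTED] "
    return token.text_with_ws


def scrub(text):
    doc = nlp(text)
    # Queue one merge per entity; spaCy applies them when the block exits.
    with doc.retokenize() as retokenizer:
        for ent in doc.ents:
            retokenizer.merge(ent)
    # Iterate only after the with-block, once "Alan Turing" is a single token.
    return "".join(replace_name_with_placeholder(token) for token in doc)


print(scrub("In 1950, Alan Turing published his famous article."))
# In 1950, [REDACTED] published his famous article.
```

Note that the token loop sits outside the with-block: the merges are only applied when the retokenizer context exits, so iterating inside it would still see "Alan" and "Turing" as separate tokens.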

Additional notes:

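  1. Your replace_name_with_placeholder function will also throw an error, because Token.string no longer exists in spaCy v3. Use token.text_with_ws instead (plain token.text drops the trailing whitespace, so "".join() would run the words together). I fixed it below:

     def replace_name_with_placeholder(token):
         if token.ent_iob != 0 and token.ent_type_ == "PERSON":
             return "[REDACTED] "
         else:
             return token.text_with_ws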
  2. If, in addition to the entities, you are also merging other spans such as doc.noun_chunks, you may run into overlap errors like this one:

     ValueError: [E102] Can't merge non-disjoint spans. 'Computing' is already part of
     tokens to merge. If you want to find the longest non-overlapping spans, you can
     use the util.filter_spans helper:
     https://spacy.io/api/top-level#util.filter_spans

     For this reason, you may also want to look at spacy.util.filter_spans: https://spacy.io/api/top-level#util.filter_spans.
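The filter_spans suggestion can be sketched as follows, assuming spaCy v3. With a trained pipeline you would pass something like list(doc.ents) + list(doc.noun_chunks); a blank pipeline has no parser (so no noun_chunks), so two hand-built overlapping spans stand in for an entity and a noun chunk. merge_non_overlapping is a hypothetical helper name. When spans overlap, filter_spans prefers the longest one, so only the full title is merged.

```python
import spacy
from spacy.tokens import Span
from spacy.util import filter_spans

# Assumption: spaCy v3 with a blank English pipeline (no parser, so no
# doc.noun_chunks); the two spans below are built by hand to overlap.
nlp = spacy.blank("en")


def merge_non_overlapping(doc, spans):
    """Hypothetical helper: drop overlapping spans with filter_spans, then merge."""
    with doc.retokenize() as retokenizer:
        for span in filter_spans(spans):
            retokenizer.merge(span)
    return doc


doc = nlp('He wrote "Computing Machinery and Intelligence" in 1950.')
title = Span(doc, 3, 7)  # stand-in for an entity: Computing ... Intelligence
chunk = Span(doc, 3, 5)  # stand-in for a noun chunk: "Computing Machinery"

# Merging both spans directly would raise ValueError [E102]; filter_spans
# keeps only the longer title span, so the merge succeeds.
merge_non_overlapping(doc, [title, chunk])
print([token.text for token in doc])
# 'Computing Machinery and Intelligence' is now a single token
```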

For the question "python - AttributeError: 'spacy.tokens.span.Span' object has no attribute 'merge'", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66725902/
