python - NLTK entity extraction differences between NLTK 2.0.4 and NLTK 3.0

Reposted. Author: 太空宇宙. Updated: 2023-11-03 18:08:50

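I am running into a problem when trying to run an entity extraction function. I believe it is caused by a version difference. The following working example runs in 2.0.4 but not in 3.0. I did change one function call, batch_ne_chunk, to nltk.ne_chunk_sents, to prevent an error from being thrown in 3.0.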

def package_get_entities(self, text):
    #text = text[0:300]
    entity_names = []
    chunked = self.get_chunked_sentences(text)
    for tree in chunked:
        entity_names.extend(self.extract_entity_names(tree))
    entity_names = list(set(entity_names))
    return entity_names

def get_chunked_sentences(self, text):
    sentences = nltk.sent_tokenize(text)
    tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
    tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
    chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)
    return chunked_sentences

def extract_entity_names(self, t):
    entity_names = []
    if hasattr(t, 'node') and t.node:
        if t.node == 'NE':
            entity_names.append(' '.join([child[0] for child in t]))
        else:
            for child in t:
                entity_names.extend(self.extract_entity_names(child))
    return entity_names

Running the function:

str = 'this is some text about a man named Abraham Lincoln'
entArray = package_get_entities(str)

In 2.0.4 this outputs [Abraham Lincoln]. In 3.0 it outputs [].

Best answer

I had to rewrite:

if hasattr(t, 'node') and t.node:

to:

if hasattr(t, 'label'):
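
For context, NLTK 3 replaced the Tree.node attribute with a label() method, so the inner comparison t.node == 'NE' most likely needs the same treatment as the hasattr check. The following is a minimal sketch of the whole function under that assumption, not the answerer's exact code. It is written as a free function (no self) purely for illustration, and it assumes the trees come from nltk.ne_chunk_sents(..., binary=True), where entity subtrees are labelled 'NE' and leaves are (word, tag) tuples.

```python
def extract_entity_names(t):
    """Collect named-entity strings from an NLTK 3 style chunk tree.

    Assumption: `t` is a tree produced by nltk.ne_chunk_sents(..., binary=True),
    so entity subtrees carry the label 'NE' and leaves are (word, tag) tuples.
    """
    entity_names = []
    # Leaves are plain tuples with no label(); only subtrees expose it.
    if hasattr(t, 'label'):
        if t.label() == 'NE':
            # Join the words of this entity's (word, tag) leaves.
            entity_names.append(' '.join(child[0] for child in t))
        else:
            # Not an entity itself, so search its children recursively.
            for child in t:
                entity_names.extend(extract_entity_names(child))
    return entity_names
```

With NLTK 3 installed, this can be tried without downloading any models by building a tree by hand, for example nltk.Tree('S', [('a', 'DT'), nltk.Tree('NE', [('Abraham', 'NNP'), ('Lincoln', 'NNP')])]), for which the function returns ['Abraham Lincoln'].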

Regarding python - NLTK entity extraction differences between NLTK 2.0.4 and NLTK 3.0, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26352041/
