gpt4 book ai didi

python - NLTK 命名实体识别到 Python 列表

转载 作者:太空狗 更新时间:2023-10-29 17:21:16 24 4
gpt4 key购买 nike

我使用 NLTK 的 ne_chunk 从文本中提取命名实体:

my_sent = "WASHINGTON -- In the wake of a string of abuses by New York police officers in the 1990s, Loretta E. Lynch, the top federal prosecutor in Brooklyn, spoke forcefully about the pain of a broken trust that African-Americans felt and said the responsibility for repairing generations of miscommunication and mistrust fell to law enforcement."


nltk.ne_chunk(my_sent, binary=True)

但我不知道如何将这些实体保存到列表中?例如。 –

print Entity_list
('WASHINGTON', 'New York', 'Loretta', 'Brooklyn', 'African')

谢谢。

最佳答案

nltk.ne_chunk 返回嵌套的 nltk.tree.Tree 对象,因此您必须遍历 Tree 对象才能到达 NE .

看看Named Entity Recognition with Regular Expression: NLTK

>>> from nltk import ne_chunk, pos_tag, word_tokenize
>>> from nltk.tree import Tree
>>>
>>> def get_continuous_chunks(text):
... chunked = ne_chunk(pos_tag(word_tokenize(text)))
... continuous_chunk = []
... current_chunk = []
... for i in chunked:
... if type(i) == Tree:
... current_chunk.append(" ".join([token for token, pos in i.leaves()]))
... if current_chunk:
... named_entity = " ".join(current_chunk)
... if named_entity not in continuous_chunk:
... continuous_chunk.append(named_entity)
... current_chunk = []
... else:
... continue
... return continuous_chunk
...
>>> my_sent = "WASHINGTON -- In the wake of a string of abuses by New York police officers in the 1990s, Loretta E. Lynch, the top federal prosecutor in Brooklyn, spoke forcefully about the pain of a broken trust that African-Americans felt and said the responsibility for repairing generations of miscommunication and mistrust fell to law enforcement."
>>> get_continuous_chunks(my_sent)
['WASHINGTON', 'New York', 'Loretta E. Lynch', 'Brooklyn']


>>> my_sent = "How's the weather in New York and Brooklyn"
>>> get_continuous_chunks(my_sent)
['New York', 'Brooklyn']

关于python - NLTK 命名实体识别到 Python 列表,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/31836058/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com