
elasticsearch - "United States" is not ["United", "States"]

Reposted. Author: 行者123. Updated: 2023-12-03 01:35:35

I have a text field in Elasticsearch and I want to visualize it as a word cloud in Kibana...

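As a first step the text has to be tokenized, and I used the standard tokenizer...
With this setup, the word cloud visualization looks like the image below:
[image: word cloud in which "United States" is split into "united" and "states"]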

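What I need, however, is for proper nouns such as "United States", "United Nations", "Security Council" and so on to stay in one piece. I want a word cloud like this:
[image: the desired word cloud, with multi-word names kept whole]
* The proper nouns or phrases are almost always 2 to 5 words long (e.g. "People's Republic of China").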

What should I do?
Is this related to N-grams?

Sample text:

The United States of America is a charter member of the United Nations and one of five permanent members of the UN Security Council.

The United States is host to the headquarters of the United Nations, which includes the usual meeting place of the General Assembly in New York City, the seat of the Security Council and several bodies of the United Nations. The United States is the largest provider of financial contributions to the United Nations, providing 22 percent of the entire UN budget in 2017 (in comparison the next biggest contributor is Japan with almost 10 percent, while EU countries pay a total of above 30 percent). From July 2016 to June 2017, 28.6 percent of the budget used for peacekeeping operations was provided by the United States. The United States had a pivotal role in establishing the UN.

Best answer

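This is a named-entity recognition (NER) task, not a standard tokenization task. There are some plugins that attempt to do this inside Elasticsearch, but none of them looked promising.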

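Instead, you need to preprocess the data on the application side. Use an NLP parser (Stanford CoreNLP, spaCy, ...) to extract the named entities. In your mapping, create a keyword field (call it entities, for example) and store the entities extracted from each document in it as an array. You can then use this field to generate the word cloud.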
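The steps above could be sketched roughly as follows in Python. This is only an illustration under some assumptions, not the answerer's actual code: the field names "content" and "entities", the aggregation name "top_entities", and the helper functions are all hypothetical, the mapping uses the typeless format of Elasticsearch 7 and later, and spaCy with its en_core_web_sm model is assumed to be installed wherever load_spacy_pipeline is called.

```python
# Hypothetical mapping: "content" holds the analyzed text, "entities" holds
# the extracted names as untokenized keyword values (Elasticsearch 7+ format).
INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "content": {"type": "text"},
            "entities": {"type": "keyword"},
        }
    }
}

# Terms aggregation on the keyword field. A Kibana tag cloud built on
# "entities" runs an equivalent aggregation behind the scenes.
WORD_CLOUD_AGG = {
    "size": 0,
    "aggs": {"top_entities": {"terms": {"field": "entities", "size": 50}}},
}


def load_spacy_pipeline(model="en_core_web_sm"):
    """Load a spaCy pipeline (assumes spaCy and the model are installed)."""
    import spacy  # imported lazily so the rest of the sketch has no dependency

    return spacy.load(model)


def extract_entities(text, nlp):
    """Return the distinct entity strings that the pipeline finds in text.

    nlp is any callable returning an object with an .ents sequence whose
    items expose .text, which is what a spaCy pipeline returns.
    """
    entities = []
    for ent in nlp(text).ents:
        name = ent.text.strip()
        # A terms aggregation counts documents, not occurrences, so
        # repeating a name inside one document adds nothing.
        if name and name not in entities:
            entities.append(name)
    return entities


def build_document(text, entities):
    """Build the document body to index: original text plus entity array."""
    return {"content": text, "entities": list(entities)}
```

With a pipeline obtained from load_spacy_pipeline(), build_document(text, extract_entities(text, nlp)) gives the body to index, and pointing the word cloud at the entities field keeps each multi-word name as a single term. Which spans the model actually tags (for instance whether a leading "the" is included) depends on the model, so some normalization of the extracted strings may be needed.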

Good luck.

Regarding elasticsearch - "United States" is not ["United", "States"], a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52912974/
