
elasticsearch - How do I get the total count of each word in an Elasticsearch document?

Reposted. Author: 行者123. Updated: 2023-11-29 02:51:43

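I searched for this question but could not find a useful answer. I want to get the total count of each word in a document. For example, my index contains some tweets, and one of them says "It is so boring here I want to go to my home sweet home". The query should return a response like this: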

It:1
is:1
so:1
boring:1
here:1
I:1
want:1
to:2
go:1
my:1
home:2
sweet:1

Is this possible?

Best answer

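You are looking for term vectors, which are built from the output of the field's analyzer. That means you can define whatever analyzer you need, for example one with a stemmer that reduces words to their root form, so that "boring" and "bored" are counted as the same term. See the documentation for more details.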
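To make the idea concrete, here is a minimal Python sketch of what a per-document term count amounts to once the text has been analyzed. It is only an approximation for illustration: the regular expression stands in for the `standard` tokenizer, `str.lower()` stands in for the `lowercase` filter, and no stemming is applied. The name `count_terms` is hypothetical and not part of Elasticsearch.

```python
import re
from collections import Counter

# Rough stand-in for the "standard" tokenizer: runs of letters, digits, apostrophes.
# This is an assumption for illustration, not Elasticsearch's actual tokenization rules.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9']+")


def count_terms(text):
    """Lowercase the text, split it into tokens, and count each distinct token."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    return Counter(tokens)


counts = count_terms("It is so boring here I want to go to my home sweet home")
for term, freq in sorted(counts.items()):
    print(f"{term}:{freq}")
```

Because the sketch lowercases everything, "It" and "I" are reported as "it" and "i", which is also what an analyzer with a lowercase filter produces.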

Input:

POST so/_close
PUT so/_settings
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stemmer"]
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "name": "english"
        }
      }
    }
  }
}
POST so/_open
PUT so/t1/_mapping
{
  "t1": {
    "properties": {
      "tweet": {
        "type": "string",
        "store": true,
        "index_analyzer": "my_analyzer"
      }
    }
  }
}
POST so/t1/1
{"tweet": "It is so boring here I want to go to my home sweet home. So I'm bored"}
GET so/t1/1/_termvector
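The term vectors for the indexed document are fetched from the document's term vector endpoint. The following is a hedged Python sketch of issuing that request using only the standard library. It assumes a node reachable at `http://localhost:9200` and the `_termvector` endpoint name used by the 1.x releases that this mapping syntax targets (later releases name it `_termvectors`, so the endpoint is a parameter). The helper names `termvector_url` and `fetch_term_vectors` are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def termvector_url(base, index, doc_type, doc_id, fields=None, endpoint="_termvector"):
    """Build the term vector URL for one document (endpoint name is an assumption)."""
    path = "/".join(quote(str(part), safe="") for part in (index, doc_type, doc_id))
    url = f"{base.rstrip('/')}/{path}/{endpoint}"
    if fields:
        # Restrict the response to the listed fields.
        url += "?" + urlencode({"fields": ",".join(fields)})
    return url


def fetch_term_vectors(base, index, doc_type, doc_id, fields=None):
    """Send the GET request and return the parsed JSON response as a dict."""
    with urlopen(termvector_url(base, index, doc_type, doc_id, fields)) as resp:
        return json.load(resp)


# Building the URL needs no running cluster; fetch_term_vectors does.
url = termvector_url("http://localhost:9200", "so", "t1", 1, fields=["tweet"])
print(url)
```

Calling `fetch_term_vectors("http://localhost:9200", "so", "t1", 1, ["tweet"])` against a running node would return the JSON document as a Python dict.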

Output:

{
  "_index": "so",
  "_type": "t1",
  "_id": "1",
  "_version": 2,
  "found": true,
  "term_vectors": {
    "tweet": {
      "field_statistics": {
        "sum_doc_freq": 13,
        "doc_count": 1,
        "sum_ttf": 17
      },
      "terms": {
        "bore": {
          "term_freq": 2,
          ...
        },
        "go": {
          "term_freq": 1,
          ...
        },
        "here": {
          "term_freq": 1,
          ...
        },
        "home": {
          "term_freq": 2,
          ...
        },
        "i": {
          "term_freq": 1,
          ...
        },
        "i'm": {
          "term_freq": 1,
          ...
        },
        "is": {
          "term_freq": 1,
          ...
        },
        "it": {
          "term_freq": 1,
          ...
        },
        "my": {
          "term_freq": 1,
          ...
        },
        "so": {
          "term_freq": 2,
          ...
        },
        "sweet": {
          "term_freq": 1,
          ...
        },
        "to": {
          "term_freq": 2,
          ...
        },
        "want": {
          "term_freq": 1,
          ...
        }
      }
    }
  }
}
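To turn a response like this into the flat `word:count` listing asked for in the question, only the `term_freq` value of each entry under `terms` is needed. A minimal Python sketch, assuming the response has already been parsed into a dict; the function name `term_counts` is hypothetical, and `sample` is an abbreviated copy of the output above containing only four of its terms.

```python
def term_counts(response, field):
    """Map each term of `field` to its term_freq; empty dict if the field is absent."""
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
    return {term: info["term_freq"] for term, info in terms.items()}


# Abbreviated copy of the response shown above (only four terms kept).
sample = {
    "term_vectors": {
        "tweet": {
            "terms": {
                "bore": {"term_freq": 2},
                "go": {"term_freq": 1},
                "home": {"term_freq": 2},
                "to": {"term_freq": 2},
            }
        }
    }
}

for term, freq in sorted(term_counts(sample, "tweet").items()):
    print(f"{term}:{freq}")
```

Note that the keys are the analyzed terms, so with the stemming analyzer above "boring" and "bored" both show up under "bore".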

Regarding "elasticsearch - How do I get the total count of each word in an Elasticsearch document?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31913808/
