Elasticsearch: getting phrase frequency in a given document

Reposted. Author: 行者123. Updated: 2023-11-29 02:46:53

Test data:

curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '{ "body": "this is a test" }'
curl -XPUT 'localhost:9200/customer/external/2?pretty' -d '{ "body": "and this is another test" }'
curl -XPUT 'localhost:9200/customer/external/3?pretty' -d '{ "body": "this thing is a test" }'

My goal is to get the frequency of a phrase within a document.

I know how to get the frequency of individual terms in a document:

curl -g "http://localhost:9200/customer/external/1/_termvectors?pretty" -d'
{
"fields": ["body"],
"term_statistics" : true
}'

I also know how to count the documents that contain a given phrase (using a match_phrase or span_near query):

curl -g "http://localhost:9200/customer/_count?pretty" -d'
{
"query": {
"match_phrase": {
"body" : "this is"
}
}
}'

How can I get the frequency of a phrase?

Best answer

You can use term vectors. As the documentation says:

Return values

Three types of values can be requested: term information, term statistics and field statistics. By default, all term information and field statistics are returned for all fields but no term statistics.

Term information

term frequency in the field (always returned)
term positions (positions : true)
start and end offsets (offsets : true)
term payloads (payloads : true), as base64 encoded bytes

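What you need is the term frequency: the example in the documentation shows the frequency of john doe in a document. Be aware that storing term vectors duplicates the disk space used by the field they are applied to.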
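Term vectors report frequencies per term rather than per phrase, so one possible way to get a phrase count is to request positions and count, on the client side, how often the phrase's terms occur at consecutive positions. The Python sketch below illustrates that idea under a few assumptions: the response has the usual term_vectors -> field -> terms -> tokens layout with a position per token, the phrase terms are already analyzed the same way as the field (for example lowercased), and the names phrase_frequency and sample are hypothetical. The sample response is hand-written and abbreviated, not real server output.

```python
from typing import Dict, List, Set


def phrase_frequency(response: dict, field: str, phrase_terms: List[str]) -> int:
    """Count how often phrase_terms occur at consecutive positions in one field.

    `response` is assumed to be the parsed JSON of a _termvectors call
    made with positions enabled.
    """
    if not phrase_terms:
        return 0
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})

    # Map each phrase term to the set of token positions where it occurs.
    positions: Dict[str, Set[int]] = {}
    for term in set(phrase_terms):
        tokens = terms.get(term, {}).get("tokens", [])
        positions[term] = {tok["position"] for tok in tokens}

    # A phrase occurrence starts wherever the first term occurs and every
    # following term sits exactly one position further along.
    count = 0
    for start in positions[phrase_terms[0]]:
        if all(start + i in positions[t] for i, t in enumerate(phrase_terms)):
            count += 1
    return count


# Hand-written, abbreviated response for a hypothetical document
# "this is a test, this is it" (offsets and statistics omitted).
sample = {
    "term_vectors": {
        "body": {
            "terms": {
                "this": {"term_freq": 2, "tokens": [{"position": 0}, {"position": 4}]},
                "is": {"term_freq": 2, "tokens": [{"position": 1}, {"position": 5}]},
                "a": {"term_freq": 1, "tokens": [{"position": 2}]},
                "test": {"term_freq": 1, "tokens": [{"position": 3}]},
                "it": {"term_freq": 1, "tokens": [{"position": 6}]},
            }
        }
    }
}

print(phrase_frequency(sample, "body", ["this", "is"]))  # prints 2
```

This only handles exact, in-order phrases with no slop; a looser match such as span_near with a distance would need a different position check.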

Regarding "Elasticsearch: getting phrase frequency in a given document", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46569177/
