Elasticsearch aggregation with hierarchical categories and subcategories: limiting the levels

Reposted. Author: 行者123. Updated: 2023-12-02 22:16:29

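I have products with a category field. Using an aggregation, I can get the full categories together with all of their subcategories. I want to limit the levels shown in the facet.

For example, I have facets like this: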

auto, tools & travel    (115)
auto, tools & travel > luggage tags (90)
auto, tools & travel > luggage tags > luggage spotters (40)
auto, tools & travel > luggage tags > something else (50)
auto, tools & travel > car organizers (25)

Using an aggregation like this

"aggs": {
  "cat_groups": {
    "terms": {
      "field": "categories.keyword",
      "size": 10,
      "include": "auto, tools & travel > .*"
    }
  }
}

I get buckets like this

"buckets": [
  {
    "key": "auto, tools & travel > luggage tags",
    "doc_count": 90
  },
  {
    "key": "auto, tools & travel > luggage tags > luggage spotters",
    "doc_count": 40
  },
  {
    "key": "auto, tools & travel > luggage tags > something else",
    "doc_count": 50
  },
  {
    "key": "auto, tools & travel > car organizers",
    "doc_count": 25
  }
]

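But I want to limit the level. For example, I only want to get the results for auto, tools & travel > luggage tags. How can I limit the levels? By the way, "exclude": ".* > .* > .*" does not work for me.

I need to get buckets at different levels depending on the search: sometimes the first level, sometimes the second or the third. When I ask for the first level, I do not want second-level entries to appear in the buckets, and so on for the other levels.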

Elasticsearch version 6.4

Best answer

I finally came up with the technique below.

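I implemented a custom analyzer using the Path Hierarchy Tokenizer and defined categories as a multi-field, so that you can use categories.facet for aggregations/faceting and categories for normal text search.

The custom analyzer applies only to categories.facet.

Note the "fielddata": "true" property on my categories.facet field. A terms aggregation on a text field requires fielddata to be enabled.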

Mapping

PUT myindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "path_hierarchy",
          "delimiter": ">"
        }
      }
    }
  },
  "mappings": {
    "mydocs": {
      "properties": {
        "categories": {
          "type": "text",
          "fields": {
            "facet": {
              "type": "text",
              "analyzer": "my_analyzer",
              "fielddata": "true"
            }
          }
        }
      }
    }
  }
}
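
To make it easier to see which terms end up in categories.facet, here is a minimal Python sketch that imitates what a path_hierarchy tokenizer with the delimiter ">" emits for one category string. The function name path_hierarchy_tokens is hypothetical and this is only an emulation for illustration, not Elasticsearch code. Note that whitespace around the delimiter is preserved, which is why the bucket keys in the response further down end with a trailing space.

```python
def path_hierarchy_tokens(text, delimiter=">"):
    """Imitate a path_hierarchy tokenizer (hypothetical helper).

    Emits every prefix of `text` that ends just before a delimiter,
    followed by the full string itself.
    """
    tokens = []
    for index, char in enumerate(text):
        if char == delimiter:
            # Prefix up to, but not including, this delimiter.
            tokens.append(text[:index])
    tokens.append(text)
    return tokens


category = "auto, tools & travel > luggage tags > luggage spotters"
tokens = path_hierarchy_tokens(category)
for token in tokens:
    # repr() makes the trailing spaces visible.
    print(repr(token))
```

Each document therefore contributes one term per level of its category path, and a terms aggregation on categories.facet can count documents at every level.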

Sample documents

POST myindex/mydocs/1
{
  "categories": "auto, tools & travel > luggage tags > luggage spotters"
}

POST myindex/mydocs/2
{
  "categories": "auto, tools & travel > luggage tags > luggage spotters"
}

POST myindex/mydocs/3
{
  "categories": "auto, tools & travel > luggage tags > luggage spotters"
}

POST myindex/mydocs/4
{
  "categories": "auto, tools & travel > luggage tags > something else"
}

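Query

You can try the query below for what you are looking for. Again, I have used a Filter Aggregation, since you only need documents that match specific words, together with a Terms Aggregation.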

POST myindex/_search
{
  "size": 0,
  "aggs": {
    "facets": {
      "filter": {
        "bool": {
          "must": [
            { "match": { "categories": "luggage" } }
          ]
        }
      },
      "aggs": {
        "categories": {
          "terms": {
            "field": "categories.facet"
          }
        }
      }
    }
  }
}

Response

{
  "took": 43,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 11,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "facets": {
      "doc_count": 4,
      "categories": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "auto, tools & travel ",
            "doc_count": 4
          },
          {
            "key": "auto, tools & travel > luggage tags ",
            "doc_count": 4
          },
          {
            "key": "auto, tools & travel > luggage tags > luggage spotters",
            "doc_count": 3
          },
          {
            "key": "auto, tools & travel > luggage tags > something else",
            "doc_count": 1
          }
        ]
      }
    }
  }
}
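
The bucket counts in this response can be reproduced offline from the four sample documents. The sketch below is a rough Python emulation under the assumption that each document contributes one term per level of its category path; path_hierarchy_tokens is a hypothetical helper and no Elasticsearch call is involved.

```python
from collections import Counter


def path_hierarchy_tokens(text, delimiter=">"):
    """Hypothetical emulation of a path_hierarchy tokenizer."""
    tokens = [text[:i] for i, char in enumerate(text) if char == delimiter]
    tokens.append(text)
    return tokens


# The four sample documents indexed above.
docs = [
    "auto, tools & travel > luggage tags > luggage spotters",
    "auto, tools & travel > luggage tags > luggage spotters",
    "auto, tools & travel > luggage tags > luggage spotters",
    "auto, tools & travel > luggage tags > something else",
]

# A terms aggregation counts documents per term. The tokens of one
# document are distinct prefixes, so each document adds 1 per term.
buckets = Counter()
for doc in docs:
    buckets.update(path_hierarchy_tokens(doc))

for key, doc_count in buckets.items():
    print(repr(key), doc_count)
```

The two shared parent levels are counted for all four documents, while each leaf category is counted only for the documents that carry it.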

Final answer after the chat discussion

POST myindex/_search
{
  "size": 0,
  "aggs": {
    "facets": {
      "filter": {
        "bool": {
          "must": [
            { "match": { "categories": "luggage" } }
          ]
        }
      },
      "aggs": {
        "categories": {
          "terms": {
            "field": "categories.facet",
            "exclude": ".*>{1}.*>{1}.*"
          }
        }
      }
    }
  }
}

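Note that I have added exclude with a regular expression, so that the aggregation ignores any facet containing more than one occurrence of >.

Let me know if this helps.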
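
Since the question asks for a different depth per search, the exclude pattern can be generated from the number of levels wanted. The following Python sketch is only an illustration: exclude_pattern_for_max_level and visible_buckets are hypothetical helpers, and Python's re module stands in for Elasticsearch's Lucene regular expressions. The two engines are not identical, so treat the generated pattern as something to verify against your cluster. The pattern is matched against the whole term, which is why re.fullmatch is used.

```python
import re


def exclude_pattern_for_max_level(max_level, delimiter=">"):
    """Build a pattern matching terms deeper than `max_level` levels.

    A term at level N contains N - 1 delimiters, so anything with at
    least `max_level` delimiters is too deep. Assumes the delimiter is
    not a regex metacharacter.
    """
    return ".*" + (delimiter + ".*") * max_level


def visible_buckets(keys, max_level):
    """Keep only the keys that the exclude pattern does not match."""
    pattern = re.compile(exclude_pattern_for_max_level(max_level))
    return [key for key in keys if not pattern.fullmatch(key)]


# Bucket keys from the response above (trailing spaces included).
keys = [
    "auto, tools & travel ",
    "auto, tools & travel > luggage tags ",
    "auto, tools & travel > luggage tags > luggage spotters",
    "auto, tools & travel > luggage tags > something else",
]

print(exclude_pattern_for_max_level(2))
print(visible_buckets(keys, 2))
print(visible_buckets(keys, 1))
```

With max_level set to 2 the generated pattern accepts and rejects the same keys as the exclude value in the final query, and with max_level set to 1 only the top-level category remains. Because path_hierarchy terms are cumulative prefixes, this limits the maximum depth; it does not by itself return a single level only.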

For more on Elasticsearch aggregation with hierarchical categories and subcategories (limiting the levels), see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/52940790/
