gpt4 book ai didi

elasticsearch - 如何按嵌套类型的数组大小过滤?

转载 作者:行者123 更新时间:2023-12-02 23:12:35 25 4
gpt4 key购买 nike

假设我有以下类型:

{
"2019-11-04": {
"mappings": {
"_doc": {
"properties": {
"labels": {
"type": "nested",
"properties": {
"confidence": {
"type": "float"
},
"created_at": {
"type": "date",
"format": "strict_date_optional_time||date_time||epoch_millis"
},
"label": {
"type": "keyword"
},
"updated_at": {
"type": "date",
"format": "strict_date_optional_time||date_time||epoch_millis"
},
"value": {
"type": "keyword",
"fields": {
"numeric": {
"type": "float",
"ignore_malformed": true
}
}
}
}
},
"params": {
"type": "object"
},
"type": {
"type": "keyword"
}
}
}
}
}
}

我想按 labels数组的大小/长度进行过滤。我尝试了以下( as the official docs suggest):
{
"query": {
"bool": {
"filter": {
"script": {
"script": {
"source": "doc['labels'].size > 10"
}
}
}
}
}
}

但我不断得到:
{
"error": {
"root_cause": [
{
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:81)",
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:39)",
"doc['labels'].size > 10",
" ^---- HERE"
],
"script": "doc['labels'].size > 10",
"lang": "painless"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "2019-11-04",
"node": "kk5MNRPoR4SYeQpLk2By3A",
"reason": {
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:81)",
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:39)",
"doc['labels'].size > 10",
" ^---- HERE"
],
"script": "doc['labels'].size > 10",
"lang": "painless",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "No field found for [labels] in mapping with types []"
}
}
}
]
},
"status": 500
}

最佳答案

恐怕这是不可能的,因为labels字段不是ES保存的字段,也不是albiet在其上创建反向索引的字段。

Doc doc['fieldname']仅适用于创建反向索引的字段,Elasticsearch的Query DSL也仅适用于创建反向索引的字段,不幸的是nested类型不是创建反向索引的有效字段。

话虽如此,我有以下两种方法可以做到这一点。

为了简单起见,我创建了示例映射,文档和两个可能的解决方案,它们可能会对您有所帮助。

对应:

PUT my_sample_index
{
"mappings": {
"properties": {
"myfield": {
"type": "nested",
"properties": {
"label": {
"type": "keyword"
}
}
}
}
}
}

样本文件:
// single field inside 'myfield'
POST my_sample_index/_doc/1
{
"myfield": {
"label": ["New York", "LA", "Austin"]
}
}


// two fields inside 'myfield'
POST my_sample_index/_doc/2
{
"myfield": {
"label": ["London", "Leicester", "Newcastle", "Liverpool"],
"country": "England"
}
}

解决方案1:使用 Script Fields(在应用程序级别进行管理)

我有一个变通办法来获取您想要的东西,虽然不完全正确,但是可以帮助您在服务层或应用程序上进行过滤。
POST my_sample_index/_search
{
"_source": "*",
"query": {
"bool": {
"must": [
{
"match_all": {}
}
]
}
},
"script_fields": {
"label_size": {
"script": {
"lang": "painless",
"source": "params['_source']['labels'].size() > 1"
}
}
}
}

您会注意到,作为响应,将使用 label_sizetrue值创建一个单独的字段 false

样本响应如下所示:
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_sample_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"myfield" : {
"label" : [
"New York",
"LA",
"Austin"
]
}
},
"fields" : {
"label_size" : [ <---- Scripted Field
false
]
}
},
{
"_index" : "my_sample_index",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"myfield" : {
"country" : "England",
"label" : [
"London",
"Leicester",
"Newcastle",
"Liverpool"
]
}
},
"fields" : { <---- Scripted Field
"label_size" : [
true <---- True because it has two fields 'labels' and 'country'
]
}
}
]
}
}

请注意,只有第二个文档才有意义,因为它具有两个字段 countrylabels。但是,如果只希望将 label_sizetrue一起使用,则必须在应用程序层进行管理。

解决方案2:使用 Reindexing的带有labels.size的 Script Processor

如下创建新索引:
PUT my_sample_index_temp
{
"mappings": {
"properties": {
"myfield": {
"type": "nested",
"properties": {
"label": {
"type": "keyword"
}
}
},
"labels_size":{ <---- New Field where we'd store the size
"type": "integer"
}
}
}
}

创建以下管道:
PUT _ingest/pipeline/set_labels_size
{
"description": "sets the value of labels size",
"processors": [
{
"script": {
"source": """
ctx.labels_size = ctx.myfield.size();
"""
}
}
]
}

使用Reindex API从my_sample_index索引重新索引
POST _reindex
{
"source": {
"index": "my_sample_index"
},
"dest": {
"index": "my_sample_index_temp",
"pipeline": "set_labels_size"
}
}

使用my_sample_index_temp验证GET my_sample_index_temp/_search中的文档
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_sample_index_temp",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"labels_size" : 1, <---- New Field Created
"myfield" : {
"label" : [
"New York",
"LA",
"Austin"
]
}
}
},
{
"_index" : "my_sample_index_temp",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"labels_size" : 2, <----- New Field Created
"myfield" : {
"country" : "England",
"label" : [
"London",
"Leicester",
"Newcastle",
"Liverpool"
]
}
}
}
]
}
}

现在,您只需在查询中使用此字段 labels_size即可,它的方式更简单,更不用说高效了。

希望这可以帮助!

关于elasticsearch - 如何按嵌套类型的数组大小过滤?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/58697508/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com