
Elasticsearch 5.2.2 : terms aggregation case insensitive

Reposted · Author: 行者123 · Updated: 2023-11-29 02:52:23

I'm trying to run a case-insensitive aggregation on a keyword-type field, but I'm having trouble getting it to work.

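What I've tried so far is adding a custom analyzer called "lowercase" that uses the "keyword" tokenizer and the "lowercase" filter. I then added a sub-field called "use_lowercase" to the mapping of the field I want to aggregate on. I also want to keep the existing "text" and "keyword" parts of the field, because I may want to search for terms in it.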

Here is the index definition, including the custom analyzer:

PUT authors
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": "lowercase"
        }
      }
    }
  },
  "mappings": {
    "famousbooks": {
      "properties": {
        "Author": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            },
            "use_lowercase": {
              "type": "text",
              "analyzer": "lowercase"
            }
          }
        }
      }
    }
  }
}
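To see what the custom "lowercase" analyzer above does to each value, here is a rough Python approximation (an illustration only, not Elasticsearch code). The keyword tokenizer emits the whole input as a single token, and the lowercase filter lowercases it; Python's `str.lower()` is assumed to behave like the Elasticsearch lowercase filter for plain ASCII names such as these. The helper names are hypothetical. Although the output is a single lowercased term, the `use_lowercase` sub-field is still declared with `"type": "text"`.

```python
def keyword_tokenizer(text: str) -> list[str]:
    # The keyword tokenizer emits the entire input as one token.
    return [text]


def lowercase_filter(tokens: list[str]) -> list[str]:
    # The lowercase token filter lowercases every token it receives.
    return [token.lower() for token in tokens]


def lowercase_analyzer(text: str) -> list[str]:
    # Hypothetical approximation of the custom "lowercase" analyzer in the mapping.
    return lowercase_filter(keyword_tokenizer(text))


for author in ["Agatha Christie", "Agatha christie"]:
    print(author, "->", lowercase_analyzer(author))
```

Both spellings produce the same single term, `agatha christie`, which is exactly the value the aggregation should group on.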

Now I add two documents with the same author but different capitalization:

POST authors/famousbooks/1
{
  "Book": "The Mysterious Affair at Styles",
  "Year": 1920,
  "Price": 5.92,
  "Genre": "Crime Novel",
  "Author": "Agatha Christie"
}

POST authors/famousbooks/2
{
  "Book": "And Then There Were None",
  "Year": 1939,
  "Price": 6.99,
  "Genre": "Mystery Novel",
  "Author": "Agatha christie"
}
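A terms aggregation puts every distinct indexed value into its own bucket. The following Python sketch (plain Python, with a hypothetical helper `terms_buckets` and the simplifying assumption of one value per document) contrasts the buckets a case-sensitive field such as `Author.keyword` would yield with the single bucket the question is after once values are lowercased. Sorting by count and then by key is only a stand-in for Elasticsearch's default bucket ordering.

```python
from collections import Counter

docs = [
    {"Book": "The Mysterious Affair at Styles", "Author": "Agatha Christie"},
    {"Book": "And Then There Were None", "Author": "Agatha christie"},
]


def terms_buckets(values):
    # Each distinct value becomes one bucket; the count is how many docs carry it.
    counts = Counter(values)
    # Order by doc count descending, then by key (an approximation of ES ordering).
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


case_sensitive = terms_buckets(d["Author"] for d in docs)
case_insensitive = terms_buckets(d["Author"].lower() for d in docs)
print(case_sensitive)    # two buckets, one per spelling
print(case_insensitive)  # one bucket holding both documents
```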

So far so good. Now, if I run a terms aggregation on Author,

GET authors/famousbooks/_search
{
  "size": 0,
  "aggs": {
    "authors-aggs": {
      "terms": {
        "field": "Author.use_lowercase"
      }
    }
  }
}

I get the following result:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [Author.use_lowercase] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "authors",
        "node": "yxcoq_eKRL2r6JGDkshjxg",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [Author.use_lowercase] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [Author.use_lowercase] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
    }
  },
  "status": 400
}

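So it seems to me that the aggregation treats the field as text rather than keyword, which is why I get the fielddata warning. I expected ES to be smart enough to recognize that the terms field is effectively a keyword (thanks to the custom analyzer) and can therefore be aggregated on, but that doesn't seem to be the case.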

If I add "fielddata": true to the Author mapping, the aggregation works, but given the dire warnings about high heap usage when this is set, I'm hesitant to do so.
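The error message talks about "uninverting the inverted index". A text field is indexed for search as a term-to-documents mapping, whereas an aggregation needs the opposite direction, document-to-terms. The sketch below is a conceptual model of that inversion, not Elasticsearch internals, and its helper names are hypothetical. With `fielddata=true`, Elasticsearch builds this per-document view in JVM heap the first time it is needed, which is where the memory warning comes from; keyword fields instead write doc values, an on-disk columnar structure, at index time.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, list[str]]) -> dict[str, list[int]]:
    # term -> sorted doc ids: the direction a search index is organized in.
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def uninvert(index: dict[str, list[int]]) -> dict[int, list[str]]:
    # doc id -> sorted terms: roughly what fielddata rebuilds in memory so an
    # aggregation can ask "which terms does this document have?"
    per_doc: dict[int, set[str]] = defaultdict(set)
    for term, doc_ids in index.items():
        for doc_id in doc_ids:
            per_doc[doc_id].add(term)
    return {doc_id: sorted(terms) for doc_id, terms in per_doc.items()}


analyzed = {1: ["agatha christie"], 2: ["agatha christie"]}
inverted = build_inverted_index(analyzed)
print(inverted)            # {'agatha christie': [1, 2]}
print(uninvert(inverted))  # {1: ['agatha christie'], 2: ['agatha christie']}
```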

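Is there a best practice for this kind of case-insensitive keyword aggregation? I was hoping I could simply write "type": "keyword", "filter": "lowercase" in the mappings section, but that doesn't seem to be supported.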

Going the "fielddata": true route feels like using far too big a stick to make this work. Any help would be appreciated!

Best answer

It turns out the solution is to use a custom normalizer instead of a custom analyzer.

PUT authors
{
  "settings": {
    "analysis": {
      "normalizer": {
        "myLowercase": {
          "type": "custom",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "famousbooks": {
      "properties": {
        "Author": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            },
            "use_lowercase": {
              "type": "keyword",
              "normalizer": "myLowercase",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

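With this mapping, a terms aggregation on Author.use_lowercase works without enabling fielddata: use_lowercase is now a keyword field, which has doc values, and the normalizer lowercases the entire value before it is indexed, so "Agatha Christie" and "Agatha christie" end up in the same bucket.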
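For clients that build requests programmatically, the answer's index definition and the matching aggregation can be expressed as plain Python dicts and serialized to JSON. The helper names below are hypothetical and no particular Elasticsearch client library is assumed; the printed bodies are intended for `PUT authors` and `GET authors/famousbooks/_search` respectively. Given the two sample documents, the aggregation on `Author.use_lowercase` should return a single bucket keyed `agatha christie` with a doc count of 2.

```python
import json


def build_index_body(normalizer_name: str = "myLowercase") -> dict:
    """Hypothetical helper: the answer's settings and mappings as a dict."""
    return {
        "settings": {
            "analysis": {
                "normalizer": {
                    # A normalizer has no tokenizer: filters apply to the whole value.
                    normalizer_name: {"type": "custom", "filter": ["lowercase"]}
                }
            }
        },
        "mappings": {
            "famousbooks": {  # mapping type, as used in Elasticsearch 5.x
                "properties": {
                    "Author": {
                        "type": "text",
                        "fields": {
                            "keyword": {"type": "keyword", "ignore_above": 256},
                            "use_lowercase": {
                                "type": "keyword",
                                "normalizer": normalizer_name,
                                "ignore_above": 256,
                            },
                        },
                    }
                }
            }
        },
    }


def build_author_terms_agg(field: str = "Author.use_lowercase") -> dict:
    """Hypothetical helper: search body returning only aggregation buckets."""
    return {"size": 0, "aggs": {"authors-aggs": {"terms": {"field": field}}}}


print(json.dumps(build_index_body(), indent=2))        # body for PUT authors
print(json.dumps(build_author_terms_agg(), indent=2))  # body for the _search request
```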

Regarding Elasticsearch 5.2.2 : terms aggregation case insensitive, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42517001/
