
java - Unable to store a keyword text field longer than 32766

Reposted · Author: 行者123 · Updated: 2023-12-02 11:30:00

I have been trying to store a field as type keyword to support case-insensitive text search, but when I try to store text longer than 32766 characters, indexing fails with the following exception:

    Elasticsearch exception [type=illegal_argument_exception, reason=Document contains at least one immense term in field="case_message_message.lowcase" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[-32, -80, -84, -32, -79, -122, -32, -80, -126, -32, -80, -105, -32, -80, -77, -32, -79, -126, -32, -80, -80, -32, -79, -127, 58, 32, -32, -80, -107, -32]...', original message: bytes can be at most 32766 in length; got 37632]

Is there any way to store text longer than 32766 in this field?

Elasticsearch version: 6.1.2

Any help is greatly appreciated.

Update 1:

Here is the mapping for my index, which uses a custom normalizer:

    {
      "org-16-database": {
        "mappings": {
          "org-16-table": {
            "properties": {
              "My field": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword"
                  },
                  "lowcase": {
                    "type": "keyword",
                    "normalizer": "my_normalizer"
                  }
                },
                "fielddata": true
              }
            }
          }
        }
      }
    }

Settings:

    {
      "org-16-database": {
        "settings": {
          "index": {
            "number_of_shards": "5",
            "provided_name": "org-16-database",
            "creation_date": "1521198435444",
            "analysis": {
              "normalizer": {
                "my_normalizer": {
                  "filter": [
                    "lowercase"
                  ],
                  "type": "custom"
                }
              }
            },
            "number_of_replicas": "1",
            "uuid": "lN-7iYloQWy7oaD3uMIYGQ",
            "version": {
              "created": "6010299"
            }
          }
        }
      }
    }
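
With this setup, a term query against the lowcase sub-field matches regardless of case, since the normalizer is applied both when indexing the keyword and when parsing the query term. A minimal sketch, reusing the index and field names above with a hypothetical search value:

    GET /org-16-database/_search
    {
      "query": {
        "term": {
          "My field.lowcase": "Some Mixed-Case Value"
        }
      }
    }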

Best Answer

As written in the documentation, when you create a new keyword field the ignore_above parameter is enabled by default. This option is useful for guarding against Lucene's term byte-length limit of 32766. You can raise this threshold by modifying the mapping, without reindexing. The maximum useful value is 10922 (32766 / 3, since ignore_above counts characters while Lucene counts bytes, and a UTF-8 character can occupy up to three bytes).
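
A mapping update along these lines might raise the threshold. This is a sketch for Elasticsearch 6.x, reusing the index, type, and field names from the question; note that when updating a multi-field, the full field definition must be repeated:

    # sketch: raise ignore_above on both keyword sub-fields of "My field"
    PUT /org-16-database/_mapping/org-16-table
    {
      "properties": {
        "My field": {
          "type": "text",
          "fielddata": true,
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 10922
            },
            "lowcase": {
              "type": "keyword",
              "normalizer": "my_normalizer",
              "ignore_above": 10922
            }
          }
        }
      }
    }

Keep in mind that ignore_above does not lift Lucene's 32766-byte limit itself: values whose keyword representation exceeds the threshold are kept in _source but skipped rather than indexed, so indexing no longer fails, but such long values cannot be found by exact-match queries on the keyword sub-fields.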

Regarding "java - Unable to store a keyword text field longer than 32766", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49356524/
