solr - Adding a document to a SOLR index: Document contains at least one immense term

Reposted. Author: 行者123. Updated: 2023-12-04 00:38:06

I am adding documents to a SOLR index from a Java program, but an exception is thrown after the add(inputDoc) call. The log in the Solr web interface contains the following:

Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="text" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[99, 111, 112, 101, 114, 116, 105, 110, 97, 32, 105, 110, 102, 111, 114, 109, 97, 122, 105, 111, 110, 105, 32, 113, 117, 101, 115, 116, 111, 32]...', original message: bytes can be at most 32766 in length; got 226781
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:687)
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:359)
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:318)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:239)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:457)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1511)
at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:240)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)
... 40 more
Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 226781
at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:284)
at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:151)
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:663)
... 47 more

What should I do to solve this problem?

Best answer

I ran into the same problem and eventually solved it. Check the type of your "text" field; I suspect it is "strings".
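
One hint that supports this suspicion is the term prefix printed in the exception, which is given as decimal UTF-8 byte values. The following is a minimal sketch (the class and method names are made up for illustration) that decodes those bytes back into text. The decoded prefix contains spaces, which suggests that a whole passage of prose was treated as one single term rather than being split into words.

```java
import java.nio.charset.StandardCharsets;

final class ImmenseTermPrefix {
    // Byte values copied from "The prefix of the first immense term is: [...]" in the exception.
    static final int[] PREFIX = {
        99, 111, 112, 101, 114, 116, 105, 110, 97, 32,
        105, 110, 102, 111, 114, 109, 97, 122, 105, 111,
        110, 105, 32, 113, 117, 101, 115, 116, 111, 32
    };

    // Interprets the decimal values as UTF-8 bytes and returns the decoded text.
    static String decode(int[] values) {
        byte[] bytes = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            bytes[i] = (byte) values[i];
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Brackets make the trailing space visible.
        System.out.println("[" + decode(PREFIX) + "]");
        // prints "[copertina informazioni questo ]"
    }
}
```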

You can find the field definition in the core's managed-schema:

<field name="text" type="strings"/>

Alternatively, in Solr Admin, open http://localhost:8983/solr/CORE_NAME/schema/fieldtypes?wt=json and search for "text". If you see something like the following, your "text" field is defined with the "strings" type:
  {
"name":"strings",
"class":"solr.StrField",
"multiValued":true,
"sortMissingLast":true,
"fields":["text"],
"dynamicFields":["*_ss"]},

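Since the documents are added from a Java program, the same check can also be done from Java. The sketch below is only an illustration: it assumes Java 11 or newer, uses the base URL and the CORE_NAME placeholder from the link above, and asks the Schema API for the single field "text" via the /schema/fields/<name> endpoint. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class SchemaFieldLookup {
    // Builds the Schema API URL for one field,
    // e.g. http://localhost:8983/solr/CORE_NAME/schema/fields/text?wt=json
    static URI fieldUri(String solrBaseUrl, String core, String field) {
        return URI.create(solrBaseUrl + "/" + core + "/schema/fields/" + field + "?wt=json");
    }

    // Sends a GET request and returns the raw response body.
    static String fetch(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Replace CORE_NAME with the name of your core; requires a running Solr instance.
        URI uri = fieldUri("http://localhost:8983/solr", "CORE_NAME", "text");
        System.out.println(fetch(uri));
    }
}
```

The response should include the type of the field, which is the value to compare against "strings".
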
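If your "text" field turns out to be of type "strings", my solution applies to you: in managed-schema, change the type from "strings" to "text_general". (Make sure the type of "text" in schema.xml is also "text_general".)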
   <field name="text" type="text_general"/>

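This will fix your problem. "strings" is a string field (solr.StrField), which indexes the whole value as a single term, whereas "text_general" is a text field whose value is analyzed into separate terms. Documents have to be reindexed after the schema change.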
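
To illustrate why the field type matters, here is a rough sketch that compares the UTF-8 size of a whole value with the size of its longest word. It is not how Solr analyzes text: splitting on whitespace is only a stand-in for the tokenizer configured for "text_general", the sample value is invented, and the class and method names are hypothetical. The limit of 32766 bytes is the one quoted in the exception. The sketch assumes Java 11 or newer for String.repeat.

```java
import java.nio.charset.StandardCharsets;

final class TermLengthCheck {
    // Limit quoted in the exception message ("max length 32766").
    static final int MAX_TERM_BYTES = 32766;

    // Length of the string once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // A string-typed field indexes the whole value as one term,
    // so the whole value has to fit within the limit.
    static boolean fitsAsSingleTerm(String value) {
        return utf8Length(value) <= MAX_TERM_BYTES;
    }

    // Stand-in for an analyzer: split on whitespace and report
    // the UTF-8 length of the longest resulting token.
    static int longestTokenBytes(String value) {
        int longest = 0;
        for (String token : value.trim().split("\\s+")) {
            longest = Math.max(longest, utf8Length(token));
        }
        return longest;
    }

    public static void main(String[] args) {
        // Invented sample: 30 ASCII bytes repeated 8000 times.
        String value = "copertina informazioni questo ".repeat(8000);
        System.out.println(utf8Length(value));        // prints 240000
        System.out.println(fitsAsSingleTerm(value));  // prints false
        System.out.println(longestTokenBytes(value)); // prints 12
    }
}
```

Note that a tokenized field can still hit the same limit if a single unbroken token is longer than 32766 bytes, so a check like fitsAsSingleTerm on individual tokens may be worth keeping for unusual input.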

Regarding "solr - Adding a document to a SOLR index: Document contains at least one immense term", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/29445323/
