
java - Lucene: IndexSearcher.search() causes a Java heap space error on a very large database

Reposted. Author: 行者123. Updated: 2023-12-02 07:57:25

I have a very large database (about 30 million records, each with at least 26 fields) that I have indexed using Apache Lucene Java.

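I am constructing a query from two fields. Each search term can occur in any one of nine fields, and I want my query to return a document if both search terms appear in any of the relevant fields of that document. The query is built as follows: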

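import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

// Parses one search term against one field; every word in the term is required (AND).
private Query CreateQuery(String theSearchTerm, String theField) throws ParseException
{
    StandardAnalyzer theAnalyzer = new StandardAnalyzer(Version.LUCENE_35);
    QueryParser qp = new QueryParser(Version.LUCENE_35, theField, theAnalyzer);
    qp.setDefaultOperator(QueryParser.Operator.AND);
    qp.setAllowLeadingWildcard(true); // a setter call, not a field assignment
    return qp.parse(theSearchTerm);
}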

import java.io.File;
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;

// param1..param18 are the field names; their definitions are not shown in the question.
public ScoreDoc[] RunTheQuery(String searchTerm1, String searchTerm2) throws ParseException, IOException
{
    Directory theIndex = new SimpleFSDirectory(new File("C:\\MyDirectory"));
    IndexSearcher theSearcher = new IndexSearcher(IndexReader.open(theIndex));

    BooleanQuery theTopLevelBooleanQuery = new BooleanQuery();

    BooleanQuery fields1 = new BooleanQuery();
    BooleanQuery fields2 = new BooleanQuery();
    BooleanQuery fields3 = new BooleanQuery();
    BooleanQuery fields4 = new BooleanQuery();
    BooleanQuery fields5 = new BooleanQuery();
    BooleanQuery fields6 = new BooleanQuery();
    BooleanQuery fields7 = new BooleanQuery();
    BooleanQuery fields8 = new BooleanQuery();
    BooleanQuery fields9 = new BooleanQuery();

    BooleanQuery innerQuery = new BooleanQuery();

    // Each fieldsN group requires both terms, each in its own field.
    fields1.add(CreateQuery(searchTerm1, param1), BooleanClause.Occur.MUST);
    fields1.add(CreateQuery(searchTerm2, param2), BooleanClause.Occur.MUST);
    fields2.add(CreateQuery(searchTerm1, param3), BooleanClause.Occur.MUST);
    fields2.add(CreateQuery(searchTerm2, param4), BooleanClause.Occur.MUST);
    fields3.add(CreateQuery(searchTerm1, param5), BooleanClause.Occur.MUST);
    fields3.add(CreateQuery(searchTerm2, param6), BooleanClause.Occur.MUST);
    fields4.add(CreateQuery(searchTerm1, param7), BooleanClause.Occur.MUST);
    fields4.add(CreateQuery(searchTerm2, param8), BooleanClause.Occur.MUST);
    fields5.add(CreateQuery(searchTerm1, param9), BooleanClause.Occur.MUST);
    fields5.add(CreateQuery(searchTerm2, param10), BooleanClause.Occur.MUST);
    fields6.add(CreateQuery(searchTerm1, param11), BooleanClause.Occur.MUST);
    fields6.add(CreateQuery(searchTerm2, param12), BooleanClause.Occur.MUST);
    fields7.add(CreateQuery(searchTerm1, param13), BooleanClause.Occur.MUST);
    fields7.add(CreateQuery(searchTerm2, param14), BooleanClause.Occur.MUST);
    fields8.add(CreateQuery(searchTerm1, param15), BooleanClause.Occur.MUST);
    fields8.add(CreateQuery(searchTerm2, param16), BooleanClause.Occur.MUST);
    fields9.add(CreateQuery(searchTerm1, param17), BooleanClause.Occur.MUST);
    fields9.add(CreateQuery(searchTerm2, param18), BooleanClause.Occur.MUST);

    // A document matches if at least one of the nine groups matches.
    innerQuery.add(fields1, BooleanClause.Occur.SHOULD);
    innerQuery.add(fields2, BooleanClause.Occur.SHOULD);
    innerQuery.add(fields3, BooleanClause.Occur.SHOULD);
    innerQuery.add(fields4, BooleanClause.Occur.SHOULD);
    innerQuery.add(fields5, BooleanClause.Occur.SHOULD);
    innerQuery.add(fields6, BooleanClause.Occur.SHOULD);
    innerQuery.add(fields7, BooleanClause.Occur.SHOULD);
    innerQuery.add(fields8, BooleanClause.Occur.SHOULD);
    innerQuery.add(fields9, BooleanClause.Occur.SHOULD);

    theTopLevelBooleanQuery.add(innerQuery, BooleanClause.Occur.MUST);

    // Keep only the 200 best-scoring hits.
    TopScoreDocCollector collector = TopScoreDocCollector.create(200, true);

    //Heap space error occurs here
    theSearcher.search(theTopLevelBooleanQuery, collector);

    ScoreDoc[] hits = collector.topDocs().scoreDocs;
    return hits;
}
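The nine hand-written groups above all follow one pattern: a pair of fields, each required to contain one of the two terms, with the groups OR-ed together. As a rough illustration of that shape only, the sketch below builds a printable outline of the clause tree in a loop, using the `+` prefix that Lucene shows for MUST clauses when a BooleanQuery is printed. The class name `FieldPairOutline`, the method `outline`, and the field names in the usage note are hypothetical and not part of the original code. It does not touch Lucene and does not address the heap error.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: renders the MUST/SHOULD structure as text, one group per field pair.
final class FieldPairOutline {

    // fieldPairs[i][0] is the field searched for term1, fieldPairs[i][1] the field for term2.
    static String outline(String term1, String term2, String[][] fieldPairs) {
        List<String> groups = new ArrayList<>();
        for (String[] pair : fieldPairs) {
            // Both clauses inside a group are required (MUST, shown as '+').
            groups.add("(+" + pair[0] + ":" + term1 + " +" + pair[1] + ":" + term2 + ")");
        }
        // Groups carry no prefix (SHOULD); the enclosing query is itself required.
        return "+(" + String.join(" ", groups) + ")";
    }
}
```

For example, `FieldPairOutline.outline("foo", "bar", new String[][] {{"a", "b"}, {"c", "d"}})` yields `+((+a:foo +b:bar) (+c:foo +d:bar))`. The same loop over an array of field pairs could, in principle, replace the eighteen `add` calls in the real query builder.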

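My problem is that when I call the IndexSearcher.search() method, the java.exe process on my server (Windows Server 2003 R2) consumes more than 540 MB, which causes a Java heap space error. For completeness, the Java application runs on a web server (currently Oracle Glassfish, although I hope to migrate to Apache Tomcat).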

Does anyone know how to prevent this heap space error? A Stack Overflow post (http://stackoverflow.com/questions/7259736/cant-open-lucene-index-java-heap-space) seems to address a similar problem but does not really give a detailed answer.

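Is increasing the amount of memory available to the Java process the only answer? Or is the only answer to write a new searcher, in which case can anyone recommend a good article on lightweight searchers?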

Is there a way to solve this by modifying the code above?

Any help would be greatly appreciated. Thanks, Rick

Best answer

You can increase the Java heap space like this:

java -Xmx6g myprogram
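To check whether a larger `-Xmx` value actually took effect, the running JVM can report its own heap limits. The sketch below is a minimal example using only the standard `Runtime` API; the class name `HeapReport` and its helper methods are hypothetical. When the application runs inside a server such as Glassfish or Tomcat, the flag normally has to go into that server's JVM options rather than onto a plain `java` command line, so a check like this can help confirm the setting arrived.

```java
// Hypothetical helper: prints the heap limit the JVM is running with.
final class HeapReport {

    // Converts a byte count to whole mebibytes (rounds down).
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Heap currently in use: memory reserved by the JVM minus the free part of it.
    static long usedBytes(Runtime rt) {
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects the -Xmx limit (or the JVM default if none was given).
        System.out.println("Max heap (MB):  " + toMegabytes(rt.maxMemory()));
        System.out.println("Used heap (MB): " + toMegabytes(usedBytes(rt)));
    }
}
```

Running it with and without `-Xmx` (for example `java -Xmx6g HeapReport`) should show a different maximum, which makes it easy to tell whether the option was picked up.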

Or see this post: increase heap size in Java

Or: IBM SDK for Java

For more on "java - Lucene: IndexSearcher.search() causes a Java heap space error on a very large database", see the similar question on Stack Overflow: https://stackoverflow.com/questions/9428757/
