
java - Nutch + Solr - indexer causes java.lang.OutOfMemoryError: Java heap space

Reposted. Author: 行者123. Updated: 2023-11-28 23:34:38

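I have configured my 2 servers to run in distributed mode (using Hadoop). My crawl setup is Nutch 2.2.1 with HBase (as storage) and Solr, and Solr runs under Tomcat. The problem occurs every time I try to run the last step, that is, when I index the data from HBase into Solr: the job fails with the error shown in [1]. I tried adding CATALINA_OPTS (or JAVA_OPTS) like this: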

CATALINA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC -Xms1g -Xmx6000m -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=30 -XX:MaxPermSize=512m -XX:+CMSClassUnloadingEnabled"

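to Tomcat's catalina.sh script and started the server with that script, but it did not help. I also added the properties in [2] to the nutch-site.xml file, but the job again ended with OutOfMemoryError. Can you help me?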

[1]

2014-09-06 22:52:50,683 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space 
at java.util.Arrays.copyOf(Arrays.java:2367)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:587)
at java.lang.StringBuffer.append(StringBuffer.java:332)
at java.io.StringWriter.write(StringWriter.java:77)
at org.apache.solr.common.util.XML.escape(XML.java:204)
at org.apache.solr.common.util.XML.escapeCharData(XML.java:77)
at org.apache.solr.common.util.XML.writeXML(XML.java:147)
at org.apache.solr.client.solrj.util.ClientUtils.writeVal(ClientUtils.java:161)
at org.apache.solr.client.solrj.util.ClientUtils.writeXML(ClientUtils.java:129)
at org.apache.solr.client.solrj.request.UpdateRequest.writeXML(UpdateRequest.java:355)
at org.apache.solr.client.solrj.request.UpdateRequest.getXML(UpdateRequest.java:271)
at org.apache.solr.client.solrj.request.RequestWriter.getContentStream(RequestWriter.java:66)
at org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream.getDelegate(RequestWriter.java:94)
at org.apache.solr.client.solrj.request.RequestWriter$LazyContentStream.getName(RequestWriter.java:104)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:247)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:96)
at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:117)
at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:54)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:650)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1793)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)

[2]

<property>
<name>http.content.limit</name>
<value>150000000</value>
</property>

<property>
<name>indexer.max.tokens</name>
<value>100000</value>
</property>

<property>
<name>http.timeout</name>
<value>50000</value>
</property>

<property>
<name>solr.commit.size</name>
<value>100</value>
</property>
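
A note on how these settings interact: the trace in [1] shows SolrIndexWriter.close passing buffered documents to SolrServer.add, and UpdateRequest.getXML building the whole request as one in-memory string. The sketch below is only a rough upper bound, under the assumption (not stated in the question) that a full batch of solr.commit.size documents is buffered and that each document approaches http.content.limit in size. Real pages are usually far smaller, and the class and method names are hypothetical.

```java
// Hypothetical helper: worst-case size of one Solr update batch, in bytes.
final class BatchEstimate {

    // Assumes every buffered document is as large as the fetch limit allows.
    static long worstCaseBatchBytes(long contentLimitBytes, int commitSize) {
        return contentLimitBytes * (long) commitSize;
    }

    public static void main(String[] args) {
        // Values taken from the nutch-site.xml properties in [2].
        long bytes = worstCaseBatchBytes(150000000L, 100);
        System.out.println(bytes); // prints 15000000000
    }
}
```

Under those assumptions the bound is about 15 GB, well above any heap size mentioned here, so lowering http.content.limit or solr.commit.size may be worth trying alongside a larger task heap.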

Best answer

I solved this problem with the configuration below (in the mapred-site.xml file):

<property>
<name>mapred.jobtracker.retirejob.interval</name>
<value>3600000</value>
</property>

<property>
<name>mapred.job.tracker.retiredjobs.cache.size</name>
<value>100</value>
</property>

<property>
<name>mapred.child.java.opts</name>
<value>-Xmx4000m -XX:+UseConcMarkSweepGC</value>
</property>

<property>
<name>mapred.child.ulimit</name>
<value>6000000</value>
</property>

<property>
<name>mapred.jobtracker.completeuserjobs.maximum</name>
<value>5</value>
</property>

<property>
<name>mapred.job.tracker.handler.count</name>
<value>5</value>
</property>
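
A likely reason this works where CATALINA_OPTS did not: the trace in [1] is raised in org.apache.hadoop.mapred.Child, which is the Hadoop task JVM, not Tomcat, so the heap that ran out is the one sized by mapred.child.java.opts. The sketch below is a small way to check what heap a given option string asks for and what the current JVM actually received. The class and method names are hypothetical, and it takes the first -Xmx flag it finds.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for inspecting JVM heap options.
final class ChildHeap {

    // Matches flags such as -Xmx4000m, -Xmx6g or -Xmx512k (unit is optional).
    private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([kKmMgG]?)");

    // Returns the first -Xmx value in bytes, or -1 if no -Xmx flag is present.
    static long parseXmxBytes(String javaOpts) {
        Matcher m = XMX.matcher(javaOpts);
        if (!m.find()) {
            return -1L;
        }
        long value = Long.parseLong(m.group(1));
        String unit = m.group(2).toLowerCase();
        if (unit.equals("k")) return value * 1024L;
        if (unit.equals("m")) return value * 1024L * 1024L;
        if (unit.equals("g")) return value * 1024L * 1024L * 1024L;
        return value; // no unit suffix: the value is already in bytes
    }

    public static void main(String[] args) {
        // Option string from mapred.child.java.opts in the answer.
        System.out.println(parseXmxBytes("-Xmx4000m -XX:+UseConcMarkSweepGC")); // prints 4194304000
        // Heap limit of the JVM running this code, for comparison.
        System.out.println(Runtime.getRuntime().maxMemory());
    }
}
```

Logging Runtime.getRuntime().maxMemory() from inside a map task is one way to confirm that the new mapred.child.java.opts value actually reached the task JVM.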

Regarding "java - Nutch + Solr - indexer causes java.lang.OutOfMemoryError: Java heap space", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25708897/
