
hadoop - Heap space issue when running a Pig script

Reposted · Author: 可可西里 · Updated: 2023-11-01 14:37:42

I am trying to execute a Pig script over roughly 30 million records, but I get the following heap space error:

> ERROR 2998: Unhandled internal error. Java heap space
>
> java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2367)
> at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
> at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
> at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
> at java.lang.StringBuilder.append(StringBuilder.java:132)
> at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.shiftStringByTabs(LogicalPlanPrinter.java:223)
> at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.depthFirst(LogicalPlanPrinter.java:108)
> at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.depthFirst(LogicalPlanPrinter.java:102)
> at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.depthFirst(LogicalPlanPrinter.java:102)
> at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.depthFirst(LogicalPlanPrinter.java:102)
> at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.depthFirst(LogicalPlanPrinter.java:102)
> at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.depthFirst(LogicalPlanPrinter.java:102)
> at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.depthFirst(LogicalPlanPrinter.java:102)
> at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.depthFirst(LogicalPlanPrinter.java:102)
> at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.depthFirst(LogicalPlanPrinter.java:102)
> at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.depthFirstLP(LogicalPlanPrinter.java:83)
> at org.apache.pig.newplan.logical.optimizer.LogicalPlanPrinter.visit(LogicalPlanPrinter.java:69)
> at org.apache.pig.newplan.logical.relational.LogicalPlan.getLogicalPlanString(LogicalPlan.java:148)
> at org.apache.pig.newplan.logical.relational.LogicalPlan.getSignature(LogicalPlan.java:133)
> at org.apache.pig.PigServer.execute(PigServer.java:1295)
> at org.apache.pig.PigServer.executeBatch(PigServer.java:375)
> at org.apache.pig.PigServer.executeBatch(PigServer.java:353)
> at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:202)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
> at org.apache.pig.Main.run(Main.java:607)
> at org.apache.pig.Main.main(Main.java:156)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> ================================================================================

I ran the same code with 10 million records and it worked fine.

So what are the possible ways I can avoid this issue?
Does compression help avoid the heap space problem?
I have tried splitting the code into multiple fragments and I still get the error. So even if we increase the heap memory allocation (as sketched below), is there any guarantee that will hold up when the same script is run against the larger volume of data?
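For reference, a minimal sketch of what "increasing the heap allocation" could look like. The script name and heap sizes below are placeholders, and this assumes a standard Apache Pig installation whose `bin/pig` launcher reads the `PIG_HEAPSIZE` environment variable; note that the trace above fails in the Pig front end (while printing the logical plan), i.e. before any map/reduce task even starts, so the client JVM heap is the relevant one there.

```sh
# Hypothetical example: give the Pig front end (the JVM that parses and
# plans the script) more heap before launching the job.
# PIG_HEAPSIZE is read by Pig's bin/pig launcher and is expressed in MB.
export PIG_HEAPSIZE=4096          # placeholder: 4 GB for the client JVM

# Heap for the map/reduce task JVMs is controlled separately, e.g. via
# mapred.child.java.opts (mapred-site.xml or on the command line).
pig -Dmapred.child.java.opts=-Xmx2048m myscript.pig   # myscript.pig is a placeholder
```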

Best Answer

You can increase the number of mappers by setting mapred.map.tasks to whatever value you want, then run your script.
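A hedged sketch of how that setting might be applied; the value 100 and the script name are placeholders, and keep in mind that mapred.map.tasks is only a hint to the framework (in newer Hadoop releases the equivalent property is mapreduce.job.maps).

```sh
# Option 1: pass the property on the command line (placeholder values).
pig -Dmapred.map.tasks=100 myscript.pig

# Option 2: set it inside the Pig script itself:
#   SET mapred.map.tasks 100;
```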

Regarding "hadoop - Heap space issue when running a Pig script", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31065622/
