gpt4 book ai didi

hadoop - Hive - 内存不足异常 - Java 堆空间

转载 作者:可可西里 更新时间:2023-11-01 14:37:20 27 4
gpt4 key购买 nike

我正在 parquet 文件(使用 Spark 创建)上运行一个 Hive 插入。 Hive 插入使用 partitioned by 子句。但是最后,当屏幕打印出类似“正在加载分区 {=xyz, =123, =abc} 的消息时,Java 堆空间异常即将到来。

java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.createEntry(HashMap.java:901)
at java.util.HashMap.addEntry(HashMap.java:888)
at java.util.HashMap.put(HashMap.java:509)
at org.apache.hadoop.hive.metastore.api.Partition.<init>(Partition.java:229)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopy(HiveMetaStoreClient.java:1356)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionWithAuthInfo(HiveMetaStoreClient.java:1003)
at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
at com.sun.proxy.$Proxy9.getPartitionWithAuthInfo(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:1611)
at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:1565)
at org.apache.hadoop.hive.ql.exec.StatsTask.getPartitionsList(StatsTask.java:403)
at org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats(StatsTask.java:150)
at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:117)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1508)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1275)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1093)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:916)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:906)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:359)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:456)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:466)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:748)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)

我在运行作业时设置了以下属性,并尝试将值更改为更高和更低,但每次最后我都发现这个错误。

属性切换:

 set mapred.map.tasks=100;
set mapred.reduce.tasks=100;
set mapreduce.map.java.opts=-Xmx4096m;
set mapreduce.reduce.java.opts=-Xmx4096m;
set hive.exec.max.dynamic.partitions.pernode=100000;
set hive.exec.max.dynamic.partitions=100000;

请指出这里出了什么问题。 Hive 版本为 0.13。

hive 环境.sh

 if [ "$SERVICE" = "cli" ]; then
if [ -z "$DEBUG" ]; then
export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms12288m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
else
export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms12288m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
fi
fi

# The heap size of the jvm stared by hive shell script can be controlled via:
#
export HADOOP_HEAPSIZE=4096

最佳答案

可能与HIVE-10149有关问题。尝试将 hive.optimize.sort.dynamic.partition 设置为 true

关于hadoop - Hive - 内存不足异常 - Java 堆空间,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/33112897/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com