
hadoop - Hive collect_set crashes query


I have the following table:

hive> describe tv_counter_stats;
OK
day string
event string
query_id string
userid string
headers string

I want to run the following query:

hive -e 'SELECT 
day,
event,
query_id,
COUNT(1) AS count,
COLLECT_SET(userid)
FROM
tv_counter_stats
GROUP BY
day,
event,
query_id;' > counter_stats_data.csv

However, this query fails, while the following query works fine:

hive -e 'SELECT 
day,
event,
query_id,
COUNT(1) AS count
FROM
tv_counter_stats
GROUP BY
day,
event,
query_id;' > counter_stats_data.csv

The only difference is that I removed the collect_set call. So my question is: does anyone know why collect_set might fail in this situation?
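One way to gauge whether memory is the culprit, before building any sets, is to measure how large the per-group sets would be, since COLLECT_SET must hold every distinct userid of a group in memory. A hedged diagnostic sketch (not from the original post, assuming the table described above):

hive -e 'SELECT
day,
event,
query_id,
COUNT(DISTINCT userid) AS distinct_users -- sizes each set without materializing it
FROM
tv_counter_stats
GROUP BY
day,
event,
query_id
ORDER BY distinct_users DESC
LIMIT 10;'

A handful of groups with millions of distinct users would explain a task exhausting its heap.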

Update: added the error message:

Diagnostic Messages for this Task:

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 3 Reduce: 1 Cumulative CPU: 10.49 sec HDFS Read: 109136387 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 10 seconds 490 msec

java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:250)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:237)

Error: GC overhead limit exceeded
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:250)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:237)

Error: GC overhead limit exceeded

Update 2: I changed the query so that it now looks like this:

hive -e '
SET mapred.child.java.opts="-server -Xmx1g -XX:+UseConcMarkSweepGC";
SELECT
day,
event,
query_id,
COUNT(1) AS count,
COLLECT_SET(userid)
FROM
tv_counter_stats
GROUP BY
day,
event,
query_id;' > counter_stats_data.csv

However, I get the following error:

Diagnostic Messages for this Task:
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:250)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:237)


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 3 Reduce: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

Best answer

This may be a memory issue, since collect_set aggregates its data in memory.

Try increasing the heap size and enabling the concurrent GC (by setting the Hadoop property mapred.child.java.opts to, e.g., -Xmx1g -XX:+UseConcMarkSweepGC).
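A sketch of applying that setting from the Hive CLI. Note that Hive's SET appears to pass everything after the = through literally, so the double quotes used in Update 2 would likely end up inside the JVM options and may be why that child JVM exits with status 1 before reading any data; leaving the value unquoted avoids this:

hive -e '
SET mapred.child.java.opts=-Xmx1g -XX:+UseConcMarkSweepGC;
SELECT
day,
event,
query_id,
COUNT(1) AS count,
COLLECT_SET(userid)
FROM
tv_counter_stats
GROUP BY
day,
event,
query_id;' > counter_stats_data.csv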

This answer has more information about the "GC overhead limit exceeded" error.
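If a larger heap alone does not help, one commonly used mitigation (an addition beyond the original answer) is to disable map-side aggregation, so mappers stream rows to the reducers instead of buffering partial sets in their own heap:

hive -e '
SET hive.map.aggr=false;
SELECT day, event, query_id, COUNT(1) AS count, COLLECT_SET(userid)
FROM tv_counter_stats
GROUP BY day, event, query_id;' > counter_stats_data.csv

This trades mapper memory for extra shuffle traffic, which is usually acceptable for a query that otherwise fails.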

Regarding "hadoop - Hive collect_set crashes query", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/21097963/
