
azure - How many Hive dynamic partitions do I need?


I am running a large job that consolidates roughly 55 sample streams (tags), sampled at irregular times over two years (one sample per record), into 15-minute averages. The original dataset holds about 1.1 billion records across 23k streams; these 55 streams account for roughly 33 million of those records. I compute a 15-minute index and group by it to produce the averages, but I appear to be exceeding the job's maximum number of dynamic partitions even after raising it to 20k. I could raise it further, but the job already takes a while to fail (about 6 hours, though I got that down to about 2 by reducing the number of streams considered), and I don't really know how to work out how many partitions I actually need.
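(For reference, a quarter-hour index such as qqFr2013, presumably quarter-hours counted from the start of 2013, is usually derived by integer-dividing the Unix timestamp by 900 seconds. The sketch below is purely illustrative and not the asker's actual code; the 2013 epoch offset and the source table name sensor_raw are assumptions.)

-- Hypothetical sketch: bucket epoch-second timestamps into 15-minute intervals counted from 2013.
SELECT tag,
floor((unixtime - unix_timestamp('2013-01-01 00:00:00')) / 900) AS qqFr2013
FROM sensor_raw;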

Here is the code:

SET hive.exec.dynamic.partition = true; 
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.exec.max.dynamic.partitions=50000;
SET hive.exec.max.dynamic.partitions.pernode=20000;


DROP TABLE IF EXISTS sensor_part_qhr;

CREATE TABLE sensor_part_qhr (
tag STRING,
tag0 STRING,
tag1 STRING,
tagn_1 STRING,
tagn STRING,

timestamp STRING,
unixtime INT,
qqFr2013 INT,

quality INT,
count INT,
stdev double,
value double
)
PARTITIONED BY (bld STRING);

INSERT INTO TABLE sensor_part_qhr
PARTITION (bld)
SELECT tag,
min(tag),
min(tag0),
min(tag1),
min(tagn_1),
min(tagn),

min(timestamp),
min(unixtime),
qqFr2013,

min(quality),
count(value),
stddev_samp(value),
avg(value)
FROM sensor_part_subset
WHERE tag1='Energy'
GROUP BY tag,qqFr2013;

Here is the error message:

    Error during job, obtaining debugging information...
Examining task ID: task_1442824943639_0044_m_000008 (and more) from job job_1442824943639_0044
Examining task ID: task_1442824943639_0044_r_000000 (and more) from job job_1442824943639_0044

Task with the most failures(4):
-----
Task ID:
task_1442824943639_0044_r_000000

URL:
http://headnodehost:9014/taskdetails.jsp?jobid=job_1442824943639_0044&tipid=task_1442824943639_0044_r_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveFatalException: [Error 20004]: Fatal error occurred when node tried to create too many dynamic partitions. The maximum number of dynamic partitions is controlled by hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode. Maximum was set to: 20000
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveFatalException:

[Error 20004]: Fatal error occurred when node tried to create too many dynamic partitions.
The maximum number of dynamic partitions is controlled by hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode.
Maximum was set to: 20000

at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:747)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.startGroup(FileSinkOperator.java:829)
at org.apache.hadoop.hive.ql.exec.Operator.defaultStartGroup(Operator.java:498)
at org.apache.hadoop.hive.ql.exec.Operator.startGroup(Operator.java:521)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:232)
... 7 more

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 520 Reduce: 140 Cumulative CPU: 7409.394 sec HDFS Read: 0 HDFS Write: 393345977 SUCCESS
Job 1: Map: 9 Reduce: 1 Cumulative CPU: 87.201 sec HDFS Read: 393359417 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 days 2 hours 4 minutes 56 seconds 595 msec

Can anyone offer some ideas on how to work out how many dynamic partitions a job like this might need?

Or perhaps I should be approaching this differently? For what it's worth, I'm running Hive 0.13 on Azure HDInsight.

Update:

  • Corrected some of the numbers above.
  • Reduced the job to 3 streams operating on 211k records, and it finally succeeded.
  • Started experimenting: dropped the per-node partition limit to 5k, then to 1k, and it still succeeded.

So I am no longer blocked, but I suspect I would need millions of dynamic partitions to process the whole dataset in one pass (which is what I really want to do).

Best Answer

The dynamic partition column must be specified last among the columns of the SELECT statement when inserting into sensor_part_qhr.
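In the posted statement, bld never appears in the SELECT list, so Hive maps the last selected expression, avg(value), onto the partition column and creates one partition per distinct average, which is why the partition count explodes. A minimal sketch of the corrected insert, assuming sensor_part_subset really does carry a bld column (the apparently duplicated min(tag) is dropped here so the SELECT expressions line up with the twelve table columns plus the partition column):

INSERT INTO TABLE sensor_part_qhr
PARTITION (bld)
SELECT tag,
min(tag0),
min(tag1),
min(tagn_1),
min(tagn),
min(timestamp),
min(unixtime),
qqFr2013,
min(quality),
count(value),
stddev_samp(value),
avg(value),
bld   -- dynamic partition column listed last
FROM sensor_part_subset
WHERE tag1='Energy'
GROUP BY tag,qqFr2013,bld;

With the partition column in place, the number of dynamic partitions the insert creates equals the number of distinct bld values in the selected rows, which can be estimated up front with something like:

SELECT count(DISTINCT bld) FROM sensor_part_subset WHERE tag1='Energy';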

Regarding "azure - How many Hive dynamic partitions do I need?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32860059/
