
RHadoop: REDUCE capability required is more than the supported max container capability in the cluster


Has anyone seen a similar problem with R (build 1060) on top of a sandbox Hadoop (Cloudera 5.1 / Hortonworks 2.1)?
It seems to be an issue with the new R/Hadoop combination, because the same code works on CDH 5.0.

Code:

Sys.setenv(HADOOP_CMD="/usr/bin/hadoop")
Sys.setenv(HADOOP_STREAMING="/usr/lib/hadoop-mapreduce/hadoop-streaming.jar")
Sys.setenv(JAVA_HOME="/usr/java/jdk1.7.0_55-cloudera")
library(rhdfs)
library(rmr2)
hdfs.init()

## space and word delimiter
map <- function(k, lines) {
  words.list <- strsplit(lines, '\\s')
  words <- unlist(words.list)
  return(keyval(words, 1))
}

reduce <- function(word, counts) {
  keyval(word, sum(counts))
}

wordcount <- function(input, output = NULL) {
  mapreduce(input = input, output = output, input.format = "text",
            map = map, reduce = reduce)
}

## variables
hdfs.root <- '/user/cloudera'
hdfs.data <- file.path(hdfs.root, 'scenario_1')
hdfs.out <- file.path(hdfs.root, 'out')

## run mapreduce job
##out <- wordcount(hdfs.data, hdfs.out)
system.time(out <- wordcount(hdfs.data, hdfs.out))

Error:
> system.time(out <- wordcount(hdfs.data, hdfs.out))
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.3.0-cdh5.1.0.jar] /tmp/streamjob8497498354509963133.jar tmpDir=null
14/09/17 01:49:38 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
14/09/17 01:49:38 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
14/09/17 01:49:39 INFO mapred.FileInputFormat: Total input paths to process : 1
14/09/17 01:49:39 INFO mapreduce.JobSubmitter: number of splits:2
14/09/17 01:49:39 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1410940439997_0001
14/09/17 01:49:40 INFO impl.YarnClientImpl: Submitted application application_1410940439997_0001
14/09/17 01:49:40 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1410940439997_0001/
14/09/17 01:49:40 INFO mapreduce.Job: Running job: job_1410940439997_0001
14/09/17 01:49:54 INFO mapreduce.Job: Job job_1410940439997_0001 running in uber mode : false
14/09/17 01:49:54 INFO mapreduce.Job: map 100% reduce 100%
14/09/17 01:49:55 INFO mapreduce.Job: Job job_1410940439997_0001 failed with state KILLED due to: MAP capability required is more than the supported max container capability in the cluster. Killing the Job. mapResourceReqt: 4096 maxContainerCapability:1024
Job received Kill while in RUNNING state.
REDUCE capability required is more than the supported max container capability in the cluster. Killing the Job. reduceResourceReqt: 4096 maxContainerCapability:1024

14/09/17 01:49:55 INFO mapreduce.Job: Counters: 2
Job Counters
Total time spent by all maps in occupied slots (ms)=0
Total time spent by all reduces in occupied slots (ms)=0
14/09/17 01:49:55 ERROR streaming.StreamJob: Job not Successful!
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, : hadoop streaming failed with error code 1
Timing stopped at: 3.681 0.695 20.43

The problem seems to be reduceResourceReqt: 4096 vs. maxContainerCapability: 1024.
I have tried changing yarn-site.xml, but it did not help. :(

Please help...

Best Answer

I have not used RHadoop. However, my cluster ran into a very similar problem, and it appears to be purely a MapReduce issue.

In this log, maxContainerCapability refers to the yarn.scheduler.maximum-allocation-mb property of your yarn-site.xml configuration. It is the maximum amount of memory that can be used in any container.

mapResourceReqt and reduceResourceReqt in your log refer to the mapreduce.map.memory.mb and mapreduce.reduce.memory.mb properties of your mapred-site.xml configuration. They are the memory sizes of the containers that MapReduce will create for a Mapper or a Reducer.

If the size of your Reducer's container is set larger than yarn.scheduler.maximum-allocation-mb, which seems to be the case here, your job is killed because it is not allowed to allocate that much memory to a container.
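As a concrete illustration, the log above (a 4096 MB request against a 1024 MB maximum) would correspond to settings roughly like the following. The values shown are inferred from the log, not read from your actual files, so treat this only as a sketch of what to look for:

<!-- mapred-site.xml: memory requested for each map/reduce container -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>

<!-- yarn-site.xml: largest container the scheduler is allowed to grant -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>1024</value>
</property>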

Check your configuration at http://[your-resource-manager]:8088/conf; you should normally find these values there and see that this is indeed the case.

Perhaps your new environment sets these values to 4096 MB (which is quite large; the default in Hadoop 2.7.1 is 1024).

Solution

You should either lower the mapreduce.[map|reduce].memory.mb values to 1024, or, if you have plenty of memory and want large containers, raise the yarn.scheduler.maximum-allocation-mb value to 4096. Only then will MapReduce be able to create the containers.
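As a sketch of those two options (the property names are standard Hadoop 2 settings; apply one of them and redeploy the configuration as appropriate for your distribution):

<!-- Option 1 (mapred-site.xml): shrink the requested containers -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1024</value>
</property>

<!-- Option 2 (yarn-site.xml): allow containers as large as the ones being requested -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4096</value>
</property>

For option 2, note that the NodeManagers must also have at least that much memory available for containers (yarn.nodemanager.resource.memory-mb), otherwise the larger containers still cannot be scheduled. If editing the cluster configuration is not possible, rmr2's mapreduce() also accepts a backend.parameters argument through which per-job Hadoop properties can be passed; see the rmr2 documentation for the exact syntax.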

I hope this helps.

Regarding "RHadoop: REDUCE capability required is more than the supported max container capability in the cluster", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25878458/
