hadoop - Some tasks fail when I run map() on AWS

Reposted. Author: 行者123. Updated: 2023-12-02 22:05:53

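I am running PageRank on the s3://aws-publicdatasets/common-crawl/parse-output/segment/1346876860819/metadata-XXXX dataset. The program works when I use 10 files (about 1 GB) on 2 m1.medium instances, but when I use 300 files (20 GB) on 5 m3.xlarge instances, it fails at map 39%, reduce 4%. Can you identify the likely cause of the failure?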

Here are the logs.

stderr:
AttemptID:attempt_1411372099942_0001_m_000010_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000014_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000015_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000057_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000103_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000094_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000109_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000108_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000133_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000136_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000010_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000151_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000014_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000168_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000167_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000015_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000174_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000175_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000057_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000181_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000182_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000190_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000103_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000109_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000094_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000200_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000108_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000133_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000199_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000136_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000010_2 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000151_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000206_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000207_0 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000014_2 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000168_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000175_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000167_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000174_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000015_2 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000057_2 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000181_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000182_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000190_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000103_2 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000094_2 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000200_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000109_2 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000108_2 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000133_2 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000199_1 Timed out after 600 secs
AttemptID:attempt_1411372099942_0001_m_000136_2 Timed out after 600 secs
part of syslog:
08:24:24,791 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000168_1, Status : FAILED
2014-09-22 08:24:46,873 INFO org.apache.hadoop.mapreduce.Job (main): map 39% reduce 4%
2014-09-22 08:24:54,903 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000175_1, Status : FAILED
2014-09-22 08:24:54,904 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000167_1, Status : FAILED
2014-09-22 08:24:54,904 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000174_1, Status : FAILED
2014-09-22 08:24:55,908 INFO org.apache.hadoop.mapreduce.Job (main): map 38% reduce 4%
2014-09-22 08:25:13,968 INFO org.apache.hadoop.mapreduce.Job (main): map 39% reduce 4%
2014-09-22 08:25:25,007 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000015_2, Status : FAILED
2014-09-22 08:26:24,210 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000057_2, Status : FAILED
2014-09-22 08:26:54,322 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000181_1, Status : FAILED
2014-09-22 08:27:24,432 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000182_1, Status : FAILED
2014-09-22 08:27:25,435 INFO org.apache.hadoop.mapreduce.Job (main): map 38% reduce 4%
2014-09-22 08:27:54,543 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000190_1, Status : FAILED
2014-09-22 08:28:54,751 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000103_2, Status : FAILED
2014-09-22 08:29:24,851 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000094_2, Status : FAILED
2014-09-22 08:29:24,852 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000200_1, Status : FAILED
2014-09-22 08:29:24,853 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000109_2, Status : FAILED
2014-09-22 08:29:48,931 INFO org.apache.hadoop.mapreduce.Job (main): map 39% reduce 4%
2014-09-22 08:29:54,954 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000108_2, Status : FAILED
2014-09-22 08:30:24,066 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000133_2, Status : FAILED
2014-09-22 08:32:54,599 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000199_1, Status : FAILED
2014-09-22 08:32:54,600 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1411372099942_0001_m_000136_2, Status : FAILED
2014-09-22 08:34:25,910 INFO org.apache.hadoop.mapreduce.Job (main): map 100% reduce 100%
2014-09-22 08:34:25,915 INFO org.apache.hadoop.mapreduce.Job (main): Job job_1411372099942_0001 failed with state FAILED due to: Task failed task_1411372099942_0001_m_000010
Job failed as tasks failed. failedMaps:1 failedReduces:0

Attempts for: s-1W7C8YIFC87Y8, Job 1411372099942_0001, Task

2014-09-22 08:18:27,238 WARN main org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-09-22 08:18:27,322 WARN main org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-09-22 08:18:28,462 INFO main org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2014-09-22 08:18:28,496 INFO main org.apache.hadoop.metrics2.sink.cloudwatch.CloudWatchSink: Initializing the CloudWatchSink for metrics.
2014-09-22 08:18:28,795 INFO main org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: Sink file started
2014-09-22 08:18:28,967 INFO main org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 300 second(s).
2014-09-22 08:18:28,967 INFO main org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2014-09-22 08:18:28,982 INFO main org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2014-09-22 08:18:28,983 INFO main org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1411372099942_0001, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@3fc15856)
2014-09-22 08:18:29,157 INFO main org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2014-09-22 08:18:29,880 INFO main org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /mnt/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1411372099942_0001,/mnt1/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1411372099942_0001,/mnt2/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1411372099942_0001
2014-09-22 08:18:30,164 WARN main org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-09-22 08:18:30,182 WARN main org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-09-22 08:18:31,063 INFO main org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2014-09-22 08:18:32,100 INFO main org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2014-09-22 08:18:32,605 INFO main org.apache.hadoop.mapred.MapTask: Processing split: s3://aws-publicdatasets/common-crawl/parse-output/segment/1346876860819/metadata-00122:0+67108864
2014-09-22 08:18:32,810 INFO main amazon.emr.metrics.MetricsSaver: MetricsSaver YarnChild root:hdfs:///mnt/var/em/ period:120 instanceId:i-ec84e7c1 jobflow:j-27XODJ8WMW4VP
2014-09-22 08:18:33,205 WARN main org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-09-22 08:18:33,219 WARN main org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-09-22 08:18:33,221 INFO main com.amazon.ws.emr.hadoop.fs.guice.EmrFSBaseModule: Consistency disabled, using com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem as FileSystem implementation.
2014-09-22 08:18:35,024 INFO main com.amazon.ws.emr.hadoop.fs.EmrFileSystem: Using com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem as filesystem implementation
2014-09-22 08:18:36,001 WARN main org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-09-22 08:18:36,002 WARN main org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-09-22 08:18:36,024 INFO main org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2014-09-22 08:18:36,514 INFO main org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 52428796(209715184)
2014-09-22 08:18:36,514 INFO main org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 200
2014-09-22 08:18:36,514 INFO main org.apache.hadoop.mapred.MapTask: soft limit at 167772160
2014-09-22 08:18:36,514 INFO main org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 209715200
2014-09-22 08:18:36,514 INFO main org.apache.hadoop.mapred.MapTask: kvstart = 52428796; length = 13107200
2014-09-22 08:18:36,597 INFO main com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem: Opening 's3://aws-publicdatasets/common-crawl/parse-output/segment/1346876860819/metadata-00122' for reading
2014-09-22 08:18:36,716 INFO main org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2014-09-22 08:18:36,720 INFO main org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
2014-09-22 08:18:36,726 INFO main org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2014-09-22 08:18:36,726 INFO main org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
2014-09-22 08:18:36,727 INFO main org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor

Edited by: paraxx on Sep 22, 2014 10:25 AM

Best answer

task_1411372099942_0001_m_000010 timed out. Try increasing the timeout configuration parameter:

 mapreduce.task.timeout=12000000
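
For context, mapreduce.task.timeout is given in milliseconds, and Hadoop kills a task that neither reads input, writes output, nor updates its status within that window. The "Timed out after 600 secs" lines in the stderr above correspond to 600000 ms, which is Hadoop's default. The sketch below only works through that arithmetic, using nothing beyond the Java standard library. The class and method names (TaskTimeoutCheck, observedTimeout, asProperty) are made up for illustration and are not part of Hadoop.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TaskTimeoutCheck {
    // Property name taken from the answer; Hadoop interprets the value as milliseconds.
    public static final String TIMEOUT_KEY = "mapreduce.task.timeout";

    // Matches the stderr lines quoted in the question, e.g. "... Timed out after 600 secs".
    private static final Pattern TIMED_OUT = Pattern.compile("Timed out after (\\d+) secs");

    /** Returns the timeout reported in a stderr line, or null if the line does not match. */
    public static Duration observedTimeout(String logLine) {
        Matcher m = TIMED_OUT.matcher(logLine);
        return m.find() ? Duration.ofSeconds(Long.parseLong(m.group(1))) : null;
    }

    /** Formats a timeout as key=value with the value in milliseconds. */
    public static String asProperty(Duration timeout) {
        return TIMEOUT_KEY + "=" + timeout.toMillis();
    }

    public static void main(String[] args) {
        String line = "AttemptID:attempt_1411372099942_0001_m_000010_0 Timed out after 600 secs";
        Duration observed = observedTimeout(line);          // timeout in effect during the failed run
        Duration proposed = Duration.ofMillis(12_000_000L); // value suggested in the answer

        System.out.println("observed: " + asProperty(observed));
        System.out.println("proposed: " + asProperty(proposed)
                + " (" + proposed.toMinutes() + " minutes)");
        System.out.println("increase factor: " + proposed.toMillis() / observed.toMillis());
    }
}
```

So the suggested value allows a task 200 minutes without progress, 20 times the window in effect when the job failed. If the job driver goes through ToolRunner, the same pair can usually be passed at submit time as -D mapreduce.task.timeout=12000000. Note that a longer timeout only helps if the mappers are slow rather than hung; a mapper that is genuinely stuck will still fail, just later.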

Regarding "hadoop - Some tasks fail when I run map() on AWS", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/25983963/
