
hadoop - map and reduce task counts in the log files are incorrect

Reposted. Author: 行者123. Updated: 2023-12-02 21:49:56

I am running a MapReduce job, and the job runs correctly. However, I am a bit confused by the log files it generates.

Command used to run the MapReduce job:

hadoop jar mapred-0.0.1-SNAPSHOT.jar tcs.hadoop.org.mapreduce.MaxTemperatureDriver /priya/sample.txt /output

14/02/20 17:35:10 INFO input.FileInputFormat: Total input paths to process : 1
14/02/20 17:35:10 WARN snappy.LoadSnappy: Snappy native library is available
14/02/20 17:35:10 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/02/20 17:35:10 INFO snappy.LoadSnappy: Snappy native library loaded
14/02/20 17:35:10 INFO mapred.JobClient: Running job: job_201402111203_0034
14/02/20 17:35:11 INFO mapred.JobClient: map 0% reduce 0%
14/02/20 17:35:22 INFO mapred.JobClient: map 100% reduce 0%
14/02/20 17:35:36 INFO mapred.JobClient: map 100% reduce 100%
14/02/20 17:35:39 INFO mapred.JobClient: Job complete: job_201402111203_0034
14/02/20 17:35:40 INFO mapred.JobClient: Counters: 26
14/02/20 17:35:40 INFO mapred.JobClient: Job Counters
14/02/20 17:35:40 INFO mapred.JobClient: Launched reduce tasks=2
14/02/20 17:35:40 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=11900
14/02/20 17:35:40 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/02/20 17:35:40 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/02/20 17:35:40 INFO mapred.JobClient: Launched map tasks=1
14/02/20 17:35:40 INFO mapred.JobClient: Data-local map tasks=1
14/02/20 17:35:40 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=23142
14/02/20 17:35:40 INFO mapred.JobClient: FileSystemCounters
14/02/20 17:35:40 INFO mapred.JobClient: FILE_BYTES_READ=34
14/02/20 17:35:40 INFO mapred.JobClient: HDFS_BYTES_READ=633
14/02/20 17:35:40 INFO mapred.JobClient: FILE_BYTES_WRITTEN=154973
14/02/20 17:35:40 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=17
14/02/20 17:35:40 INFO mapred.JobClient: Map-Reduce Framework
14/02/20 17:35:40 INFO mapred.JobClient: Map input records=5
14/02/20 17:35:40 INFO mapred.JobClient: Reduce shuffle bytes=34
14/02/20 17:35:40 INFO mapred.JobClient: Spilled Records=4
14/02/20 17:35:40 INFO mapred.JobClient: Map output bytes=45
14/02/20 17:35:40 INFO mapred.JobClient: CPU time spent (ms)=4420
14/02/20 17:35:40 INFO mapred.JobClient: Total committed heap usage (bytes)=172822528
14/02/20 17:35:40 INFO mapred.JobClient: Combine input records=5
14/02/20 17:35:40 INFO mapred.JobClient: SPLIT_RAW_BYTES=103
14/02/20 17:35:40 INFO mapred.JobClient: Reduce input records=2
14/02/20 17:35:40 INFO mapred.JobClient: Reduce input groups=2
14/02/20 17:35:40 INFO mapred.JobClient: Combine output records=2
14/02/20 17:35:40 INFO mapred.JobClient: Physical memory (bytes) snapshot=300945408
14/02/20 17:35:40 INFO mapred.JobClient: Virtual memory (bytes) snapshot=7375564800
14/02/20 17:35:40 INFO mapred.JobClient: Map output records=5
14/02/20 17:35:40 INFO mapred.JobClient: Reduce output records=2

So from this I can see that one map task and two reduce tasks are being created.

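However, when I look at the job history log in the $HADOOP_HOME/logs/history directory, I see that the JobTracker launched 5 tasks, as shown below (only the relevant log lines are included). I don't understand why 5 tasks are executed instead of 3.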
MapAttempt TASK_TYPE="SETUP" TASKID="task_201402111203_0034_m_000002" TASK_ATTEMPT_ID="attempt_201402111203_0034_m_000002_0" START_TIME="1392897911096" TRACKER_NAME="tracker_IMBDBOX1:IMBDBOX1/157\.227\.44\.207:40925" HTTP_PORT="50060" .

MapAttempt TASK_TYPE="MAP" TASKID="task_201402111203_0034_m_000000" TASK_ATTEMPT_ID="attempt_201402111203_0034_m_000000_0" TASK_STATUS="SUCCESS" FINISH_TIME="1392897989806" HOSTNAME="/defaul

ReduceAttempt TASK_TYPE="REDUCE" TASKID="task_201402111203_0034_r_000001" TASK_ATTEMPT_ID="attempt_201402111203_0034_r_000001_0" START_TIME="1392897947754" TRACKER_NAME="tracker_IMBDBOX3:localhost/127\.0\.0\.1:34625" HTTP_PORT="50060"

ReduceAttempt TASK_TYPE="REDUCE" TASKID="task_201402111203_0034_r_000000" TASK_ATTEMPT_ID="attempt_201402111203_0034_r_000000_0" START_TIME="1392897992388" TRACKER_NAME="tracker_IMBDBOX4:localhost/127\.0\.0\.1:59439" HTTP_PORT="50060" .

MapAttempt TASK_TYPE="CLEANUP" TASKID="task_201402111203_0034_m_000001" TASK_ATTEMPT_ID="attempt_201402111203_0034_m_000001_0" START_TIME="1392898004324" TRACKER_NAME="tracker_IMBDBOX4:localhost/127\.0\.0\.1:59439" HTTP_PORT="50060"

Also, when I look in the userlogs directory under $HADOOP_HOME/logs/userlogs, I can see logs for only one map task. Why were no logs generated for the other map and reduce tasks?

Please help. Thanks!

Userlogs directory:
total 8
-rw-r----- 1 hadoop hdusers 497 2014-02-20 17:35 job-acls.xml
lrwxrwxrwx 1 hadoop hdusers 96 2014-02-20 17:35 attempt_201402111203_0034_m_000002_0 -> /app/hadoop/tmp/mapred/local/userlogs/job_201402111203_0034/attempt_201402111203_0034_m_000002_0

Best answer

Note the following strings:

TASK_TYPE="SETUP"
TASK_TYPE="CLEANUP"

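Hadoop runs a couple of auxiliary tasks as part of the MapReduce job lifecycle: a job-setup task (TASK_TYPE="SETUP", task_201402111203_0034_m_000002) and a job-cleanup task (TASK_TYPE="CLEANUP", task_201402111203_0034_m_000001). Both are recorded as MapAttempt entries, so together with the 1 map task and 2 reduce tasks reported by the job counters, the history log shows 5 task attempts.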
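To check this kind of breakdown yourself, here is a minimal sketch in Python 3 that counts distinct task attempts per TASK_TYPE in job-history lines. It assumes the Hadoop 1.x history format seen in the question (a record type such as MapAttempt followed by KEY="value" pairs); the helpers `parse_line` and `count_attempts_by_type` are hypothetical, not part of Hadoop. The sample input is the five lines from the question, trimmed to the relevant fields.

```python
import re
from collections import Counter

# Matches KEY="value" pairs in a job-history line (format assumed from the question).
# Values may contain backslash escapes such as "\." which are kept as-is.
PAIR_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_line(line):
    """Split a history line into its record type and a dict of KEY="value" fields."""
    record_type, _, rest = line.partition(" ")
    return record_type, dict(PAIR_RE.findall(rest))


def count_attempts_by_type(lines):
    """Count distinct task attempts per TASK_TYPE, ignoring repeated lines for one attempt."""
    type_by_attempt = {}
    for line in lines:
        record_type, fields = parse_line(line.strip())
        if record_type not in ("MapAttempt", "ReduceAttempt"):
            continue
        attempt = fields.get("TASK_ATTEMPT_ID")
        task_type = fields.get("TASK_TYPE")
        if attempt and task_type:
            # Keyed by attempt ID, so a start line and a finish line count once.
            type_by_attempt[attempt] = task_type
    return Counter(type_by_attempt.values())


history_lines = [
    'MapAttempt TASK_TYPE="SETUP" TASKID="task_201402111203_0034_m_000002" '
    'TASK_ATTEMPT_ID="attempt_201402111203_0034_m_000002_0"',
    'MapAttempt TASK_TYPE="MAP" TASKID="task_201402111203_0034_m_000000" '
    'TASK_ATTEMPT_ID="attempt_201402111203_0034_m_000000_0" TASK_STATUS="SUCCESS"',
    'ReduceAttempt TASK_TYPE="REDUCE" TASKID="task_201402111203_0034_r_000001" '
    'TASK_ATTEMPT_ID="attempt_201402111203_0034_r_000001_0"',
    'ReduceAttempt TASK_TYPE="REDUCE" TASKID="task_201402111203_0034_r_000000" '
    'TASK_ATTEMPT_ID="attempt_201402111203_0034_r_000000_0"',
    'MapAttempt TASK_TYPE="CLEANUP" TASKID="task_201402111203_0034_m_000001" '
    'TASK_ATTEMPT_ID="attempt_201402111203_0034_m_000001_0"',
]

counts = count_attempts_by_type(history_lines)
print(dict(counts))                        # {'SETUP': 1, 'MAP': 1, 'REDUCE': 2, 'CLEANUP': 1}
print(sum(counts.values()))                # 5 attempts in total
print(counts["MAP"] + counts["REDUCE"])    # 3 map/reduce tasks, matching the job counters
```

Deduplicating by TASK_ATTEMPT_ID matters because a history file can record the same attempt on more than one line (for example a line with START_TIME and another with TASK_STATUS and FINISH_TIME, both of which appear in the excerpt above), so counting raw TASK_TYPE matches would overcount.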

Regarding "hadoop - map and reduce task counts in the log files are incorrect", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/21907922/
