
hadoop - Should Spilled Records in Hadoop MapReduce always equal the Map input records or the Map output records?


I am working through the matrix multiplication example that uses MapReduce on Hadoop. I would like to ask whether Spilled Records should always equal the Map input records or the Map output records. In my run, Spilled Records differs from both Map input records and Map output records.

Here is the output from one of the tests:

Three by three test
IB = 1
KB = 2
JB = 1
11/12/14 13:16:22 INFO input.FileInputFormat: Total input paths to process : 2
11/12/14 13:16:22 INFO mapred.JobClient: Running job: job_201112141153_0003
11/12/14 13:16:23 INFO mapred.JobClient: map 0% reduce 0%
11/12/14 13:16:32 INFO mapred.JobClient: map 100% reduce 0%
11/12/14 13:16:44 INFO mapred.JobClient: map 100% reduce 100%
11/12/14 13:16:46 INFO mapred.JobClient: Job complete: job_201112141153_0003
11/12/14 13:16:46 INFO mapred.JobClient: Counters: 17
11/12/14 13:16:46 INFO mapred.JobClient: Job Counters
11/12/14 13:16:46 INFO mapred.JobClient: Launched reduce tasks=1
11/12/14 13:16:46 INFO mapred.JobClient: Launched map tasks=2
11/12/14 13:16:46 INFO mapred.JobClient: Data-local map tasks=2
11/12/14 13:16:46 INFO mapred.JobClient: FileSystemCounters
11/12/14 13:16:46 INFO mapred.JobClient: FILE_BYTES_READ=1464
11/12/14 13:16:46 INFO mapred.JobClient: HDFS_BYTES_READ=528
11/12/14 13:16:46 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2998
11/12/14 13:16:46 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=384
11/12/14 13:16:46 INFO mapred.JobClient: Map-Reduce Framework
11/12/14 13:16:46 INFO mapred.JobClient: Reduce input groups=36
11/12/14 13:16:46 INFO mapred.JobClient: Combine output records=0
11/12/14 13:16:46 INFO mapred.JobClient: Map input records=18
11/12/14 13:16:46 INFO mapred.JobClient: Reduce shuffle bytes=735
11/12/14 13:16:46 INFO mapred.JobClient: Reduce output records=15
11/12/14 13:16:46 INFO mapred.JobClient: Spilled Records=108
11/12/14 13:16:46 INFO mapred.JobClient: Map output bytes=1350
11/12/14 13:16:46 INFO mapred.JobClient: Combine input records=0
11/12/14 13:16:46 INFO mapred.JobClient: Map output records=54
11/12/14 13:16:46 INFO mapred.JobClient: Reduce input records=54
11/12/14 13:16:46 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/12/14 13:16:46 INFO input.FileInputFormat: Total input paths to process : 1
11/12/14 13:16:46 INFO mapred.JobClient: Running job: job_local_0001
11/12/14 13:16:46 INFO input.FileInputFormat: Total input paths to process : 1
11/12/14 13:16:46 INFO mapred.MapTask: io.sort.mb = 100
11/12/14 13:16:46 INFO mapred.MapTask: data buffer = 79691776/99614720
11/12/14 13:16:46 INFO mapred.MapTask: record buffer = 262144/327680
11/12/14 13:16:46 INFO mapred.MapTask: Starting flush of map output
11/12/14 13:16:46 INFO mapred.MapTask: Finished spill 0
11/12/14 13:16:46 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
11/12/14 13:16:46 INFO mapred.LocalJobRunner:
11/12/14 13:16:46 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
11/12/14 13:16:46 INFO mapred.LocalJobRunner:
11/12/14 13:16:46 INFO mapred.Merger: Merging 1 sorted segments
11/12/14 13:16:46 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 128 bytes
11/12/14 13:16:46 INFO mapred.LocalJobRunner:
11/12/14 13:16:46 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
11/12/14 13:16:46 INFO mapred.LocalJobRunner:
11/12/14 13:16:46 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
11/12/14 13:16:46 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://localhost:9000/tmp/MatrixMultiply/out
11/12/14 13:16:46 INFO mapred.LocalJobRunner: reduce > reduce
11/12/14 13:16:46 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
11/12/14 13:16:47 INFO mapred.JobClient: map 100% reduce 100%
11/12/14 13:16:47 INFO mapred.JobClient: Job complete: job_local_0001
11/12/14 13:16:47 INFO mapred.JobClient: Counters: 14
11/12/14 13:16:47 INFO mapred.JobClient: FileSystemCounters
11/12/14 13:16:47 INFO mapred.JobClient: FILE_BYTES_READ=89412
11/12/14 13:16:47 INFO mapred.JobClient: HDFS_BYTES_READ=37206
11/12/14 13:16:47 INFO mapred.JobClient: FILE_BYTES_WRITTEN=37390
11/12/14 13:16:47 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=164756
11/12/14 13:16:47 INFO mapred.JobClient: Map-Reduce Framework
11/12/14 13:16:47 INFO mapred.JobClient: Reduce input groups=9
11/12/14 13:16:47 INFO mapred.JobClient: Combine output records=9
11/12/14 13:16:47 INFO mapred.JobClient: Map input records=15
11/12/14 13:16:47 INFO mapred.JobClient: Reduce shuffle bytes=0
11/12/14 13:16:47 INFO mapred.JobClient: Reduce output records=9
11/12/14 13:16:47 INFO mapred.JobClient: Spilled Records=18
11/12/14 13:16:47 INFO mapred.JobClient: Map output bytes=180
11/12/14 13:16:47 INFO mapred.JobClient: Combine input records=15
11/12/14 13:16:47 INFO mapred.JobClient: Map output records=15
11/12/14 13:16:47 INFO mapred.JobClient: Reduce input records=9
...........X[0][0]=30, Y[0][0]=9
Bad Answer
...........X[0][1]=36, Y[0][1]=36
...........X[0][2]=42, Y[0][2]=42
...........X[1][0]=66, Y[1][0]=24
Bad Answer
...........X[1][1]=81, Y[1][1]=81
...........X[1][2]=96, Y[1][2]=96
...........X[2][0]=102, Y[2][0]=39
Bad Answer
...........X[2][1]=126, Y[2][1]=126
...........X[2][2]=150, Y[2][2]=150

The example and its code are described here:

http://www.norstad.org/matrix-multiply/index.html

Can you tell me where the problem is and how to fix it? Thanks.


Best answer

According to Hadoop: The Definitive Guide, the "Spilled Records" counter is the total number of records spilled to disk over the course of the job, and it includes both map-side and reduce-side spills. A "Spilled Records" count of zero is perfectly fine. Spilled records generally mean you have run out of room in the map output buffer, and a small number of spilled records is usually not a problem.

The amount of memory available for the buffer is controlled by io.sort.mb and io.sort.spill.percent in mapred-site.xml. If performance is a concern, you may want to tune these to minimize spilled records; the presentation Optimizing MapReduce Job Performance has more detail, in particular slides #12 and #13. If you spill more than once, you pay a 3x IO penalty because the spills then have to be merged. A useful check: if "Spilled Records" is greater than "Map output records" plus "Reduce output records", you are spilling more than once. Note that the buffer is ultimately limited by the Java VM heap size, so you may need a bigger cluster, or more map tasks (by increasing the number of input splits for the job), to cut down the number of spills.
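For illustration only, a minimal sketch of what that tuning might look like in mapred-site.xml; the values 200 and 0.90 are arbitrary placeholders that would need to be sized against each map task's JVM heap. In this generation of Hadoop the defaults are io.sort.mb = 100 (which matches the "io.sort.mb = 100" line in your log) and io.sort.spill.percent = 0.80:

<property>
  <name>io.sort.mb</name>
  <value>200</value> <!-- map-side sort buffer size in MB (default 100) -->
</property>
<property>
  <name>io.sort.spill.percent</name>
  <value>0.90</value> <!-- fraction of the buffer that triggers a spill (default 0.80) -->
</property>

The same settings can also be passed per job on the command line, e.g. -Dio.sort.mb=200, if the driver is run through ToolRunner/GenericOptionsParser.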

In your particular run, "Spilled Records" is larger than "Map output records" plus "Reduce output records", so you are spilling more than once.
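Spelling out that check with the first job's counters: Map output records = 54 and Reduce output records = 15, so the single-spill baseline is 54 + 15 = 69, while the reported Spilled Records = 108 is well above it. (One possible reading, since 108 = 2 x 54, is that every map output record was spilled once on the map side and then written to disk again during the reduce-side merge, though the counters alone do not pin that down.)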

The original question and answer are on Stack Overflow: https://stackoverflow.com/questions/8504611/
