java - Hadoop stuck on the last map job - need help

Reposted. Author: 可可西里. Updated: 2023-11-01 14:47:52

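I am using a Hadoop map-reduce job to do some text processing. The job is 99.2% complete and is stuck on the last map job.

The last few lines of the map output are shown below. The last time this problem occurred, I printed the key-value pairs emitted from map and noticed that one key had a very large number of values associated with it. I think the job was getting stuck while sorting those values. After I stopped emitting that key from the map job, it ran fine.

I think the same problem has come up again. Printing out the key-value pairs is tedious because the job takes so long. Is there a better option? For example, can Hadoop be configured to skip a few keys if sorting them takes too much time? Is there anything like that?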

2010-10-20 14:43:32,274 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 14:43:32,274 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 79698262; bufvoid = 99614720
2010-10-20 14:43:32,274 INFO org.apache.hadoop.mapred.MapTask: kvstart = 0; kvend = 6601; length = 327680
2010-10-20 14:43:33,272 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
2010-10-20 14:50:44,113 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 14:50:44,113 INFO org.apache.hadoop.mapred.MapTask: bufstart = 79698262; bufend = 59800449; bufvoid = 99614720
2010-10-20 14:50:44,113 INFO org.apache.hadoop.mapred.MapTask: kvstart = 6601; kvend = 9039; length = 327680
2010-10-20 14:50:44,864 INFO org.apache.hadoop.mapred.MapTask: Finished spill 1
2010-10-20 14:58:33,105 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 14:58:33,105 INFO org.apache.hadoop.mapred.MapTask: bufstart = 59800449; bufend = 39893455; bufvoid = 99614720
2010-10-20 14:58:33,105 INFO org.apache.hadoop.mapred.MapTask: kvstart = 9039; kvend = 11228; length = 327680
2010-10-20 14:58:33,817 INFO org.apache.hadoop.mapred.MapTask: Finished spill 2
2010-10-20 15:06:48,675 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 15:06:48,675 INFO org.apache.hadoop.mapred.MapTask: bufstart = 39893455; bufend = 20000988; bufvoid = 99614720
2010-10-20 15:06:48,675 INFO org.apache.hadoop.mapred.MapTask: kvstart = 11228; kvend = 13286; length = 327680
2010-10-20 15:06:49,395 INFO org.apache.hadoop.mapred.MapTask: Finished spill 3
2010-10-20 15:15:23,514 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 15:15:23,514 INFO org.apache.hadoop.mapred.MapTask: bufstart = 20000988; bufend = 78879; bufvoid = 99614720
2010-10-20 15:15:23,514 INFO org.apache.hadoop.mapred.MapTask: kvstart = 13286; kvend = 15265; length = 327680
2010-10-20 15:15:24,230 INFO org.apache.hadoop.mapred.MapTask: Finished spill 4
2010-10-20 15:24:35,797 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 15:24:35,797 INFO org.apache.hadoop.mapred.MapTask: bufstart = 78879; bufend = 79807573; bufvoid = 99614720
2010-10-20 15:24:35,797 INFO org.apache.hadoop.mapred.MapTask: kvstart = 15265; kvend = 17188; length = 327680
2010-10-20 15:24:36,500 INFO org.apache.hadoop.mapred.MapTask: Finished spill 5
2010-10-20 15:33:33,391 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 15:33:33,391 INFO org.apache.hadoop.mapred.MapTask: bufstart = 79807573; bufend = 59907680; bufvoid = 99614720
2010-10-20 15:33:33,391 INFO org.apache.hadoop.mapred.MapTask: kvstart = 17188; kvend = 19074; length = 327680
2010-10-20 15:33:34,114 INFO org.apache.hadoop.mapred.MapTask: Finished spill 6
2010-10-20 15:42:39,913 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 15:42:39,913 INFO org.apache.hadoop.mapred.MapTask: bufstart = 59907680; bufend = 40011208; bufvoid = 99614720
2010-10-20 15:42:39,913 INFO org.apache.hadoop.mapred.MapTask: kvstart = 19074; kvend = 20926; length = 327680
2010-10-20 15:42:40,597 INFO org.apache.hadoop.mapred.MapTask: Finished spill 7
2010-10-20 15:51:49,668 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 15:51:49,668 INFO org.apache.hadoop.mapred.MapTask: bufstart = 40011208; bufend = 20111383; bufvoid = 99614720
2010-10-20 15:51:49,668 INFO org.apache.hadoop.mapred.MapTask: kvstart = 20926; kvend = 22759; length = 327680
2010-10-20 15:51:50,378 INFO org.apache.hadoop.mapred.MapTask: Finished spill 8
2010-10-20 16:01:05,893 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 16:01:05,893 INFO org.apache.hadoop.mapred.MapTask: bufstart = 20111383; bufend = 196929; bufvoid = 99614720
2010-10-20 16:01:05,894 INFO org.apache.hadoop.mapred.MapTask: kvstart = 22759; kvend = 24572; length = 327680
2010-10-20 16:01:06,634 INFO org.apache.hadoop.mapred.MapTask: Finished spill 9
2010-10-20 16:10:25,000 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 16:10:25,000 INFO org.apache.hadoop.mapred.MapTask: bufstart = 196929; bufend = 79900267; bufvoid = 99614720
2010-10-20 16:10:25,000 INFO org.apache.hadoop.mapred.MapTask: kvstart = 24572; kvend = 26370; length = 327680
2010-10-20 16:10:25,776 INFO org.apache.hadoop.mapred.MapTask: Finished spill 10
2010-10-20 16:19:48,283 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2010-10-20 16:19:48,283 INFO org.apache.hadoop.mapred.MapTask: bufstart = 79900267; bufend = 59993676; bufvoid = 99614720
2010-10-20 16:19:48,284 INFO org.apache.hadoop.mapred.MapTask: kvstart = 26370; kvend = 28152; length = 327680
2010-10-20 16:19:49,042 INFO org.apache.hadoop.mapred.MapTask: Finished spill 11

Thanks

Best answer

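Nothing in Hadoop knows that a particular invocation of map() is emitting too many key-value pairs. I'm guessing your map() function contains some kind of loop that emits these pairs. You could simply code that loop to short-circuit once it has emitted more than N pairs.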
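As a rough illustration of the short-circuit idea, here is a minimal sketch in plain Java. Everything in it is hypothetical: the class name `CappedEmitter`, the method `mapWithCap`, the cap value, and the whitespace tokenizing are assumptions, since the question does not show the real map() code. The `emit` callback stands in for whatever call your mapper uses to write a pair (for example `context.write(key, value)`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

public class CappedEmitter {
    // Hypothetical cap on pairs emitted per map() call; tune it for your data.
    static final int MAX_PAIRS_PER_CALL = 1000;

    /**
     * Emits one (token, docId) pair per whitespace-separated token and stops
     * as soon as maxPairs pairs have been emitted. Returns the number emitted.
     */
    static int mapWithCap(String docId, String line, int maxPairs,
                          BiConsumer<String, String> emit) {
        int emitted = 0;
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input yields a single empty token; skip it
            }
            if (emitted >= maxPairs) {
                break; // short-circuit: this call has emitted enough pairs
            }
            emit.accept(token, docId);
            emitted++;
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        int n = mapWithCap("doc1", "a b c d e", 3, (k, v) -> out.add(k + "=" + v));
        System.out.println(n + " pairs emitted: " + out);
    }
}
```

Note that capping drops data, so this only makes sense if the job can tolerate losing the excess pairs, as it apparently could when the problem key was left out entirely.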

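Another option is to find a way to partition the input values so that mappers process finer-grained chunks, so that all mappers do roughly the same amount of work.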
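To make the partitioning idea concrete, here is a small sketch in plain Java that groups records into chunks by an estimated cost rather than by record count. The names `FineGrainedChunks` and `chunkByCost` are hypothetical, and using the record length as the cost is only an assumption; in a real job this would more likely be done through the input format or a preprocessing step, depending on what your Hadoop version supports.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FineGrainedChunks {
    /**
     * Groups records, in order, into chunks whose total length stays at or
     * below maxChunkChars. A record longer than the budget gets its own chunk.
     */
    static List<List<String>> chunkByCost(List<String> records, int maxChunkChars) {
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentCost = 0;
        for (String record : records) {
            int cost = record.length(); // assumed cost estimate: characters
            if (!current.isEmpty() && currentCost + cost > maxChunkChars) {
                chunks.add(current); // close the chunk before it goes over budget
                current = new ArrayList<>();
                currentCost = 0;
            }
            current.add(record);
            currentCost += cost;
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("aaaa", "bb", "cccccccc", "d", "ee");
        // prints [[aaaa, bb], [cccccccc], [d, ee]]
        System.out.println(chunkByCost(records, 6));
    }
}
```

The heavy record ends up alone in its chunk, so the mapper that receives it has nothing else to do, while the light records are batched together.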

I'm not sure exactly what you're trying to do, so these suggestions may not apply. Hope this helps.

Regarding "java - Hadoop stuck on the last map job - need help", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/3981750/
