
performance - Hadoop large file is not split


I have an input file that is 136 MB in size. I ran some WordCount tests and observed only one mapper. I then set dfs.blocksize to 64 MB in my hdfs-site.xml, and I still get only one mapper. Am I doing something wrong?

Best answer

dfs.block.size is not the only factor at play, and changing it is not recommended because it applies globally to HDFS.

The split size in MapReduce is calculated by this formula:

max(mapred.min.split.size, min(mapred.max.split.size, dfs.block.size))
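
As a concrete check of that formula, here is a minimal standalone sketch applied to the 136 MB file from the question, assuming the stock Hadoop defaults of 1 for mapred.min.split.size and Long.MAX_VALUE for mapred.max.split.size:

public class SplitSizeDemo {
    public static void main(String[] args) {
        long minSplitSize = 1L;                 // mapred.min.split.size (default)
        long maxSplitSize = Long.MAX_VALUE;     // mapred.max.split.size (default)
        long blockSize    = 64L * 1024 * 1024;  // dfs.block.size = 64 MB
        long fileSize     = 136L * 1024 * 1024; // the 136 MB input file

        // max(mapred.min.split.size, min(mapred.max.split.size, dfs.block.size))
        long splitSize = Math.max(minSplitSize, Math.min(maxSplitSize, blockSize));
        long numSplits = (fileSize + splitSize - 1) / splitSize; // ceiling division

        System.out.println("split size = " + splitSize); // 67108864 (64 MB)
        System.out.println("splits     = " + numSplits); // 3, i.e. about 3 mappers
    }
}

Keep in mind that the real FileInputFormat lets the last split run up to roughly 10% over the computed split size, so the actual split count can differ by one for borderline file sizes.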

So you can set these properties in the driver class:

conf.setLong("mapred.max.split.size", maxSplitSize); // upper bound on a split, in bytes
conf.setLong("mapred.min.split.size", minSplitSize); // lower bound on a split, in bytes
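
Note that mapred.min.split.size and mapred.max.split.size are the old-API property names. They still work through Hadoop's deprecation mapping, but on Hadoop 2.x and later the canonical keys are mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize, and the new-API FileInputFormat exposes typed setters. A minimal sketch assuming the new mapreduce API and an already-built Configuration conf:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

Job job = Job.getInstance(conf, "wordcount");
// Equivalent to setting mapreduce.input.fileinputformat.split.minsize/maxsize
FileInputFormat.setMinInputSplitSize(job, minSplitSize);
FileInputFormat.setMaxInputSplitSize(job, maxSplitSize);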

Or in a config file (e.g. mapred-site.xml):

<property>
  <name>mapred.max.split.size</name>
  <value>134217728</value>
</property>
<property>
  <name>mapred.min.split.size</name>
  <value>134217728</value>
</property>
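
134217728 bytes is 128 MB. Plugging these values into the formula above, the split size becomes max(128 MB, min(128 MB, dfs.block.size)) = 128 MB regardless of the block size, so the 136 MB file would still go to a single mapper (the leftover 8 MB falls within the ~10% slack FileInputFormat allows for the last split). To get more mappers, lower mapred.max.split.size below the block size instead, e.g. to 67108864 (64 MB).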

Regarding performance - Hadoop large file is not split, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30965244/
