
apache - Hadoop 2.6.4 and Large Files


I am new to Apache Hadoop, and there is one thing I don't understand. I have a simple cluster (3 nodes), and each node has about 30 GB of free space. When I look at Hadoop's Overview page, I see DFS Remaining: 90.96 GB. I set the replication factor to 1.

Then I create a 50 GB file and try to upload it to HDFS, but it runs out of space. Why? Can't I upload a file larger than the free space of a single node in the cluster?
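For reference, a minimal sketch of the setup described in the question, using the HDFS Java API. The NameNode URI and file paths are placeholders, not values from the original question; dfs.replication = 1 matches the asker's configuration, and getStatus() returns the same figure shown as "DFS Remaining" on the Overview page.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;
import org.apache.hadoop.fs.Path;

public class UploadLargeFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // One replica per block, as in the question.
        conf.set("dfs.replication", "1");

        // Placeholder NameNode URI; adjust to the actual cluster.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // Cluster-wide free space, i.e. the "DFS Remaining" value.
        FsStatus status = fs.getStatus();
        System.out.printf("DFS remaining: %.2f GB%n",
                status.getRemaining() / (1024.0 * 1024 * 1024));

        // Upload the large local file; block placement follows the policy quoted below.
        fs.copyFromLocalFile(new Path("/local/bigfile.dat"),
                             new Path("/user/test/bigfile.dat"));
        fs.close();
    }
}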

Best Answer

According to Hadoop: The Definitive Guide:

Hadoop’s default strategy is to place the first replica on the same node as the client (for clients running outside the cluster, a node is chosen at random, although the system tries not to pick nodes that are too full or too busy). The second replica is placed on a different rack from the first (off-rack), chosen at random. The third replica is placed on the same rack as the second, but on a different node chosen at random. Further replicas are placed on random nodes on the cluster, although the system tries to avoid placing too many replicas on the same rack. This logic makes sense as it decreases the network chatter between the different nodes.



I think it depends on whether the client is itself one of the Hadoop nodes. If the client runs on a Hadoop node, all of the blocks are written to that same node. Even though the cluster has multiple nodes, this does not give better read/write throughput. If the client is not a Hadoop node, a node is chosen at random for each block, so the blocks are spread across the nodes of the cluster, which does give better read/write throughput.
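To verify this explanation, one could check which DataNodes actually hold the blocks of the uploaded file. Below is a hedged sketch using FileSystem.getFileBlockLocations; the NameNode URI and file path are the same hypothetical placeholders as above. With replication = 1, each block lists exactly one host, so the output shows directly whether every block landed on the client's node.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
    public static void main(String[] args) throws Exception {
        // Placeholder NameNode URI and path, matching the sketch above.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"),
                                       new Configuration());
        FileStatus file = fs.getFileStatus(new Path("/user/test/bigfile.dat"));

        // One BlockLocation per block of the file.
        BlockLocation[] blocks = fs.getFileBlockLocations(file, 0, file.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("offset %d -> %s%n",
                    block.getOffset(), String.join(",", block.getHosts()));
        }
        fs.close();
    }
}

The same information is available from the command line with hdfs fsck /user/test/bigfile.dat -files -blocks -locations.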

Regarding apache - Hadoop 2.6.4 and large files, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/36567568/
