
hadoop - Are multiple files stored in one block?

Reposted. Author: 可可西里. Updated: 2023-11-01 14:24:38

When I store many small files in HDFS, are they stored in a single block?

As I understand it from this discussion, these small files should be stored in a single block: HDFS block size Vs actual file size

Best answer

Quoted from Hadoop: The Definitive Guide:

HDFS stores small files inefficiently, since each file is stored in a block, and block metadata is held in memory by the namenode. Thus, a large number of small files can eat up a lot of memory on the namenode. (Note, however, that small files do not take up any more disk space than is required to store the raw contents of the file. For example, a 1 MB file stored with a block size of 128 MB uses 1 MB of disk space, not 128 MB.) Hadoop Archives, or HAR files, are a file archiving facility that packs files into HDFS blocks more efficiently, thereby reducing namenode memory usage while still allowing transparent access to files.

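Conclusion: each file is stored in its own block (or in several blocks, if it is larger than the block size). HDFS does not put multiple files into one block.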
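To make the arithmetic concrete, here is a minimal sketch that compares 1,000 files of 1 MB each with a single 1,000 MB file. It is an illustration, not Hadoop code: the function names are hypothetical, the 128 MB block size is taken from the quoted example, and the figure of about 150 bytes of namenode memory per file or block object is an assumed rule of thumb, not a measured value.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the block size used in the quoted example
BYTES_PER_OBJECT = 150          # ASSUMPTION: rough namenode memory per file/block object
MB = 1024 * 1024


def blocks_for_file(size_bytes, block_size=BLOCK_SIZE):
    """Blocks occupied by one file; a block is never shared between files."""
    # Integer ceiling division: a 1 MB file still needs one whole block entry.
    return (size_bytes + block_size - 1) // block_size


def summarize(file_sizes, block_size=BLOCK_SIZE):
    """Estimate block count, disk usage (per replica), and namenode memory."""
    blocks = sum(blocks_for_file(size, block_size) for size in file_sizes)
    return {
        "files": len(file_sizes),
        "blocks": blocks,
        # Disk usage is the raw content size, not blocks * block_size.
        "disk_bytes": sum(file_sizes),
        # One metadata object per file plus one per block (assumed model).
        "namenode_bytes_estimate": (len(file_sizes) + blocks) * BYTES_PER_OBJECT,
    }


small = summarize([1 * MB] * 1000)  # many small files
large = summarize([1000 * MB])      # the same data as one large file

print(small)
print(large)
```

Both cases use the same amount of disk space, but under these assumptions the small files need 1,000 blocks while the single large file needs only 8, which is why the namenode memory estimate differs so much.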

Regarding "hadoop - Are multiple files stored in one block?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/21274334/
