
binary - Can Hadoop read binary files with arbitrary keys?

Reposted. Author: 可可西里. Updated: 2023-11-01 14:39:30

It looks like Hadoop MapReduce requires a key-value pair structure in text or binary files. In practice we may have files that get split into blocks for processing, but the keys may be spread across the whole file, and there may be no clean boundary where one key is followed by one value. Is there any InputFileFormatter that can read this type of binary file? I don't want to run MapReduce on top of MapReduce (two chained jobs); that would slow down performance and defeat the purpose of using MapReduce. Any suggestions? Thanks.
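For binary key-value data, Hadoop does ship a format built for this: SequenceFile (read via SequenceFileInputFormat), which embeds periodic sync markers so a reader can start at an arbitrary byte offset inside a split and resynchronize to the next record boundary. A minimal Python sketch of the sync-marker idea follows; the marker value, length-prefixed framing, and helper names here are illustrative only, not Hadoop's actual on-disk format:

```python
import io
import struct

# Hypothetical 18-byte sync marker (SequenceFile uses a random
# 16-byte marker per file; the exact value is not important).
SYNC = b"\x00SYNCMARK\x00SYNCMARK"

def write_records(buf, records, sync_interval=3):
    # Write length-prefixed key/value records, inserting a sync
    # marker before every `sync_interval`-th record so a reader
    # can resynchronize from an arbitrary byte offset.
    for i, (key, value) in enumerate(records):
        if i % sync_interval == 0:
            buf.write(SYNC)
        buf.write(struct.pack(">I", len(key)) + key)
        buf.write(struct.pack(">I", len(value)) + value)

def read_split(data, start, end):
    # Return every record in each sync block whose marker begins
    # inside [start, end). A block straddling `end` is read fully
    # by this split (the next split will skip past it), so no
    # record is lost or duplicated across splits.
    pos = data.find(SYNC, start)   # resynchronize: seek next marker
    out = []
    while pos != -1 and pos < end:
        pos += len(SYNC)
        nxt = data.find(SYNC, pos)
        limit = nxt if nxt != -1 else len(data)
        while pos < limit:
            (klen,) = struct.unpack_from(">I", data, pos); pos += 4
            key = data[pos:pos + klen]; pos += klen
            (vlen,) = struct.unpack_from(">I", data, pos); pos += 4
            value = data[pos:pos + vlen]; pos += vlen
            out.append((key, value))
        pos = nxt
    return out
```

Splitting the byte range anywhere still yields each record exactly once, because ownership of a record is decided by where its sync block starts, not by where the split boundary happens to fall.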

Accepted Answer

According to Hadoop: The Definitive Guide:

The logical records that FileInputFormats define do not usually fit neatly into HDFS blocks. For example, a TextInputFormat’s logical records are lines, which will cross HDFS boundaries more often than not. This has no bearing on the functioning of your program—lines are not missed or broken, for example—but it’s worth knowing about, as it does mean that data-local maps (that is, maps that are running on the same host as their input data) will perform some remote reads. The slight overhead this causes is not normally significant.

If a file is split across HDFS block boundaries, the Hadoop framework handles it for you. But if you split the file manually, you have to take the boundaries into account yourself.
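The boundary handling the book describes can be simulated in a few lines: a reader whose split does not begin at offset 0 discards the (possibly partial) first record, because the previous split's reader finishes any record that straddles the boundary by reading past its own split end. A sketch with newline-delimited records, assuming a hypothetical helper name (Hadoop's LineRecordReader implements this logic in Java):

```python
def read_line_split(data: bytes, start: int, end: int):
    # Return the lines "owned" by the byte range [start, end].
    pos = start
    if start != 0:
        # Not the first split: skip up to and including the first
        # newline. The previous split's reader owns that line, even
        # if it happens to end exactly at `start`.
        nl = data.find(b"\n", start)
        if nl == -1:
            return []
        pos = nl + 1
    lines = []
    # Keep reading while a line *starts* at or before `end`; the
    # final line may extend past `end` (a remote read in Hadoop).
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        if nl == -1:
            lines.append(data[pos:])
            break
        lines.append(data[pos:nl])
        pos = nl + 1
    return lines
```

Reading two adjacent splits of the same file yields every line exactly once, regardless of where the boundary cuts a line, which is why the book says lines are never missed or broken.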

In reality we might have files to be split into chunks to be processed. But the keys may be spread across the file. It may not be a clear cut that one key followed by one value.

What exactly does your case look like? If you can clarify, we can look at how to solve this problem.

Regarding "binary - Can Hadoop read binary files with arbitrary keys?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/7577866/
