gpt4 book ai didi

Hadoop 自己的数据类型

转载 作者:可可西里 更新时间:2023-11-01 14:45:35 27 4
gpt4 key购买 nike

我使用 hadoop 已经有一段时间了,但我不确定为什么 Hadoop 使用它自己的数据类型而不是 Java 数据类型?我一直在互联网上寻找同样的东西,但没有任何帮助。请帮忙。

最佳答案

简短的回答是因为它们提供的序列化和反序列化性能。

长版:

使用 Writables(Hadoop 的数据类型)的主要好处在于它们的效率。与显然是替代选择的 Java 序列化相比,它们具有更紧凑的表示形式。可写对象不会将它们的类型存储在序列化表示中,因为在反序列化时它是已知的类型。

以下是 Hadoop 权威指南中更详细的摘录:

Java serialization is not compact, classes that implement java.io.Serializable or java.io.Externalizable write their classname and the object representation to the stream. Subsequent instances of the same class write a reference handle to the first occurrence, which occupies only 5 bytes. However, reference handles don't work well with random access, because the referent class may occur at any point in the preceding stream - that is, there is state stored in the stream. Even worse, reference handles play havoc with sorting records in a serialized stream, since the first record of a particular class is distinguished and must be treated as a special case. All these problems can be avoided by not writing the classname to the stream at all, which is the approach Writable takes. The result is that the format is considerably more compact than Java serialization, and random access and sorting work as expected because each record is independent of the others (so there is no stream state).

关于Hadoop 自己的数据类型,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/29877787/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com