
java - Advantages of using NullWritable in Hadoop

Reposted · Author: IT老高 · Updated: 2023-10-28 21:16:01

What is the advantage of using NullWritable for null keys/values over using null Text (i.e., new Text(null))? I saw the following in the book "Hadoop: The Definitive Guide":

NullWritable is a special type of Writable, as it has a zero-length serialization. No bytes are written to, or read from, the stream. It is used as a placeholder; for example, in MapReduce, a key or a value can be declared as a NullWritable when you don’t need to use that position—it effectively stores a constant empty value. NullWritable can also be useful as a key in SequenceFile when you want to store a list of values, as opposed to key-value pairs. It is an immutable singleton: the instance can be retrieved by calling NullWritable.get().

What I don't understand is how output is written when NullWritable is used. Is there a constant value at the beginning of the output file indicating that the key (or value) of this file is null, so that the MapReduce framework can skip reading the null keys/values (whichever is null)? Also, how is null Text actually serialized?

Thanks,

Venkat

Best Answer

The key/value types must be given at runtime, so anything writing or reading NullWritables already knows ahead of time that it will be dealing with that type; there is no marker or anything of the sort in the file. Technically speaking, NullWritables are "read", it's just that "reading" a NullWritable is a no-op. You can see for yourself that nothing at all is written or read:

import java.io.*;
import java.util.Arrays;
import org.apache.hadoop.io.NullWritable;

NullWritable nw = NullWritable.get();
ByteArrayOutputStream out = new ByteArrayOutputStream();
nw.write(new DataOutputStream(out));   // writes zero bytes
System.out.println(Arrays.toString(out.toByteArray())); // prints "[]"

ByteArrayInputStream in = new ByteArrayInputStream(new byte[0]);
nw.readFields(new DataInputStream(in)); // works just fine; reads zero bytes

As for the new Text(null) question, you can try that out too:

import java.io.*;
import java.util.Arrays;
import org.apache.hadoop.io.Text;

Text text = new Text((String) null); // throws NullPointerException
ByteArrayOutputStream out = new ByteArrayOutputStream();
text.write(new DataOutputStream(out)); // never reached
System.out.println(Arrays.toString(out.toByteArray()));

Text simply cannot handle a null String at all.
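To see what the Text alternative actually costs on the wire (the question also asks how Text is serialized), Text's format, a variable-length encoded byte count followed by the UTF-8 bytes, can be sketched in plain Java without Hadoop on the classpath. This is a simplified sketch: serializeLikeText is a hypothetical helper, not a Hadoop API, and the vint encoding is reduced to the single-byte case (byte lengths up to 127), which Hadoop's WritableUtils handles the same way.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TextSerializationSketch {
    // Mimics Text.write(): a variable-length int giving the UTF-8 byte
    // count, followed by the UTF-8 bytes themselves. For byte lengths
    // <= 127 the vint is a single byte equal to the length.
    static byte[] serializeLikeText(String s) throws IOException {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8); // NPE here if s were null
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(out);
        dos.writeByte(utf8.length); // simplified vint (valid for lengths <= 127)
        dos.write(utf8);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(serializeLikeText("hi"))); // [2, 104, 105]
        // Even an empty Text costs one length byte; NullWritable costs zero.
        System.out.println(Arrays.toString(serializeLikeText("")));   // [0]
    }
}
```

So even an empty Text key occupies one byte per record, while a NullWritable key occupies none, which is exactly why the book suggests NullWritable keys for a SequenceFile that is really just a list of values.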

Regarding "java - Advantages of using NullWritable in Hadoop", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/16198752/
