
hadoop - Unwanted characters in Hadoop job output

Reposted. Author: 行者123. Updated: 2023-12-02 20:12:47

I wrote a simple program that collects statistics about the bigrams in some data.
I print the statistics to a custom file:

Path file = new Path(context.getConfiguration().get("mapred.output.dir") + "/bigram.txt");
FSDataOutputStream out = file.getFileSystem(context.getConfiguration()).create(file);

My code contains the following lines:
Text.writeString(out, "total number of unique bigrams: " + uniqBigramCount + "\n");
Text.writeString(out, "total number of bigrams: " + totalBigramCount + "\n");
Text.writeString(out, "number of bigrams that appear only once: " + onceBigramCount + "\n");

In vim / gedit I get the following output:
'total number of unique bigrams: 424462
!total number of bigrams: 1578220
0number of bigrams that appear only once: 296139

Besides the extra character at the start of each line, there are also some non-printing characters. What could be causing this?

Best answer

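As @ThomasJungblut said, every call to writeString writes two values: the length of the string (as a vint) and then the string's bytes. For strings this short the vint is a single byte holding the length, and the editor renders that byte as the stray character at the start of each line: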

/** Write a UTF8 encoded string to out
 */
public static int writeString(DataOutput out, String s) throws IOException {
    ByteBuffer bytes = encode(s);
    int length = bytes.limit();
    WritableUtils.writeVInt(out, length);
    out.write(bytes.array(), 0, length);
    return length;
}
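
To see why the stray characters are exactly ', ! and 0, here is a minimal sketch in plain Java. It does not call Hadoop; the class VIntPrefixDemo and the method prefixChar are hypothetical names made up for this illustration. It assumes the UTF-8 length of each string is at most 127, the range in which the vint is a single byte equal to the length itself.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: mimics the one-byte length prefix written ahead of short
// strings. It is an illustration, not Hadoop's own implementation.
final class VIntPrefixDemo {

    // Returns the character an editor would show for the length prefix.
    // Assumption: the UTF-8 length is at most 127, so the prefix is one byte.
    static char prefixChar(String s) {
        int length = s.getBytes(StandardCharsets.UTF_8).length;
        if (length > 127) {
            throw new IllegalArgumentException("prefix would span several bytes");
        }
        return (char) length;
    }

    public static void main(String[] args) {
        String[] lines = {
            "total number of unique bigrams: 424462\n",
            "total number of bigrams: 1578220\n",
            "number of bigrams that appear only once: 296139\n"
        };
        for (String line : lines) {
            char prefix = prefixChar(line);
            // Print the length next to the character it maps to.
            System.out.println((int) prefix + " -> " + prefix);
        }
    }
}
```

The three strings from the question are 39, 33 and 48 bytes long including the trailing newline, and those byte values are the ASCII codes for ', ! and 0, which matches the output shown in the question.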

If you only want to print human-readable text to this file, I suggest wrapping the out variable in a PrintStream and then using its println or printf methods:
PrintStream ps = new PrintStream(out);
ps.printf("total number of unique bigrams: %d\n", uniqBigramCount);
ps.printf("total number of bigrams: %d\n", totalBigramCount);
ps.printf("number of bigrams that appear only once: %d\n", onceBigramCount);
ps.close();
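
The same idea can be tried without a cluster. The sketch below is an assumption-laden stand-in: it writes to a ByteArrayOutputStream instead of the FSDataOutputStream from the question, and the class PlainTextOutputDemo and the method render are hypothetical names. It also passes an explicit charset and locale, which the snippet above leaves at the platform defaults.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.Locale;

// Sketch only: shows that PrintStream emits just the text bytes,
// with no length prefix in front of each line.
final class PlainTextOutputDemo {

    static String render(long uniq, long total, long once)
            throws UnsupportedEncodingException {
        // Stand-in for the job's output stream (an assumption for this demo).
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // try-with-resources flushes and closes the stream for us.
        try (PrintStream ps = new PrintStream(buffer, false, "UTF-8")) {
            ps.printf(Locale.ROOT, "total number of unique bigrams: %d\n", uniq);
            ps.printf(Locale.ROOT, "total number of bigrams: %d\n", total);
            ps.printf(Locale.ROOT, "number of bigrams that appear only once: %d\n", once);
        }
        return buffer.toString("UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.print(render(424462L, 1578220L, 296139L));
    }
}
```

The first character of the result is the t of "total", so each line starts with its own text rather than a length byte.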

Regarding hadoop - Unwanted characters in Hadoop job output, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/11643082/
