
hadoop - FSDataOutputStream.writeUTF() adds extra characters at the beginning of the data on HDFS. How can I avoid this extra data?

Repost. Author: 可可西里. Updated: 2023-11-01 15:26:24

What I am trying to do is convert a sequence file on HDFS that contains XML data into a .xml file on HDFS.

I searched on Google and found the following code. I modified it for my needs; the code is below.

public class SeqFileWriterCls {
    public static void main(String args[]) throws Exception {
        System.out.println("Reading Sequence File");
        Path path = new Path("seq_file_path/seq_file.seq");
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Writer writer = null;
        SequenceFile.Reader reader = null;
        FSDataOutputStream fwriter = null;
        OutputStream fowriter = null;
        try {
            reader = new SequenceFile.Reader(fs, path, conf);
            //writer = new SequenceFile.Writer(fs, conf, out_path, Text.class, Text.class);
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);

            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);

            while (reader.next(key, value)) {
                // I am editing the path so that the key becomes my file name
                // and the data in it becomes the value
                Path out_path = new Path("" + key);
                String string_path = out_path.toString();
                String clear_path = string_path.substring(string_path.lastIndexOf("/") + 1);

                Path finalout_path = new Path("path" + clear_path);
                System.out.println("the final path is " + finalout_path);
                fwriter = fs.create(finalout_path);
                fwriter.writeUTF(value.toString());
                fwriter.close();
                FSDataInputStream in = fs.open(finalout_path);
                String s = in.readUTF();
                System.out.println("file has: -" + s);
                //fowriter = fs.create(finalout_path);
                //fowriter.write(value.toString());
                System.out.println(key + " <===> :" + value.toString());
                System.exit(0);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            IOUtils.closeStream(reader);
            fs.close();
        }
    }
}

I am writing the data to HDFS with an "FSDataOutputStream", using the "writeUTF" method. The problem is that when I write to the HDFS file, some extra characters end up at the beginning of the data. But when I print the data, I cannot see the extra characters.
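The "extra characters" can be demonstrated without HDFS at all: FSDataOutputStream extends java.io.DataOutputStream, and writeUTF() prepends a two-byte, big-endian byte count before the string. A minimal sketch on an in-memory stream (the string "<xml/>" here is just an illustrative payload):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class WriteUtfDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // FSDataOutputStream extends DataOutputStream, so it inherits this writeUTF()
        DataOutputStream dos = new DataOutputStream(bos);
        dos.writeUTF("<xml/>");
        byte[] written = bos.toByteArray();

        // The first two bytes are a big-endian length prefix, not part of the string
        int prefix = ((written[0] & 0xFF) << 8) | (written[1] & 0xFF);
        System.out.println("total bytes written: " + written.length); // 8, not 6
        System.out.println("length prefix value: " + prefix);         // 6
        System.out.println("payload: " + new String(written, 2, written.length - 2, StandardCharsets.UTF_8));
    }
}
```

Those two prefix bytes are what show up as garbage at the start of the HDFS file when it is read as plain text; readUTF() consumes them, which is why printing via readUTF() looks clean.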

I tried using writeChars(), but even that didn't work.

Is there any way to avoid this? Or is there another way to write the data to HDFS?

Please help...

Best Answer

The JavaDoc for the writeUTF(String str) method states the following:

Writes a string to the underlying output stream using modified UTF-8 encoding in a machine-independent manner. First, two bytes are written to the output stream as if by the writeShort method giving the number of bytes to follow. This value is the number of bytes actually written out, not the length of the string. Following the length, each character of the string is output, in sequence, using the modified UTF-8 encoding for the character. (...)

Both the writeBytes(String str) and writeChars(String str) methods should work fine. Note, though, that writeBytes() keeps only the low-order byte of each character (safe for ASCII only), and writeChars() emits two bytes per character (UTF-16 code units), so for arbitrary text the most robust option is to write raw UTF-8 bytes with write(str.getBytes(StandardCharsets.UTF_8)).

Regarding "hadoop - FSDataOutputStream.writeUTF() adds extra characters at the beginning of the data on HDFS. How can I avoid this extra data?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46197855/
