
hadoop - Appending to an existing sequence file

Reposted. Author: 可可西里. Updated: 2023-11-01 16:40:41

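Can someone provide a sample code snippet showing how to append files to an existing sequence file?

Below is the code I use to append to the existing sequence file outputfile, but reading the sequence file after the append fails with a checksum error:

Problem opening checksum file: /Users/{homedirectory}/Desktop/Sample/SequenceFile/outputfile. Ignoring exception: java.io.EOFException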

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class AppendSequenceFile {

        /**
         * @param args
         * @throws IOException
         * @throws IllegalAccessException
         * @throws InstantiationException
         */
        public static void main(String[] args) throws IOException,
                InstantiationException, IllegalAccessException {

            Configuration conf = new Configuration();

            FileSystem fs = FileSystem.get(conf);
            // Directory of text files to add, and the target sequence file.
            Path inputFile = new Path("/Users/{homedirectory}/Desktop/Sample/SequenceFile/sampleAppendTextFiles");
            Path sequenceFile = new Path("/Users/{homedirectory}/Desktop/Sample/SequenceFile/outputfile");
            FSDataInputStream inputStream;
            Text key = new Text();
            Text value = new Text();
            SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf,
                    sequenceFile, key.getClass(), value.getClass());
            FileStatus[] fStatus = fs.listStatus(inputFile);

            // One record per input file: key = file name, value = file contents.
            for (FileStatus fst : fStatus) {
                String str = "";
                System.out.println("Processing file : " + fst.getPath().getName() + " and the size is : " + fst.getLen());
                inputStream = fs.open(fst.getPath());
                key.set(fst.getPath().getName());
                while (inputStream.available() > 0) {
                    str = str + inputStream.readLine();
                }
                value.set(str);
                writer.append(key, value);
            }
        }
    }

Sequence file reader:

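    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SequenceFileReader {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path("/Users/{homedirectory}/Desktop/Sample/SequenceFile/outputfile");
            SequenceFile.Reader reader = null;
            try {
                reader = new SequenceFile.Reader(fs, path, conf);
                Text key = new Text();
                Text value = new Text();
                // Print every key/value record in the file.
                while (reader.next(key, value)) {
                    System.out.println(key);
                    System.out.println(value);
                }
            } finally {
                IOUtils.closeStream(reader);
            }
        }
    }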

Thanks in advance.

Best answer

I have not done this myself, but I found the following while browsing the Hadoop API documentation.

You can use this API to create the writer. See SequenceFile:

    public static org.apache.hadoop.io.SequenceFile.Writer createWriter(
            FileContext fc,
            Configuration conf,
            Path name,
            Class keyClass,
            Class valClass,
            org.apache.hadoop.io.SequenceFile.CompressionType compressionType,
            CompressionCodec codec,
            org.apache.hadoop.io.SequenceFile.Metadata metadata,
            EnumSet<CreateFlag> createFlag,
            org.apache.hadoop.fs.Options.CreateOpts... opts) throws IOException

In this API, the CreateFlag argument lets you specify the APPEND option.
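For illustration, here is a minimal sketch of appending records to a sequence file. Note that it takes a different route from the FileContext overload quoted above: it uses the option-based createWriter together with SequenceFile.Writer.appendIfExists, which, as far as I know, is only available in newer Hadoop releases (around 2.7 and later). The class name AppendToSequenceFile and the helpers appendRecords and countRecords are hypothetical names made up for this example, and the path is taken from the command line. The compression type is pinned to NONE on the assumption that the existing file was written the same way, since the writer appears to reject an append whose compression settings differ from the file's header.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.Text;

// Hypothetical example class; not part of the original question or answer.
public class AppendToSequenceFile {

    /** Appends the records to seqFile if it exists, otherwise creates it. */
    public static void appendRecords(Configuration conf, Path seqFile,
                                     Map<String, String> records) throws IOException {
        // try-with-resources closes the writer, which flushes the data.
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(seqFile),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(Text.class),
                // Assumption: the existing file is uncompressed as well.
                SequenceFile.Writer.compression(CompressionType.NONE),
                SequenceFile.Writer.appendIfExists(true))) {
            Text key = new Text();
            Text value = new Text();
            for (Map.Entry<String, String> entry : records.entrySet()) {
                key.set(entry.getKey());
                value.set(entry.getValue());
                writer.append(key, value);
            }
        }
    }

    /** Reads the whole file and returns the number of records in it. */
    public static int countRecords(Configuration conf, Path seqFile) throws IOException {
        int count = 0;
        try (SequenceFile.Reader reader =
                 new SequenceFile.Reader(conf, SequenceFile.Reader.file(seqFile))) {
            Text key = new Text();
            Text value = new Text();
            while (reader.next(key, value)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Path seqFile = new Path(args[0]); // target sequence file, supplied by the caller

        Map<String, String> records = new LinkedHashMap<>();
        records.put("example-key", "example-value"); // placeholder record
        appendRecords(conf, seqFile, records);

        System.out.println("Records in file: " + countRecords(conf, seqFile));
    }
}
```

Running main twice against the same path should leave two records in the file if the append works. One caveat: to my knowledge the default checksummed local filesystem (file://) does not support append, so against local paths like the ones in the question this sketch would probably fail unless it is pointed at HDFS or at a filesystem implementation that supports append.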

Regarding hadoop - appending to an existing sequence file, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41344290/
