
java - Why can't I get the file name in Hadoop and display it in the format (Word filename count)?

Reposted. Author: 行者123. Updated: 2023-12-01 18:04:52

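The input is a text file named Wiki-micro.txt. The word count program itself runs fine. What I need is to modify it so that the output format changes from (word count) to (Word#####filename count). Could you tell me where I went wrong? I tried using the input split to get the file name, but it doesn't work. Please help.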

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");

        public void map(LongWritable offset, Text lineText, Context context)
                throws IOException, InterruptedException {

            String line = lineText.toString();
            Text currentWord = new Text();
            InputSplit input_split = context.getInputSplit();
            String FName = ((FileSplit) input_split).getPath().getName();

            for (String word : WORD_BOUNDARY.split(line)) {
                if (word.isEmpty()) {
                    continue;
                }
                currentWord = new Text(word);
                context.write(currentWord, one);
                context.write(new Text(FName), one);
            }
        }
    }

Best answer

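I'm not sure, but what happens if you replace these last three lines:

    currentWord = new Text(word);
    context.write(currentWord, one);
    context.write(new Text(FName), one);

with these, so that the file name becomes part of the key instead of a separate record:

    currentWord = new Text(word + "####" + FName);
    context.write(currentWord, one);
    context.write(new Text(FName), one);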
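
To see why the composite key matters, here is a minimal sketch in plain Java with no Hadoop dependency. It is not the asker's job: the class name `CompositeKeyCount` and the methods `compositeKey` and `count` are hypothetical, and an in-memory `TreeMap` stands in for the shuffle and the summing reducer. It also assumes the desired output contains only `Word####filename` records, so the separate write of the bare file name is left out. The `####` separator follows the answer above.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical stand-alone illustration; not part of the original job.
final class CompositeKeyCount {
    // Same tokenizer as the mapper in the question.
    private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");
    // Separator taken from the answer; adjust to the exact format you need.
    private static final String SEPARATOR = "####";

    // Builds the key a mapper would emit: the word and file name joined together.
    static String compositeKey(String word, String fileName) {
        return word + SEPARATOR + fileName;
    }

    // Emulates map + reduce for one line: every token contributes 1 to its
    // composite key, and equal keys are summed (what a summing reducer does).
    static Map<String, Integer> count(String fileName, String line) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : WORD_BOUNDARY.split(line)) {
            if (word.isEmpty()) {
                continue; // the pattern can yield empty tokens between words
            }
            counts.merge(compositeKey(word, fileName), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints one "key<TAB>count" line per distinct word/file pair.
        count("Wiki-micro.txt", "hello world hello")
                .forEach((key, value) -> System.out.println(key + "\t" + value));
    }
}
```

Because the reducer groups by key, writing the word and the file name as two separate keys (as in the question) produces two unrelated counts. Joining them into one key is what makes each output line carry both pieces of information.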

Regarding "java - Why can't I get the file name in Hadoop and display it in the format (Word filename count)?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60575497/
