
java - Hadoop MapReduce: how to store only values in HDFS

Repost. Author: 可可西里. Updated: 2023-11-01 14:34:02

I am using the following to remove duplicate lines:

public class DLines
{
    public static class TokenCounterMapper extends Mapper<Object, Text, Text, IntWritable>
    {
        private final static IntWritable one = new IntWritable(1);

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException
        {
            // Emit the whole line as the key with a count of 1.
            context.write(value, one);
        }
    }

    public static class TokenCounterReducer extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
        {
            int sum = 0;
            for (IntWritable value : values)
            {
                sum += value.get();
            }
            // Emit only lines that occurred exactly once.
            if (sum < 2)
            {
                context.write(key, new IntWritable(sum));
            }
        }
    }
}

I only need to store the key in HDFS.
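Note that the `sum < 2` filter in the reducer emits only lines that occur exactly once: a line that appears two or more times is dropped entirely rather than kept once. The counting logic can be simulated outside Hadoop with plain Java collections — a minimal sketch (class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DedupSketch {
    // Mimics the map/shuffle/reduce pipeline: count occurrences of each line,
    // then keep only lines whose count is below 2.
    public static List<String> uniqueLines(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            counts.merge(line, 1, Integer::sum); // reduce side: sum the emitted 1s
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() < 2) { // same filter as the reducer's sum < 2
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "a" appears twice, so every copy is dropped; only "b" and "c" survive.
        System.out.println(uniqueLines(Arrays.asList("a", "b", "a", "c"))); // prints [b, c]
    }
}
```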

Best Answer

If you don't need the values from the reducer, just use NullWritable.

You can simply write context.write(key, NullWritable.get());

In your driver, you can also set

 job.setMapOutputKeyClass(Text.class);
 job.setMapOutputValueClass(IntWritable.class);

and

 job.setOutputKeyClass(Text.class);
 job.setOutputValueClass(NullWritable.class);
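Putting the answer together, the reducer and driver could look as follows — a sketch only, assuming the new MapReduce API (org.apache.hadoop.mapreduce) and reusing the mapper from the question; the class name and job name here are illustrative:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DedupReducerSketch extends Reducer<Text, IntWritable, Text, NullWritable>
{
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException
    {
        int sum = 0;
        for (IntWritable value : values)
        {
            sum += value.get();
        }
        if (sum < 2)
        {
            // Emit only the key; NullWritable writes no value bytes to HDFS.
            context.write(key, NullWritable.get());
        }
    }

    // Driver sketch: map output stays (Text, IntWritable),
    // final output becomes (Text, NullWritable).
    public static void main(String[] args) throws Exception
    {
        Job job = Job.getInstance(new Configuration(), "dedup");
        job.setJarByClass(DedupReducerSketch.class);
        job.setMapperClass(DLines.TokenCounterMapper.class); // mapper from the question
        job.setReducerClass(DedupReducerSketch.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Since map output and final output now have different value types, setting both pairs of classes in the driver, as the answer shows, is required.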

Regarding "java - Hadoop MapReduce: how to store only values in HDFS", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23601380/

Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号