
java - How to sort values (and their corresponding keys) in the Hadoop MapReduce framework?


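I am trying to sort input data with Hadoop MapReduce. The problem is that I can only sort the key-value pairs by key, whereas I want to sort them by value. The key for each value is generated with a counter, so the first value (234) has key 1, the second value (944) has key 2, and so on. Any idea how to do this and sort the input by value?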


import java.io.IOException;
import java.util.StringTokenizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Sortt {

    // Emits each input number with a running counter as its key: (1, 234), (2, 944), ...
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        int k = 0;
        int v = 0;
        int va = 0;
        public Text ke = new Text();
        private final static IntWritable val = new IntWritable();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());

            while (itr.hasMoreTokens()) {
                val.set(Integer.parseInt(itr.nextToken()));
                v = val.get();
                k = k + 1;
                ke.set(Integer.toString(k));

                context.write(ke, new IntWritable(v));
            }
        }
    }

    // Sorts the values that arrive under one key, then writes them back out with that key.
    public static class SortReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        int a = 0;
        int v = 0;
        private IntWritable va = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            List<Integer> sorted = new ArrayList<Integer>();

            for (IntWritable val : values) {
                a = val.get();
                sorted.add(a);
            }
            Collections.sort(sorted);
            for (int i = 0; i < sorted.size(); i++) {
                v = sorted.get(i);
                va.set(v);

                context.write(key, va);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        long startTime = 0;
        long Time = 0;
        long duration = 0;
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "sort");
        job.setJarByClass(Sortt.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(SortReducer.class);
        job.setReducerClass(SortReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
        // Never reached: System.exit above terminates the JVM.
        Time = System.currentTimeMillis();
        //duration = (endTime-startTime)/1000000;
        System.out.println("time=" + Time + "MS");
    }
}

Input:

234
944
241
130
369
470
250
100
250
735
856
659
425
756
123
756
459
754
654
951
753
254
698
741

Expected output:

8	100
15	123
4	130
1	234
3	241
24	241
7	250
9	250
22	254
5	369
13	425
17	459
6	470
19	654
12	659
23	698
10	735
21	753
18	754
14	756
16	756
11	856
2	944
20	951

Current output:

1	234
10	735
11	856
12	659
13	425
14	757
15	123
16	756
17	459
18	754
19	654
2	944
20	951
21	753
22	254
23	698
24	741
3	241
4	130
5	369
6	470
7	250
8	100
9	250

Best answer

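MapReduce output is sorted by key by default. To sort by value, you can use secondary sort, which is one of the best techniques for ordering reducer output by value. The original answer links to a complete example.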
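The idea behind sorting by value can be sketched in plain Java, without a cluster. Since the framework only sorts by key, the value is moved into a composite key and a comparator orders that key by value first and by the original counter key second. The names `SortByValueSketch`, `ValueThenKey`, `BY_VALUE_THEN_KEY`, and `sortByValue` are hypothetical and exist only for this illustration. The list sort stands in for the shuffle's sort-by-key step. In a real Hadoop job the composite key would typically be a `WritableComparable`, which this sketch does not show.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortByValueSketch {

    /** Hypothetical composite key: the value to sort on plus the original counter key. */
    static final class ValueThenKey {
        final int value;
        final int key;

        ValueThenKey(int value, int key) {
            this.value = value;
            this.key = key;
        }
    }

    /** Orders composite keys by value first, then by the original counter key. */
    static final Comparator<ValueThenKey> BY_VALUE_THEN_KEY =
            Comparator.<ValueThenKey>comparingInt(p -> p.value)
                      .thenComparingInt(p -> p.key);

    /**
     * Simulates the map step (attach a counter key to each value) followed by
     * the framework's sort-by-key, which here sorts on the composite key.
     */
    static List<ValueThenKey> sortByValue(int[] input) {
        List<ValueThenKey> pairs = new ArrayList<>();
        for (int i = 0; i < input.length; i++) {
            pairs.add(new ValueThenKey(input[i], i + 1)); // counter keys start at 1
        }
        pairs.sort(BY_VALUE_THEN_KEY);
        return pairs;
    }

    public static void main(String[] args) {
        // First nine values from the question's input.
        int[] input = {234, 944, 241, 130, 369, 470, 250, 100, 250};
        for (ValueThenKey p : sortByValue(input)) {
            System.out.println(p.key + "\t" + p.value);
        }
    }
}
```

Comparing the values as integers also avoids the string ordering visible in the current output, where the `Text` key "10" sorts before "2". With more than one reducer, each output file would be sorted only within itself, so a globally sorted result would additionally need a single reducer or a range-based partitioner.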

Regarding "java - How to sort values (and their corresponding keys) in the Hadoop MapReduce framework?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55494120/
