
hadoop - TotalOrderPartitioner gives a wrong key class error


I am trying out Hadoop's TotalOrderPartitioner. While doing so, I get the error below. The error says "wrong key class".

Driver code -

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;


public class WordCountJobTotalSort {

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("Plz provide I/p and O/p directory ");
            System.exit(-1);
        }

        Job job = new Job();

        job.setJarByClass(WordCountJobTotalSort.class);
        job.setJobName("WordCountJobTotalSort");
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setMapperClass(WordMapper.class);
        job.setPartitionerClass(TotalOrderPartitioner.class);
        job.setReducerClass(WordReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setNumReduceTasks(2);

        TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), new Path("/tmp/partition.lst"));

        InputSampler.writePartitionFile(job, new InputSampler.RandomSampler<IntWritable, Text>(1, 2, 2));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Mapper code -

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;


public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    public void map(IntWritable mkey, Text value, Context context)
            throws IOException, InterruptedException {

        String s = value.toString();

        for (String word : s.split(" ")) {
            if (word.length() > 0) {
                context.write(new Text(word), new IntWritable(1));
            }
        }
    }
}

Reducer code -

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;


public class WordReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text rkey, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {

        int count = 0;

        for (IntWritable value : values) {
            count = count + value.get();
        }

        context.write(rkey, new IntWritable(count));
    }
}

Error -

[cloudera@localhost workspace]$ hadoop jar WordCountJobTotalSort.jar WordCountJobTotalSort file_seq/part-m-00000 file_out
15/05/18 00:45:13 INFO input.FileInputFormat: Total input paths to process : 1
15/05/18 00:45:13 INFO partition.InputSampler: Using 2 samples
15/05/18 00:45:13 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
15/05/18 00:45:13 INFO compress.CodecPool: Got brand-new compressor [.deflate]
Exception in thread "main" java.io.IOException: wrong key class: org.apache.hadoop.io.LongWritable is not class org.apache.hadoop.io.Text
at org.apache.hadoop.io.SequenceFile$RecordCompressWriter.append(SequenceFile.java:1340)
at org.apache.hadoop.mapreduce.lib.partition.InputSampler.writePartitionFile(InputSampler.java:336)
at WordCountJobTotalSort.main(WordCountJobTotalSort.java:47)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

Input file -

[cloudera@localhost workspace]$ hadoop fs -text file_seq/part-m-00000

0 hello hello

12 how how

20 is is

26 your your

36 jobs

Best Answer

The InputSampler does its sampling at the map stage (before shuffle and reduce), and the samples are drawn from the Mapper's input KEYs. We need to make sure the Mapper's input and output KEY types are the same; otherwise the MR framework cannot find the right bucket in the sampled key space to place the map output key-value pairs into.

In this case the input KEY is LongWritable, so the InputSampler builds the partition from a subset of all the LongWritable keys. The map output KEY, however, is Text, so the MR framework cannot find a suitable bucket for it in that partition.
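Concretely, two settings in the driver above collide (annotations added here for illustration):

job.setInputFormatClass(SequenceFileInputFormat.class); // sampler draws keys from here; the
                                                        // sequence file's keys are LongWritable offsets
job.setMapOutputKeyClass(Text.class);                   // writePartitionFile creates the partition
                                                        // file with this key class

// writePartitionFile therefore appends LongWritable samples to a SequenceFile
// declared with Text keys, which is exactly the "wrong key class" IOException
// in the stack trace above.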

We can solve this by introducing a preparation stage: a first job rewrites the input so it is keyed by Text, after which the total-order job can run with an identity mapper whose input and output KEYs match.
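A minimal sketch of such a fix, assuming a two-job pipeline (the class name PrepMapper and the path arguments are illustrative, not from the original post): stage 1 is an ordinary word-count map that writes a SequenceFile keyed by Text, and stage 2 runs the total-order sort with Hadoop's built-in identity Mapper, so its input and output keys are both Text and the sampler, partitioner, and reducer all agree on the key type.

// Stage 1 mapper: tokenize each line and emit (word, 1). With
// job.setOutputFormatClass(SequenceFileOutputFormat.class) and zero reducers,
// this produces a SequenceFile<Text, IntWritable> for stage 2 to read.
public class PrepMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String word : value.toString().split(" ")) {
            if (word.length() > 0) {
                context.write(new Text(word), new IntWritable(1));
            }
        }
    }
}

// Stage 2 driver fragment: the input keys are now Text, and Mapper.class
// (the identity mapper) passes them through unchanged, so the keys the
// sampler sees and the keys the partition file is written with match.
Job sortJob = new Job();
sortJob.setJarByClass(WordCountJobTotalSort.class);
FileInputFormat.setInputPaths(sortJob, new Path(args[1]));  // stage 1's output
FileOutputFormat.setOutputPath(sortJob, new Path(args[2]));
sortJob.setInputFormatClass(SequenceFileInputFormat.class);
sortJob.setMapperClass(Mapper.class);                       // identity mapper
sortJob.setReducerClass(WordReducer.class);
sortJob.setMapOutputKeyClass(Text.class);
sortJob.setMapOutputValueClass(IntWritable.class);
sortJob.setOutputKeyClass(Text.class);
sortJob.setOutputValueClass(IntWritable.class);
sortJob.setPartitionerClass(TotalOrderPartitioner.class);
sortJob.setNumReduceTasks(2);

TotalOrderPartitioner.setPartitionFile(sortJob.getConfiguration(), new Path("/tmp/partition.lst"));

// The sampler's type parameters now match the input format's <Text, IntWritable>.
InputSampler.writePartitionFile(sortJob, new InputSampler.RandomSampler<Text, IntWritable>(1, 2, 2));

System.exit(sortJob.waitForCompletion(true) ? 0 : 1);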

Regarding "hadoop - TotalOrderPartitioner gives a wrong key class error", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30299122/
