
hadoop - Hadoop > Mapper class input error


The content of the input text file I am using is:

1   "Come 
1 "Defects,"
1 "I
1 "Information
1 "J"
2 "Plain
5 "Project
1 "Right
1 "Viator"
The numbers on the left and the words on the right are separated by a tab.
But when I execute the mapper function below,
public static class SortingMapper extends Mapper<Text, Text, Pair, NullWritable>
{
    private Text word = new Text();
    private IntWritable freq = new IntWritable();

    @Override
    public void map(Text key, Text value, Context context) throws IOException, InterruptedException
    {
        String line = value.toString();
        String[] words = line.split("\t");

        freq = new IntWritable(Integer.parseInt(words[0]));
        word.set(words[1]);

        context.write(new Pair(word, freq), NullWritable.get());
    }
}
public static class FirstPartitioner extends Partitioner<Pair, NullWritable>
{
    @Override
    public int getPartition(Pair p, NullWritable n, int numPartitions)
    {
        String word = p.getFirst().toString();

        char first = word.charAt(0);
        char middle = 'n';

        if (middle < first)
        {
            return 0;
        }
        else
        {
            return 1 % numPartitions; // why is the % needed???
        }
    }
}

public static class KeyComparator extends WritableComparator
{
    protected KeyComparator()
    {
        super(Pair.class, true);
    }

    @Override
    public int compare(WritableComparable w1, WritableComparable w2)
    {
        Pair v1 = (Pair) w1;
        Pair v2 = (Pair) w2;

        /*
         * Since we already counted the words in the first MR job, we only need to sort the list
         * by frequency, so there is no need to compare the Text again.
         *
         * int cmp = Pair.compare(v1.getFirst(), v2.getFirst());
         * if(cmp != 0) { return cmp; }
         */

        return -1 * v1.compareTo(v2);
        // possible error: it compares the Text first and then compares the IntWritable
    }
}

public static class GroupComparator extends WritableComparator
{
    protected GroupComparator()
    {
        super(Pair.class, true);
    }

    @Override
    public int compare(WritableComparable w1, WritableComparable w2)
    {
        Pair v1 = (Pair) w1;
        Pair v2 = (Pair) w2;

        return v1.getFirst().compareTo(v2.getFirst());
        // this compareTo is defined in BinaryComparable
    }
}

public static class SortingReducer extends Reducer<Pair, NullWritable, Pair, NullWritable>
{
    @Override
    public void reduce(Pair p, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException
    {
        System.out.println("sortingReducer");
        context.write(p, NullWritable.get());
    }
}

public static void main(String[] args) throws Exception
{
    Configuration conf2 = new Configuration();
    //String[] otherArgs2 = new GenericOptionsParser(conf1, args).getRemainingArgs();

    ControlledJob cJob2 = new ControlledJob(conf2);
    //conf2.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", " ");
    cJob2.setJobName("Sorting");

    Job job2 = cJob2.getJob();

    job2.setJarByClass(Sorting.class);

    job2.setInputFormatClass(KeyValueTextInputFormat.class);

    job2.setMapperClass(SortingMapper.class);
    job2.setPartitionerClass(FirstPartitioner.class);
    job2.setSortComparatorClass(KeyComparator.class);
    job2.setGroupingComparatorClass(GroupComparator.class);
    job2.setReducerClass(SortingReducer.class);

    job2.setOutputKeyClass(Pair.class);
    job2.setOutputValueClass(NullWritable.class);

    job2.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.addInputPath(job2, new Path("hdfs:///tmp/inter/part-r-00000.txt"));
    FileOutputFormat.setOutputPath(job2, new Path(args[0]));

    job2.waitForCompletion(true);
}
then I get the errors below:
Error: java.lang.NumberFormatException: For input string: ""Come"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:481)
at java.lang.Integer.parseInt(Integer.java:527)
at Sorting$SortingMapper.map(Sorting.java:98)
at Sorting$SortingMapper.map(Sorting.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
I guess there is something wrong with String[] words, but I don't know how to fix it. I would appreciate any help in resolving the error.
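For reference, the exception lines up with how KeyValueTextInputFormat (set in main() above) feeds the mapper: each line is already split at the first tab before map() is called, so key holds the count and value holds only the word. Splitting value on "\t" again therefore leaves the word at index 0, and Integer.parseInt fails on it. A minimal plain-Java sketch of that situation (the class name KvSplitDemo is made up for illustration):

public class KvSplitDemo
{
    public static void main(String[] args)
    {
        // For the first sample line "1<TAB>\"Come", KeyValueTextInputFormat delivers:
        String key = "1";        // everything before the first tab
        String value = "\"Come"; // everything after the first tab

        // The original map() then splits value on tab again:
        String[] words = value.split("\t");
        System.out.println(words.length); // 1 -- there is no number at index 0, only the word

        // This is the line that throws:
        Integer.parseInt(words[0]); // NumberFormatException: For input string: ""Come"
    }
}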
In addition, I noticed that I had used
 job2.setInputFormatClass(KeyValueTextInputFormat.class); 


in the main function, which separates the key and the value at the tab delimiter, so I simply changed
String line = value.toString();
String[] words = line.split("\t");

freq = new IntWritable(Integer.parseInt(words[0]));
word.set(words[1]);
into
String num = key.toString();
freq = new IntWritable(Integer.parseInt(num));
word = value;
context.write(new Pair(word, freq), NullWritable.get());
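
Put together, the revised SortingMapper would look roughly like this; it is only a sketch and assumes the same Pair(Text, IntWritable) constructor used above:

public static class SortingMapper extends Mapper<Text, Text, Pair, NullWritable>
{
    private Text word = new Text();
    private IntWritable freq = new IntWritable();

    @Override
    public void map(Text key, Text value, Context context) throws IOException, InterruptedException
    {
        // KeyValueTextInputFormat has already split the line at the first tab:
        // key holds the count, value holds the word.
        freq = new IntWritable(Integer.parseInt(key.toString()));
        word.set(value); // copy the bytes instead of aliasing Hadoop's reused value object
        context.write(new Pair(word, freq), NullWritable.get());
    }
}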

It ran successfully, but the output was strange:
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
Sorting$Pair@5b5b072f
....
My expected output is:
5   "Project 
2 "Plain
1 "Come
1 "Defects,"
1 "I
1 "Information
1 "J"
1 "Right
1 "Viator"
Did the change make things worse?

Best Answer

You just need to override toString on your Pair object and return whatever you want the final output of each record to be.

Something like this:

class Pair {

    ...

    @Override
    public String toString() {
        return freq + " " + word;
    }
}
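
Since the rest of Pair is not shown in the question, here is one possible minimal version of it as a WritableComparable. Only the (Text, IntWritable) constructor, getFirst() and the word-then-frequency compareTo are implied by the code above; the field names and serialization details are assumptions:

class Pair implements WritableComparable<Pair> {

    private Text word = new Text();
    private IntWritable freq = new IntWritable();

    public Pair() {} // no-arg constructor needed by Hadoop for deserialization

    public Pair(Text word, IntWritable freq) {
        this.word.set(word);
        this.freq.set(freq.get());
    }

    public Text getFirst() {
        return word;
    }

    public IntWritable getSecond() {
        return freq;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        word.write(out);
        freq.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        word.readFields(in);
        freq.readFields(in);
    }

    @Override
    public int compareTo(Pair other) {
        int cmp = word.compareTo(other.word);                // word first, then frequency,
        return cmp != 0 ? cmp : freq.compareTo(other.freq);  // as the KeyComparator comment suggests
    }

    @Override
    public String toString() {
        return freq + " " + word; // this is what TextOutputFormat writes for each key
    }
}

The no-arg constructor and readFields are what let Hadoop rebuild the keys on the reduce side; toString only affects what TextOutputFormat prints, which is why the original job showed Sorting$Pair@5b5b072f instead of the frequency and word.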

Regarding hadoop - Hadoop > Mapper class input error, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/24373447/
