
hadoop - Correct use of SequenceFileInputFormat, key type mismatch in map


I am trying to run the recommender example from Chapter 6 (listings 6.1 to 6.4) of the book Mahout in Action. There are two mapper/reducer pairs. Here is the code:

Mapper - 1

public class WikipediaToItemPrefsMapper extends
        Mapper<LongWritable,Text,VarLongWritable,VarLongWritable> {

    private static final Pattern NUMBERS = Pattern.compile("(\\d+)");

    @Override
    public void map(LongWritable key,
                    Text value,
                    Context context)
            throws IOException, InterruptedException {

        String line = value.toString();
        Matcher m = NUMBERS.matcher(line);
        m.find();
        VarLongWritable userID = new VarLongWritable(Long.parseLong(m.group()));
        VarLongWritable itemID = new VarLongWritable();
        while (m.find()) {
            itemID.set(Long.parseLong(m.group()));
            context.write(userID, itemID);
        }
    }
}

Reducer - 1

public class WikipediaToUserVectorReducer extends
        Reducer<VarLongWritable,VarLongWritable,VarLongWritable,VectorWritable> {

    @Override
    public void reduce(VarLongWritable userID,
                       Iterable<VarLongWritable> itemPrefs,
                       Context context)
            throws IOException, InterruptedException {

        Vector userVector = new RandomAccessSparseVector(
                Integer.MAX_VALUE, 100);
        for (VarLongWritable itemPref : itemPrefs) {
            userVector.set((int) itemPref.get(), 1.0f);
        }

        //LongWritable userID_lw = new LongWritable(userID.get());
        context.write(userID, new VectorWritable(userVector));
        //context.write(userID_lw, new VectorWritable(userVector));
    }
}

The reducer outputs a userID and a userVector, which looks like this: 98955 {590:1.0 22:1.0 9059:1.0 3:1.0 2:1.0 1:1.0}, provided that FileInputFormat and TextInputFormat are used in the driver.

I want to process this data further with another mapper/reducer pair:

Mapper - 2

public class UserVectorToCooccurenceMapper extends
        Mapper<VarLongWritable,VectorWritable,IntWritable,IntWritable> {

    @Override
    public void map(VarLongWritable userID,
                    VectorWritable userVector,
                    Context context)
            throws IOException, InterruptedException {

        Iterator<Vector.Element> it = userVector.get().iterateNonZero();
        while (it.hasNext()) {
            int index1 = it.next().index();
            Iterator<Vector.Element> it2 = userVector.get().iterateNonZero();
            while (it2.hasNext()) {
                int index2 = it2.next().index();
                context.write(new IntWritable(index1),
                        new IntWritable(index2));
            }
        }
    }
}

Reducer - 2

public class UserVectorToCooccurenceReducer extends
        Reducer<IntWritable,IntWritable,IntWritable,VectorWritable> {

    @Override
    public void reduce(IntWritable itemIndex1,
                       Iterable<IntWritable> itemIndex2s,
                       Context context)
            throws IOException, InterruptedException {

        Vector cooccurrenceRow = new RandomAccessSparseVector(Integer.MAX_VALUE, 100);
        for (IntWritable intWritable : itemIndex2s) {
            int itemIndex2 = intWritable.get();
            cooccurrenceRow.set(itemIndex2, cooccurrenceRow.get(itemIndex2) + 1.0);
        }
        context.write(itemIndex1, new VectorWritable(cooccurrenceRow));
    }
}
Here is the driver I am using:

public final class RecommenderJob extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {

        Job job_preferenceValues = new Job(getConf());
        job_preferenceValues.setJarByClass(RecommenderJob.class);
        job_preferenceValues.setJobName("job_preferenceValues");

        job_preferenceValues.setInputFormatClass(TextInputFormat.class);
        job_preferenceValues.setOutputFormatClass(SequenceFileOutputFormat.class);

        FileInputFormat.setInputPaths(job_preferenceValues, new Path(args[0]));
        SequenceFileOutputFormat.setOutputPath(job_preferenceValues, new Path(args[1]));

        job_preferenceValues.setMapOutputKeyClass(VarLongWritable.class);
        job_preferenceValues.setMapOutputValueClass(VarLongWritable.class);

        job_preferenceValues.setOutputKeyClass(VarLongWritable.class);
        job_preferenceValues.setOutputValueClass(VectorWritable.class);

        job_preferenceValues.setMapperClass(WikipediaToItemPrefsMapper.class);
        job_preferenceValues.setReducerClass(WikipediaToUserVectorReducer.class);

        job_preferenceValues.waitForCompletion(true);

        Job job_cooccurence = new Job(getConf());
        job_cooccurence.setJarByClass(RecommenderJob.class);
        job_cooccurence.setJobName("job_cooccurence");

        job_cooccurence.setInputFormatClass(SequenceFileInputFormat.class);
        job_cooccurence.setOutputFormatClass(TextOutputFormat.class);

        SequenceFileInputFormat.setInputPaths(job_cooccurence, new Path(args[1]));
        FileOutputFormat.setOutputPath(job_cooccurence, new Path(args[2]));

        job_cooccurence.setMapOutputKeyClass(VarLongWritable.class);
        job_cooccurence.setMapOutputValueClass(VectorWritable.class);

        job_cooccurence.setOutputKeyClass(IntWritable.class);
        job_cooccurence.setOutputValueClass(VectorWritable.class);

        job_cooccurence.setMapperClass(UserVectorToCooccurenceMapper.class);
        job_cooccurence.setReducerClass(UserVectorToCooccurenceReducer.class);

        job_cooccurence.waitForCompletion(true);

        return 0;
    }

    public static void main(String[] args) throws Exception {
        ToolRunner.run(new Configuration(), new RecommenderJob(), args);
    }
}

The error I get is:

java.io.IOException: Type mismatch in key from map: expected org.apache.mahout.math.VarLongWritable, received org.apache.hadoop.io.IntWritable

While searching for a fix I found that my problem is similar to this question. The difference is that I am already using SequenceFileInputFormat and SequenceFileOutputFormat, correctly I believe. I also see that org.apache.mahout.cf.taste.hadoop.item.RecommenderJob does more or less the same thing. As I understand it, and per the Yahoo Tutorial:

SequenceFileOutputFormat rapidly serializes arbitrary data types to the file; the corresponding SequenceFileInputFormat will deserialize the file into the same types and presents the data to the next Mapper in the same manner as it was emitted by the previous Reducer.
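If that is accurate, a way to sanity-check my setup would be to read the SequenceFile header of the first job's output and confirm which key/value classes it actually records (a minimal sketch; the part-file name is an assumption and would need to match the real output of job_preferenceValues):

// Minimal sketch: inspect the key/value classes stored in a SequenceFile header.
// The part-file name below is an assumed example.
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path partFile = new Path(args[1] + "/part-r-00000");

SequenceFile.Reader reader = new SequenceFile.Reader(fs, partFile, conf);
try {
    // These should report VarLongWritable / VectorWritable, i.e. exactly the
    // types the second mapper declares as its input generics.
    System.out.println("key class:   " + reader.getKeyClassName());
    System.out.println("value class: " + reader.getValueClassName());
} finally {
    reader.close();
}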

What am I doing wrong? I would really appreciate some pointers. I have spent a whole day trying to fix this and got nowhere :(

Best Answer

Your second mapper has the following signature:

public class UserVectorToCooccurenceMapper extends 
Mapper<VarLongWritable,VectorWritable,IntWritable,IntWritable>

But in your driver code you have configured the following:

job_cooccurence.setMapOutputKeyClass(VarLongWritable.class);
job_cooccurence.setMapOutputValueClass(VectorWritable.class);

The reducer expects <IntWritable, IntWritable> as its input, so you just need to amend your driver code to:

job_cooccurence.setMapOutputKeyClass(IntWritable.class);
job_cooccurence.setMapOutputValueClass(IntWritable.class);
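For completeness, the second job's configuration would then look roughly like this (a sketch of just the relevant lines, keeping the original variable names). Note that setMapOutputKeyClass/setMapOutputValueClass describe what the mapper emits; the mapper's input types are taken from the SequenceFile written by the first job:

// Sketch of the corrected configuration for the second job (relevant lines only).
job_cooccurence.setInputFormatClass(SequenceFileInputFormat.class);
job_cooccurence.setOutputFormatClass(TextOutputFormat.class);

// What the second mapper emits (its input types come from the SequenceFile itself):
job_cooccurence.setMapOutputKeyClass(IntWritable.class);
job_cooccurence.setMapOutputValueClass(IntWritable.class);

// What the second reducer finally writes:
job_cooccurence.setOutputKeyClass(IntWritable.class);
job_cooccurence.setOutputValueClass(VectorWritable.class);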

Regarding "hadoop - Correct use of SequenceFileInputFormat, key type mismatch in map", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/11659470/
