
java - Hadoop ArrayWritable gives me a ClassCastException

Reposted. Author: 可可西里. Updated: 2023-11-01 16:18:56

Edit: problem solved. I made a very silly mistake.

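I have a MapReduce pipeline consisting of a map, a reduce, a second map, and a second reduce. The first reduce uses SequenceFileOutputFormat and the second map uses SequenceFileInputFormat. I have looked at how these are meant to be used, and I seem to be using them correctly. The types I put into the sequence file are IntWritable and IntPairArrayWritable (a custom ArrayWritable subclass that holds IntPairWritable objects from Mahout). The problem is that when the second map reads an IntPairArrayWritable and I try to pull out the individual IntPairWritable objects, I get a ClassCastException. I can't tell whether I'm misusing the ArrayWritable class or SequenceFile{Input,Output}Format. I have looked at many examples here and elsewhere, and as far as I can tell I'm doing everything right, but I still get the error. Any help?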

Specifics:

Here is my first reducer class:

    public static class WalkIdReducer extends MapReduceBase implements
            Reducer<IntWritable, IntPairWritable, IntWritable, IntPairArrayWritable> {

        @Override
        public void reduce(IntWritable walk_id, Iterator<IntPairWritable> values,
                OutputCollector<IntWritable, IntPairArrayWritable> output,
                Reporter reporter) throws IOException {
            ArrayList<IntPairWritable> value_array = new ArrayList<IntPairWritable>();
            while (values.hasNext()) {
                value_array.add(values.next());
            }
            output.collect(walk_id, IntPairArrayWritable.fromArrayList(value_array));
        }
    }

The second mapper class:

    public static class NodePairMapper extends MapReduceBase implements
            Mapper<IntWritable, IntPairArrayWritable, IntPairWritable, Text> {

        @Override
        public void map(IntWritable key, IntPairArrayWritable value,
                OutputCollector<IntPairWritable, Text> output,
                Reporter reporter) throws IOException {
            // The following line gives a ClassCastException;
            // See IntPairArrayWritable.toArrayList(), below
            ArrayList<IntPairWritable> values = value.toArrayList();
            // other unimportant stuff
        }
    }

The relevant part of the first MapReduce job configuration:

    conf.setReducerClass(WalkIdReducer.class);
    conf.setOutputKeyClass(IntWritable.class);
    conf.setOutputValueClass(IntPairArrayWritable.class);
    conf.setOutputFormat(SequenceFileOutputFormat.class);

And for the second MapReduce job:

    conf.setInputFormat(SequenceFileInputFormat.class);
    conf.setMapperClass(NodePairMapper.class);

Finally, my ArrayWritable subclass:

    public static class IntPairArrayWritable extends ArrayWritable
    {
        // These two methods are what people say is all you need for
        // creating an ArrayWritable subclass
        public IntPairArrayWritable() {
            super(IntPairArrayWritable.class);
        }

        public IntPairArrayWritable(IntPairWritable[] values) {
            super(IntPairArrayWritable.class, values);
        }

        // Some convenience methods, so I can use ArrayLists in
        // other parts of the code
        public static IntPairArrayWritable fromArrayList(
                ArrayList<IntPairWritable> array) {
            IntPairArrayWritable writable = new IntPairArrayWritable();
            IntPairWritable[] values = new IntPairWritable[array.size()];
            for (int i=0; i<array.size(); i++) {
                values[i] = array.get(i);
            }
            writable.set(values);
            return writable;
        }

        public ArrayList<IntPairWritable> toArrayList() {
            ArrayList<IntPairWritable> array = new ArrayList<IntPairWritable>();
            for (Writable pair : this.get()) {
                // This line is what kills it. I get a ClassCastException here.
                IntPairWritable int_pair = (IntPairWritable) pair;
                array.add(int_pair);
            }
            return array;
        }
    }

The exact error I get is:

    java.lang.ClassCastException: WalkAnalyzer$IntPairArrayWritable cannot be cast to org.apache.mahout.common.IntPairWritable
        at WalkAnalyzer$IntPairArrayWritable.toArrayList(WalkAnalyzer.java:231)
        at WalkAnalyzer$NodePairMapper.map(WalkAnalyzer.java:84)
        at WalkAnalyzer$NodePairMapper.map(WalkAnalyzer.java:77)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

I'm confused about why ArrayWritable's get() method returns instances of WalkAnalyzer$IntPairArrayWritable. I expected get() to return an array of the elements contained in the IntPairArrayWritable, as the API describes.

Edit

I found the problem. It was the way I wrote the constructors for IntPairArrayWritable: I called super(IntPairArrayWritable.class); when I should have called super(IntPairWritable.class);. The code should actually look like this:

    public static class IntPairArrayWritable extends ArrayWritable
    {
        // These two methods are what people say is all you need for
        // creating an ArrayWritable subclass
        public IntPairArrayWritable() {
            super(IntPairWritable.class);
        }

        public IntPairArrayWritable(IntPairWritable[] values) {
            super(IntPairWritable.class, values);
        }
    }
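
To illustrate why the class passed to super(...) matters, here is a minimal sketch that runs without Hadoop. MiniWritable, MiniPair, MiniArrayWritable, GoodPairArray, BadPairArray, and ElementClassDemo are hypothetical stand-ins invented for this example, not Hadoop or Mahout classes. They only mimic the relevant behavior as I understand it: when an array writable is read back, each element is instantiated from the class given to the constructor, so that class (and not the serialized bytes) decides what get() returns.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Simplified stand-in for a Writable-style interface (assumption, not Hadoop's).
interface MiniWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Stand-in for an int pair element.
class MiniPair implements MiniWritable {
    int first, second;
    public MiniPair() {}
    public MiniPair(int first, int second) { this.first = first; this.second = second; }
    public void write(DataOutput out) throws IOException { out.writeInt(first); out.writeInt(second); }
    public void readFields(DataInput in) throws IOException { first = in.readInt(); second = in.readInt(); }
}

// Stand-in for an array container that remembers its element class.
class MiniArrayWritable implements MiniWritable {
    private final Class<? extends MiniWritable> valueClass;
    private MiniWritable[] values = new MiniWritable[0];

    public MiniArrayWritable(Class<? extends MiniWritable> valueClass) { this.valueClass = valueClass; }

    public void set(MiniWritable[] values) { this.values = values; }
    public MiniWritable[] get() { return values; }

    // Creates one empty element of the declared element class.
    public MiniWritable newElement() {
        try {
            return valueClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public void write(DataOutput out) throws IOException {
        out.writeInt(values.length);
        for (MiniWritable v : values) v.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        values = new MiniWritable[in.readInt()];
        for (int i = 0; i < values.length; i++) {
            values[i] = newElement();   // element type comes from valueClass
            values[i].readFields(in);
        }
    }
}

// Correct: the element class is the pair type.
class GoodPairArray extends MiniArrayWritable {
    public GoodPairArray() { super(MiniPair.class); }
}

// The mistake from the question: the container names itself as the element class.
class BadPairArray extends MiniArrayWritable {
    public BadPairArray() { super(BadPairArray.class); }
}

class ElementClassDemo {
    // Serializes writer, deserializes into reader, and returns reader's first element.
    static MiniWritable roundTripFirst(MiniArrayWritable writer, MiniArrayWritable reader) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            writer.write(new DataOutputStream(buf));
            reader.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return reader.get()[0];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        GoodPairArray written = new GoodPairArray();
        written.set(new MiniWritable[] { new MiniPair(3, 4) });
        MiniPair back = (MiniPair) roundTripFirst(written, new GoodPairArray());
        System.out.println(back.first + "," + back.second);   // prints 3,4
        // The mis-declared container would create containers, not pairs, on read:
        System.out.println(new BadPairArray().newElement().getClass().getSimpleName());   // prints BadPairArray
    }
}
```

In this toy model, casting an element created by BadPairArray to MiniPair fails in the same way as the cast in toArrayList() above.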

I suppose giving the ArrayWritable subclass a less confusing name would have been a good idea; it would have made the mistake easier to spot.
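
Besides a clearer name, another option is to let the compiler catch this kind of mix-up. The sketch below is a hypothetical illustration, not part of Hadoop or Mahout: TypedElements and IntegerElements are invented names, and Integer stands in for the pair type so the example runs on its own. The idea is that a generic parameter ties the Class token to the element type, so passing the container's own class no longer compiles. With Hadoop, I assume the same idea could be applied through a generic ArrayWritable subclass that forwards a Class<T> to the superclass constructor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: binds the element Class token to the type parameter T.
class TypedElements<T> {
    private final Class<T> elementClass;

    TypedElements(Class<T> elementClass) {
        this.elementClass = elementClass;
    }

    // Converts an untyped array into a typed list. Class.cast throws a
    // ClassCastException on the first element that is not a T.
    List<T> toList(Object[] raw) {
        List<T> out = new ArrayList<>(raw.length);
        for (Object o : raw) {
            out.add(elementClass.cast(o));
        }
        return out;
    }
}

// Hypothetical subclass, analogous to an array-of-pairs container.
class IntegerElements extends TypedElements<Integer> {
    IntegerElements() {
        super(Integer.class);
        // super(IntegerElements.class) would be rejected by the compiler,
        // because Class<IntegerElements> is not a Class<Integer>.
    }
}
```

The trade-off is one extra layer of generics in exchange for moving the error from job run time to compile time.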

Best answer

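Check the import statement for IntPairWritable. It looks like you picked up the wrong package in the Mapper, so you are casting to a different class that also happens to be named IntPairWritable.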

This Q&A about java - Hadoop ArrayWritable gives me a ClassCastException comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/12979387/
