java - Hadoop: secondary sort not working

Reposted. Author: 行者123. Updated: 2023-12-01 13:14:43

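I implemented an algorithm in Hadoop 1.2.1 whose reducer code depends on secondary sort. When I run it, however, one reducer receives its tuples sorted and the other does not. I have spent a lot of time trying to find the cause, without success.

Does anyone know what the problem might be? I think it is related to the secondary sort code.

Here is the code that implements the secondary sort:

Composite key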

    public class CompositeKey implements WritableComparable<CompositeKey> {
        public String key;
        public Integer position;

        @Override
        public void readFields(DataInput arg0) throws IOException {
            key = WritableUtils.readString(arg0);
            position = arg0.readInt();
        }

        @Override
        public void write(DataOutput arg0) throws IOException {
            WritableUtils.writeString(arg0, key);
            arg0.writeLong(position);
        }

        @Override
        public int compareTo(CompositeKey o) {
            int result = key.compareTo(o.key);
            if (0 == result) {
                result = position.compareTo(o.position);
            }
            return result;
        }
    }

Key comparator

    public class CompositeKeyComparator extends WritableComparator {
        protected CompositeKeyComparator() {
            super(CompositeKey.class, true);
        }

        @SuppressWarnings("rawtypes")
        @Override
        public int compare(WritableComparable w1, WritableComparable w2) {
            CompositeKey k1 = (CompositeKey) w1;
            CompositeKey k2 = (CompositeKey) w2;

            int result = k1.key.compareTo(k2.key);
            if (0 == result) {
                result = -1 * k1.position.compareTo(k2.position);
            }
            return result;
        }
    }

Grouping comparator

    public class NaturalKeyGroupingComparator extends WritableComparator {
        protected NaturalKeyGroupingComparator() {
            super(CompositeKey.class, true);
        }

        @SuppressWarnings("rawtypes")
        @Override
        public int compare(WritableComparable w1, WritableComparable w2) {
            CompositeKey k1 = (CompositeKey) w1;
            CompositeKey k2 = (CompositeKey) w2;

            return k1.key.compareTo(k2.key);
        }
    }

Partitioner

    public class NaturalKeyPartitioner extends Partitioner<CompositeKey, ReduceValue> {
        @Override
        public int getPartition(CompositeKey key, ReduceValue val, int numPartitions) {
            int hash = key.key.hashCode();
            // Parenthesized because % binds tighter than &: clear the sign bit first, then take the modulus.
            int partition = (hash & Integer.MAX_VALUE) % numPartitions;
            return partition;
        }
    }

Job configuration

    // secondary sort
    job.setPartitionerClass(NaturalKeyPartitioner.class);
    job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class);
    job.setSortComparatorClass(CompositeKeyComparator.class);

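Whether I run this in a pseudo-distributed environment or on a cluster, one reducer receives its tuples sorted and the other does not. For example, here is an excerpt of the tuples received by the two reducers (the first column is the primary key, the second column is the secondary key):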

    First reducer:
    a1 0
    a1 1
    a1 11
    a1 16
    a1 27
    a1 28
    a1 34
    a1 35
    a1 37
    a1 38
    a1 43
    a1 44
    a1 46
    a1 48
    a1 50
    a1 54
    a1 55
    a1 56
    a1 57
    a1 60
    a1 61
    a1 63
    a1 64
    a1 66
    a1 69
    a1 70
    a1 72
    a1 75
    a1 76
    a1 78
    a1 79
    a1 80
    a1 84
    a1 85
    a1 86
    a1 87
    a1 88
    a1 91
    a1 92
    a1 97
    a1 102
    a1 106
    a1 108
    a1 109
    a1 110
    a1 111
    a1 116
    a1 118
    a1 119
    a1 120

    Second reducer:
    a2 87
    a2 115
    a2 65
    a2 90
    a2 68
    a2 119
    a2 91
    a2 0
    a2 70
    a2 3
    a2 8
    a2 9
    a2 10
    a2 71
    a2 110
    a2 16
    a2 17
    a2 20
    a2 21
    a2 23
    a2 26
    a2 72
    a2 27
    a2 94
    a2 29
    a2 30
    a2 31
    a2 75
    a2 95
    a2 36
    a2 76
    a2 117
    a2 39
    a2 40
    a2 41
    a2 42
    a2 97
    a2 79
    a2 44
    a2 45
    a2 98
    a2 46
    a2 80
    a2 49
    a2 82
    a2 50
    a2 83
    a2 100
    a2 84
    a2 112
    a2 57
    a2 59
    a2 113
    a2 60
    a2 114
    a2 61

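Best answer

I think this is because the serialization/deserialization logic in CompositeKey is asymmetric: `write` emits position as a long (`writeLong`), but `readFields` reads it back as an int (`readInt`). That breaks the comparison logic, because the value the comparators see after deserialization is not the value that was written.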
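
To make the mismatch concrete, here is a minimal stand-alone sketch. It is not the asker's job: it uses plain `java.io` streams in place of Hadoop's `DataOutput`/`DataInput`, and `writeUTF`/`readUTF` stand in for `WritableUtils.writeString`/`readString`. The class and method names (`SerializationMismatchDemo`, `writeBuggy`, `writeFixed`, `readPosition`) are hypothetical. `writeLong` emits 8 big-endian bytes, while `readInt` consumes only the first 4, so for a small non-negative position the reader gets the high half of the long, which is 0.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative only: java.io streams stand in for Hadoop's serialization,
// and writeUTF/readUTF stand in for WritableUtils.writeString/readString.
class SerializationMismatchDemo {

    // Mirrors the question's write(): position goes out as a long (8 bytes).
    static byte[] writeBuggy(String key, int position) {
        return serialize(key, position, true);
    }

    // Symmetric variant: position goes out as an int (4 bytes).
    static byte[] writeFixed(String key, int position) {
        return serialize(key, position, false);
    }

    private static byte[] serialize(String key, int position, boolean asLong) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(key);
            if (asLong) {
                out.writeLong(position);
            } else {
                out.writeInt(position);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Mirrors the question's readFields(): reads the key, then position as an int.
    static int readPosition(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            in.readUTF();         // skip the natural key
            return in.readInt();  // consumes only 4 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Written as long, read as int: the high 4 bytes of 87L are all zero.
        System.out.println(readPosition(writeBuggy("a2", 87)));
        // Written as int, read as int: the value round-trips.
        System.out.println(readPosition(writeFixed("a2", 87)));
    }
}
```

Both comparators in the question are constructed with `super(CompositeKey.class, true)`, so they deserialize keys through `readFields` before comparing them. If every position reads back as 0, the sort comparator sees ties on the secondary key, which would be consistent with the unsorted output, although this has not been verified against the asker's job. The likely fix is to make the pair symmetric, for example by changing `arg0.writeLong(position)` to `arg0.writeInt(position)` in `CompositeKey.write`.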

Regarding java - Hadoop: secondary sort not working, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/22563204/
