
hadoop - Implementing a Hadoop Map the Spark way with JavaPairRDD


I have an RDD:

JavaPairRDD<Long, ViewRecord> myRDD

It was created with the newAPIHadoopRDD method. I have an existing map function that I would like to implement the Spark way:

LongWritable one = new LongWritable(1L);

protected void map(Long key, ViewRecord viewRecord, Context context)
        throws IOException, InterruptedException {

    String url = viewRecord.getUrl();
    long day = viewRecord.getDay();

    tuple.getKey().set(url);
    tuple.getValue().set(day);

    context.write(tuple, one);
}

PS: tuple is derived from:

KeyValueWritable<Text, LongWritable>

and can be found here: TextLong.java
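
(For context: an RDD like myRDD is typically obtained with JavaSparkContext.newAPIHadoopRDD. The snippet below is only a minimal sketch of that setup; ViewRecordInputFormat is a hypothetical InputFormat<Long, ViewRecord> standing in for whatever input format the question actually uses.)

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("view-records"));
Configuration hadoopConf = new Configuration();

// ViewRecordInputFormat is hypothetical: any InputFormat that yields (Long, ViewRecord) pairs
JavaPairRDD<Long, ViewRecord> myRDD = sc.newAPIHadoopRDD(
        hadoopConf,
        ViewRecordInputFormat.class,
        Long.class,
        ViewRecord.class);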

Best Answer

I don't know what tuple is, but if you just want to map each record to a tuple with key (url, day) and value 1L, you can do it like this:

// Java 8 style
JavaPairRDD<Tuple2<String, Long>, Long> result = myRDD
    .values()
    .mapToPair(viewRecord -> {
        String url = viewRecord.getUrl();
        long day = viewRecord.getDay();
        return new Tuple2<>(new Tuple2<>(url, day), 1L);
    });


// Java 7 style
JavaPairRDD<Pair, Long> result = myRDD
    .values()
    .mapToPair(new PairFunction<ViewRecord, Pair, Long>() {
        @Override
        public Tuple2<Pair, Long> call(ViewRecord record) throws Exception {
            String url = record.getUrl();
            Long day = record.getDay();

            return new Tuple2<>(new Pair(url, day), 1L);
        }
    });
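
As a follow-up (not part of the original answer): if the Hadoop job's reducer summed the 1L values per (url, day) key, the Spark-side equivalent would be a reduceByKey on result, roughly:

// Aggregate the per-record counts, mirroring a counting reducer in MapReduce.
// Assumes `result` is the JavaPairRDD<Tuple2<String, Long>, Long> built above.
JavaPairRDD<Tuple2<String, Long>, Long> counts =
        result.reduceByKey((a, b) -> a + b);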

Regarding hadoop - Implementing a Hadoop Map the Spark way with JavaPairRDD, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31102579/
