
java - Sort a JavaPairRDD first by value, then by key

Reposted. Author: 搜寻专家. Last updated: 2023-10-30 21:11:59

I am trying to sort an RDD by value. When several entries have the same value, I need those entries sorted lexicographically by key.

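Code:

    // Uses org.apache.spark.api.java.JavaPairRDD,
    // org.apache.spark.api.java.function.PairFunction and scala.Tuple2.
    JavaPairRDD<String, Long> rddToSort = rddMovieReviewReducedByKey.mapToPair(
            new PairFunction<Tuple2<String, MovieReview>, String, Long>() {
                @Override
                public Tuple2<String, Long> call(Tuple2<String, MovieReview> t) throws Exception {
                    // Keep the movie key and reduce the review record to its count.
                    return new Tuple2<String, Long>(t._1, t._2.count);
                }
            });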

So far I have used takeOrdered with a custom Comparator. However, takeOrdered cannot cope with this much data: the job keeps dying when I run it, because it needs more memory than the OS can provide:

    List<Tuple2<String, Long>> rddSorted = rddMovieReviewReducedByKey.mapToPair(
            new PairFunction<Tuple2<String, MovieReview>, String, Long>() {
                @Override
                public Tuple2<String, Long> call(Tuple2<String, MovieReview> t) throws Exception {
                    return new Tuple2<String, Long>(t._1, t._2.count);
                }
            }).takeOrdered(newTopMovies, MapLongValueComparator.VALUE_COMP);

Comparator:

    // Uses java.util.Comparator and java.io.Serializable.
    static class MapLongValueComparator implements Comparator<Tuple2<String, Long>>, Serializable {
        private static final long serialVersionUID = 1L;

        private static final MapLongValueComparator VALUE_COMP = new MapLongValueComparator();

        @Override
        public int compare(Tuple2<String, Long> o1, Tuple2<String, Long> o2) {
            // Equal counts: fall back to ascending key order.
            if (o1._2.compareTo(o2._2) == 0) {
                return o1._1.compareTo(o2._1);
            }
            // Otherwise the larger count comes first (operands swapped instead of negating).
            return o2._2.compareTo(o1._2);
        }
    }
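To make the intended ordering concrete, here is a minimal sketch in plain Java, without Spark. It assumes `Map.Entry<String, Long>` as a stand-in for `Tuple2<String, Long>`; the class name `ValueThenKeyOrderDemo`, its helper methods, and the sample movie keys are all made up for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class ValueThenKeyOrderDemo {

    // Same rule as the comparator above: larger value first, ties by ascending key.
    static final Comparator<Map.Entry<String, Long>> VALUE_DESC_THEN_KEY = (a, b) -> {
        int byValue = b.getValue().compareTo(a.getValue());
        return byValue != 0 ? byValue : a.getKey().compareTo(b.getKey());
    };

    // Hypothetical helper that builds one (key, count) pair.
    static Map.Entry<String, Long> entry(String key, long count) {
        return new SimpleEntry<String, Long>(key, count);
    }

    // Sorts a copy of the input and returns only the keys, in sorted order.
    static List<String> sortedKeys(List<Map.Entry<String, Long>> entries) {
        List<Map.Entry<String, Long>> copy = new ArrayList<>(entries);
        copy.sort(VALUE_DESC_THEN_KEY);
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Long> e : copy) {
            keys.add(e.getKey());
        }
        return keys;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> counts = new ArrayList<>();
        counts.add(entry("movieB", 5L));
        counts.add(entry("movieC", 9L));
        counts.add(entry("movieA", 5L));
        System.out.println(sortedKeys(counts)); // prints [movieC, movieA, movieB]
    }
}
```

The entry with the highest count comes first, and the two entries that share a count of 5 appear in key order.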

Error:

16/06/30 21:09:23 INFO scheduler.DAGScheduler: Job 18 failed: takeOrdered at MovieAnalyzer.java:708, took 418.149182 s

How would you sort this RDD? How would you pick the top K movies by value, with entries that have equal values ordered lexicographically by key?

Thanks.

Best answer

Solved by using sortByKey with a comparator and a partition count, after mapping the <String, Long> PairRDD to a <Tuple2<String, Long>, Long> PairRDD:

    // Move (key, count) into the key position so that sortByKey can compare on both.
    JavaPairRDD<Tuple2<String, Long>, Long> sortedRdd = rddMovieReviewReducedByKey.mapToPair(
            new PairFunction<Tuple2<String, MovieReview>, Tuple2<String, Long>, Long>() {
                @Override
                public Tuple2<Tuple2<String, Long>, Long> call(Tuple2<String, MovieReview> t) throws Exception {
                    return new Tuple2<Tuple2<String, Long>, Long>(
                            new Tuple2<String, Long>(t._1, t._2.count), t._2.count);
                }
            }).sortByKey(new TupleMapLongComparator(), true, 100);

    // Unwrap the composite key back into plain (key, count) pairs.
    JavaPairRDD<String, Long> sortedRddToPairs = sortedRdd.mapToPair(
            new PairFunction<Tuple2<Tuple2<String, Long>, Long>, String, Long>() {
                @Override
                public Tuple2<String, Long> call(Tuple2<Tuple2<String, Long>, Long> t) throws Exception {
                    return new Tuple2<String, Long>(t._1._1, t._1._2);
                }
            });

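Comparator:

    // Declared static so that serializing the comparator does not also
    // pull in a reference to the enclosing instance.
    private static class TupleMapLongComparator implements Comparator<Tuple2<String, Long>>, Serializable {
        private static final long serialVersionUID = 1L;

        @Override
        public int compare(Tuple2<String, Long> tuple1, Tuple2<String, Long> tuple2) {
            // Equal counts: fall back to ascending key order.
            if (tuple1._2.compareTo(tuple2._2) == 0) {
                return tuple1._1.compareTo(tuple2._1);
            }
            // Otherwise the larger count comes first.
            return tuple2._2.compareTo(tuple1._2);
        }
    }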
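The idea behind this answer is that a key-only sort never looks at the value, so the value has to be moved into the key before sorting and projected back out afterwards. The following is a rough plain-Java analogy of those steps, not Spark code: a `TreeMap` plays the role of the key-sorted collection, and `TitleCount`, `CompositeKeySortSketch`, and the sample titles are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CompositeKeySortSketch {

    // Hypothetical stand-in for the Tuple2<String, Long> composite key.
    static final class TitleCount {
        final String title;
        final long count;

        TitleCount(String title, long count) {
            this.title = title;
            this.count = count;
        }
    }

    // Higher count first; equal counts fall back to ascending title.
    static final Comparator<TitleCount> COUNT_DESC_THEN_TITLE =
            Comparator.comparingLong((TitleCount tc) -> tc.count)
                      .reversed()
                      .thenComparing((TitleCount tc) -> tc.title);

    static List<String> sortByCompositeKey(Map<String, Long> counts) {
        // Step 1: move the count into the key, since only keys are compared.
        TreeMap<TitleCount, Long> byCompositeKey = new TreeMap<>(COUNT_DESC_THEN_TITLE);
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            byCompositeKey.put(new TitleCount(e.getKey(), e.getValue()), e.getValue());
        }
        // Step 2: iterate in comparator order and project back to plain titles.
        List<String> titles = new ArrayList<>();
        for (TitleCount key : byCompositeKey.keySet()) {
            titles.add(key.title);
        }
        return titles;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("Up", 3L);
        counts.put("Alien", 7L);
        counts.put("Brazil", 3L);
        counts.put("Zodiac", 7L);
        System.out.println(sortByCompositeKey(counts)); // prints [Alien, Zodiac, Brazil, Up]
    }
}
```

Unlike the RDD version, a `TreeMap` holds everything in one JVM and drops keys that compare as equal, so this only illustrates the ordering logic, not the distributed behavior.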

The original question, "java - Sort a JavaPairRDD first by value, then by key", is on Stack Overflow: https://stackoverflow.com/questions/38131604/
