scala - Spark : Sort records in groups?

Reposted. Author: 行者123. Updated: 2023-12-04 03:07:14

I have a set of records that I need to:

1) Group by "day", "city" and "kind"

2) Sort each group by "prize"

In my code:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Sort {

case class Record(name:String, day: String, kind: String, city: String, prize:Int)

val recs = Array (
Record("n1", "d1", "k1", "c1", 10),
Record("n1", "d1", "k1", "c1", 9),
Record("n1", "d1", "k1", "c1", 8),
Record("n2", "d2", "k2", "c2", 1),
Record("n2", "d2", "k2", "c2", 2),
Record("n2", "d2", "k2", "c2", 3)
)

def main(args: Array[String]): Unit = {
val conf = new SparkConf()
.setAppName("Test")
.set("spark.executor.memory", "2g")
val sc = new SparkContext(conf)
val rs = sc.parallelize(recs)
val rsGrp = rs.groupBy(r => (r.day, r.kind, r.city)).map(_._2)
val x = rsGrp.map{r =>
val lst = r.toList
lst.map{e => (e.prize, e)}
}
x.sortByKey()
}

}

When I try to sort the groups, I get an error:
value sortByKey is not a member of org.apache.spark.rdd.RDD[List[(Int, 
Sort.Record)]]

What is wrong? How do I sort?

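Best answer

You need to define a key first, then sort the values of each key with mapValues. sortByKey is only available on RDDs of key-value pairs, and x in the question is an RDD[List[(Int, Record)]], not an RDD of pairs, hence the error.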

import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext._

object Sort {

case class Record(name:String, day: String, kind: String, city: String, prize:Int)

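// Define your data (the same records as in the question)
val recs = Array(
Record("n1", "d1", "k1", "c1", 10),
Record("n1", "d1", "k1", "c1", 9),
Record("n1", "d1", "k1", "c1", 8),
Record("n2", "d2", "k2", "c2", 1),
Record("n2", "d2", "k2", "c2", 2),
Record("n2", "d2", "k2", "c2", 3)
)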

def main(args: Array[String]): Unit = {
val conf = new SparkConf()
.setAppName("Test")
.setMaster("local")
.set("spark.executor.memory", "2g")
val sc = new SparkContext(conf)
val rs = sc.parallelize(recs)

// Generate the pair RDD necessary to call groupByKey, then group it
val key: RDD[((String, String, String), Iterable[Record])] = rs.keyBy(r => (r.day, r.city, r.kind)).groupByKey

// Once grouped you need to sort values of each Key
val values: RDD[((String, String, String), List[Record])] = key.mapValues(iter => iter.toList.sortBy(_.prize))

// Print result
values.collect.foreach(println)
}
}
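
The group-then-sort logic of the answer can be checked locally with plain Scala collections, without a Spark cluster. The following is only a minimal sketch, not the answer's code: `GroupSortSketch` and `sortedGroups` are hypothetical names introduced for illustration, and `groupBy` on a local `Seq` stands in for `keyBy(...).groupByKey` on an RDD.

```scala
object GroupSortSketch {

  case class Record(name: String, day: String, kind: String, city: String, prize: Int)

  // Same sample data as in the question.
  val recs: Seq[Record] = Seq(
    Record("n1", "d1", "k1", "c1", 10),
    Record("n1", "d1", "k1", "c1", 9),
    Record("n1", "d1", "k1", "c1", 8),
    Record("n2", "d2", "k2", "c2", 1),
    Record("n2", "d2", "k2", "c2", 2),
    Record("n2", "d2", "k2", "c2", 3)
  )

  // Group by (day, city, kind), then sort each group by prize in ascending order.
  // Local stand-in (an assumption of this sketch) for the RDD pipeline
  // keyBy(...).groupByKey.mapValues(_.toList.sortBy(_.prize)).
  def sortedGroups(rs: Seq[Record]): Map[(String, String, String), List[Record]] =
    rs.groupBy(r => (r.day, r.city, r.kind))
      .map { case (key, group) => key -> group.toList.sortBy(_.prize) }

  def main(args: Array[String]): Unit =
    sortedGroups(recs).foreach(println)
}
```

For descending order, the sort key can be negated, for example `sortBy(r => -r.prize)`. Note that, as in the answer, each group is materialized as a list before sorting, so this approach assumes every group fits in memory.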

Regarding "scala - Spark : Sort records in groups?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/28543510/