
scala - Concatenating MapType values when performing a groupBy on a DataFrame

Reposted. Author: 行者123. Updated: 2023-12-01 03:27:47

I have this DataFrame with three columns: userId, date, generation

+-------+--------+----------------------------------------------------------------------------+
|userId |date    |generation                                                                  |
+-------+--------+----------------------------------------------------------------------------+
|1      |20160926|Map("screen_WiFi" -> 15.127, "upload_WiFi" -> 0.603, "total_WiFi" -> 19.551)|
|1      |20160926|Map("screen_2g" -> 0.573, "upload_2g" -> 0.466, "total_2g" -> 1.419)        |
|1      |20160926|Map("screen_3g" -> 10.084, "upload_3g" -> 80.515, "total_3g" -> 175.435)    |
+-------+--------+----------------------------------------------------------------------------+

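I want to group these rows by userId and date.
The difficulty is the third column, which holds MapType values: all the maps in a group must be combined into a single column, so the final output should look like this: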
+-------+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|userId |date |generation |
+-------+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|1 |20160926|Map("screen_WiFi" -> 15.127, "upload_WiFi" -> 0.603, "total_WiFi" -> 19.551,"screen_2g" -> 0.573, "upload_2g" -> 0.466, "total_2g" -> 1.419, "screen_3g" -> 10.084, "upload_3g" -> 80.515, "total_3g" -> 175.435)|
+-------+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Is there a way to do this, or any possible workaround?

Best answer

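You can create a simple user-defined aggregate function (UDAF) that combines maps, and then use it as the aggregate function. Since you did not specify how two values for the same key should be combined, I will assume keys are unique, i.e. for each userId and date pair, no key appears in two different records: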

import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types.{DataType, MapType, StructType}

/**
 * UDAF combining maps, overriding any duplicate key with the "latest" value
 * @param keyType DataType of the Map key
 * @param valueType DataType of the Map value
 * @tparam K key type
 * @tparam V value type
 */
class CombineMaps[K, V](keyType: DataType, valueType: DataType) extends UserDefinedAggregateFunction {
  override def inputSchema: StructType = new StructType().add("map", dataType)
  override def bufferSchema: StructType = inputSchema
  override def dataType: DataType = MapType(keyType, valueType)
  override def deterministic: Boolean = true

  // start every group with an empty map
  override def initialize(buffer: MutableAggregationBuffer): Unit = buffer.update(0, Map[K, V]())

  // naive implementation - assumes keys won't repeat; otherwise the later value for a key overrides the earlier one
  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    val before = buffer.getAs[Map[K, V]](0)
    val toAdd = input.getAs[Map[K, V]](0)
    val result = before ++ toAdd
    buffer.update(0, result)
  }

  // merging two partial buffers uses the same logic as adding one input row
  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = update(buffer1, buffer2)

  // read the buffer back with the class's own type parameters rather than a hard-coded Map[String, Int]
  override def evaluate(buffer: Row): Any = buffer.getAs[Map[K, V]](0)
}
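
The `++` call in `update` is what decides the fate of duplicate keys, so it may help to see it in isolation. Below is a minimal plain-Scala sketch (no Spark required) using two of the maps from the question. The object name `MergeSemantics` and the helper `mergeSumming` are illustrative names that do not appear in the original answer; `mergeSumming` only shows one possible alternative policy, in case keys can repeat and the values should be added rather than overridden.

```scala
object MergeSemantics {
  val wifi: Map[String, Double] =
    Map("screen_WiFi" -> 15.127, "upload_WiFi" -> 0.603, "total_WiFi" -> 19.551)
  val twoG: Map[String, Double] =
    Map("screen_2g" -> 0.573, "upload_2g" -> 0.466, "total_2g" -> 1.419)

  // Disjoint keys: ++ simply produces the union of both maps (six entries here).
  val combined: Map[String, Double] = wifi ++ twoG

  // Duplicate key: the right-hand operand wins, which is the "latest value" policy.
  val overridden: Map[String, Double] =
    Map("screen_2g" -> 0.573) ++ Map("screen_2g" -> 9.0)

  // Hypothetical alternative policy: sum the values when a key appears on both sides.
  def mergeSumming(left: Map[String, Double], right: Map[String, Double]): Map[String, Double] =
    right.foldLeft(left) { case (acc, (key, value)) =>
      acc.updated(key, acc.getOrElse(key, 0.0) + value)
    }
}
```

If summing were the desired behaviour, replacing `before ++ toAdd` in `update` with a function along the lines of `mergeSumming` would be one way to get it, though that would tie the UDAF to numeric value types.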

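import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DoubleType, StringType}

// instantiate a CombineMaps with the relevant types:
val combineMaps = new CombineMaps[String, Double](StringType, DoubleType)

// groupBy and aggregate; the alias keeps the output column named "generation"
val result = input.groupBy("userId", "date").agg(combineMaps(col("generation")).as("generation"))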
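
As a possible alternative, on Spark 2.4 or later the same grouping can probably be done with built-in functions only, which also avoids `UserDefinedAggregateFunction` (deprecated as of Spark 3.0). The sketch below is not from the original answer. It assumes a local SparkSession, rebuilds the question's three rows with `date` as a string (the question does not state its type), and uses the illustrative names `CombineMapsBuiltin` and `combine`. It relies on the same assumption that keys are unique within a group: on Spark 3.0+ `map_from_entries` fails on duplicate keys by default, a behaviour governed by `spark.sql.mapKeyDedupPolicy`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, flatten, map_entries, map_from_entries}

object CombineMapsBuiltin {
  // Turn each map into an array of (key, value) entries, collect the arrays per group,
  // flatten them into one array, and rebuild a single map from the entries.
  def combine(input: DataFrame): DataFrame =
    input
      .groupBy("userId", "date")
      .agg(
        map_from_entries(flatten(collect_list(map_entries(col("generation")))))
          .as("generation"))

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, only for trying the sketch out.
    val spark = SparkSession.builder().master("local[1]").appName("combine-maps").getOrCreate()
    import spark.implicits._

    // Sample rows copied from the question.
    val input = Seq(
      (1, "20160926", Map("screen_WiFi" -> 15.127, "upload_WiFi" -> 0.603, "total_WiFi" -> 19.551)),
      (1, "20160926", Map("screen_2g" -> 0.573, "upload_2g" -> 0.466, "total_2g" -> 1.419)),
      (1, "20160926", Map("screen_3g" -> 10.084, "upload_3g" -> 80.515, "total_3g" -> 175.435))
    ).toDF("userId", "date", "generation")

    combine(input).show(false)
    spark.stop()
  }
}
```

Because everything here is a built-in expression, Spark does not have to convert each map to a Scala object and back per row as it does for the UDAF, so this version may also perform better on large groups.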

Regarding scala - concatenating MapType values when performing a groupBy on a DataFrame, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40078900/
