
scala - Converting List<List<Long,Float,Float,Integer,Integer>> to Array<StructType> in Spark Scala

Reposted. Author: 行者123. Updated: 2023-12-02 04:00:36

I want to convert a string that represents a List<List<Long,Float,Float,Integer,Integer>> into an array. To do this, I am using a UDF with the following structure.

An example of the string is [[337, -115.0, -17.5, 6225, 189],[85075, -112.0, -12.5, 6225, 359]]

def convertToListOfListComplex(ListOfList: String, regex: String): Array[StructType] = {
  val notBracket = ListOfList.dropRight(1).drop(1)
  val SplitString = notBracket.split("]").map(x => if (x.startsWith("[")) x.drop(1) else x.drop(2))
  SplitString(0).replaceAll("\\s", "")

  val result = SplitString map {
    case s => {
      val split = s.replaceAll("\\s", "").trim.split(",")
      case class Row(a: Long, b: Float, c: Float, d: Int, e: Int)
      val element = Row(split(0).toLong, split(1).toFloat, split(2).toFloat, split(3).toInt, split(4).toInt)
      val schema = `valid code to transform to case class to StructType`
    }
  }
  return result
}

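I am using Spark 2.2. I have tried different solutions, but I keep running into problems when trying to obtain an array of StructTypes: either a compilation error or a failure at execution time. Any suggestions?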

Best answer

For testing purposes, I created a test DataFrame containing the string mentioned in the question:

val df = Seq(
Tuple1("[[337, -115.0, -17.5, 6225, 189],[85075, -112.0, -12.5, 6225, 359]]")
).toDF("col")

which is

+-------------------------------------------------------------------+
|col |
+-------------------------------------------------------------------+
|[[337, -115.0, -17.5, 6225, 189],[85075, -112.0, -12.5, 6225, 359]]|
+-------------------------------------------------------------------+

root
|-- col: string (nullable = true)

The udf function should look like this:

import org.apache.spark.sql.functions._
def convertToListOfListComplex = udf((ListOfList: String) => {
  ListOfList.split("],\\[")
    .map(x => x.replaceAll("[\\]\\[]", "").split(","))
    .map(splitted => rowTest(splitted(0).trim.toLong, splitted(1).trim.toFloat, splitted(2).trim.toFloat, splitted(3).trim.toInt, splitted(4).trim.toInt))
})

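where rowTest is a case class defined outside the function scope (a case class declared inside a method has no TypeTag available, so udf cannot derive the struct schema for it):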

case class rowTest(a: Long, b: Float, c: Float, d: Int, e: Int)

and calling the udf function

df.withColumn("converted", convertToListOfListComplex(col("col")))

should give you the following output:

+-------------------------------------------------------------------+--------------------------------------------------------------------+
|col |converted |
+-------------------------------------------------------------------+--------------------------------------------------------------------+
|[[337, -115.0, -17.5, 6225, 189],[85075, -112.0, -12.5, 6225, 359]]|[[337, -115.0, -17.5, 6225, 189], [85075, -112.0, -12.5, 6225, 359]]|
+-------------------------------------------------------------------+--------------------------------------------------------------------+


root
|-- col: string (nullable = true)
|-- converted: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- a: long (nullable = false)
| | |-- b: float (nullable = false)
| | |-- c: float (nullable = false)
| | |-- d: integer (nullable = false)
| | |-- e: integer (nullable = false)
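
The parsing step inside the udf is plain string handling, so it can be checked without a Spark session. The following is a minimal sketch of that step on its own; `RowTest` and `ParseRowsSketch.parseRows` are hypothetical names chosen for illustration, and the split pattern additionally tolerates whitespace after the comma between rows, which the answer's pattern does not.

```scala
// Hypothetical names for illustration; not part of the original answer.
final case class RowTest(a: Long, b: Float, c: Float, d: Int, e: Int)

object ParseRowsSketch {
  // Turns "[[..],[..]]" into one RowTest per inner list.
  def parseRows(listOfList: String): Array[RowTest] =
    listOfList
      .split("\\],\\s*\\[")                      // cut at the boundary between inner lists
      .map(_.replaceAll("[\\[\\]]", ""))         // strip the remaining brackets
      .map(_.split(",").map(_.trim))             // split each row into trimmed fields
      .map(f => RowTest(f(0).toLong, f(1).toFloat, f(2).toFloat, f(3).toInt, f(4).toInt))
}
```

Assuming the case class lives at the top level, a function like this could then be wrapped with `udf(ParseRowsSketch.parseRows _)` and applied to the column in the same way as above.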

To be safer, you can use Try/getOrElse inside the udf function, as follows:

import org.apache.spark.sql.functions._
import scala.util.Try // needed for the Try(...) calls below
def convertToListOfListComplex = udf((ListOfList: String) => {
  ListOfList.split("],\\[")
    .map(x => x.replaceAll("[\\]\\[]", "").split(","))
    .map(splitted => rowTest(Try(splitted(0).trim.toLong).getOrElse(0L), Try(splitted(1).trim.toFloat).getOrElse(0F), Try(splitted(2).trim.toFloat).getOrElse(0F), Try(splitted(3).trim.toInt).getOrElse(0), Try(splitted(4).trim.toInt).getOrElse(0)))
})
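
Note that with getOrElse, a field that fails to parse becomes 0, which cannot be told apart from a genuine zero in the data. If dropping malformed rows is acceptable instead (an assumption, not something the question states), one possible variation is sketched below in plain Scala. `SafeRow` and `SafeParseSketch` are hypothetical names used only for this illustration.

```scala
import scala.util.Try

// Hypothetical names for illustration; not part of the original answer.
final case class SafeRow(a: Long, b: Float, c: Float, d: Int, e: Int)

object SafeParseSketch {
  // Returns None when a field is missing or is not a valid number.
  def parseRow(row: String): Option[SafeRow] = {
    val f = row.replaceAll("[\\[\\]]", "").split(",").map(_.trim)
    Try(SafeRow(f(0).toLong, f(1).toFloat, f(2).toFloat, f(3).toInt, f(4).toInt)).toOption
  }

  // Keeps only the rows that parse; a null input yields an empty array.
  def parseRows(listOfList: String): Array[SafeRow] =
    if (listOfList == null) Array.empty[SafeRow]
    else listOfList.split("\\],\\s*\\[").flatMap(row => parseRow(row))
}
```

Whether to substitute defaults or drop rows depends on how downstream code treats zeros, so this is a design choice rather than a correction of the approach above.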

I hope the answer is helpful.

Regarding scala - Converting List<List<Long,Float,Float,Integer,Integer>> to Array<StructType> in Spark Scala, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51607225/
