
scala - How to get the names of all string-typed columns in a nested Spark struct

Reposted. Author: 行者123. Updated: 2023-12-05 02:42:42

I have this schema, and I want to get the names of all columns whose type is string:

  val schema = StructType(Array(
    StructField("id", IntegerType),
    StructField("nested", StructType(Array(
      StructField("value1", StringType),
      StructField("value2", StringType)
    ))),
    StructField("name", StringType)
  ))

I want to get:

  Seq("nested.value1","nested.value2","name")

This is only an example; the solution should also work when structs are nested several levels deep.

Best answer

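  import org.apache.spark.sql.types._

  // Recursively collect the dotted paths of all StringType fields.
  def extractNames(schema: StructType): Seq[String] = {
    schema.fields.toSeq.flatMap { field =>
      field.dataType match {
        // Nested struct: recurse and prefix each child path with the parent name.
        case structType: StructType =>
          extractNames(structType).map(field.name + "." + _)
        // String leaf: keep its name.
        case _: StringType =>
          field.name :: Nil
        // Any other type: skip it.
        case _ =>
          Nil
      }
    }
  }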
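
The function above only descends into fields that are themselves structs. If a schema also holds arrays of structs, one possible extension is sketched below. This is an assumption about what might be wanted, not part of the original answer: the names `StringFieldPaths`, `extractNamesWithArrays`, and the `example` schema are hypothetical. Note that selecting such a path (for example `items.sku`) from a DataFrame yields an array of strings rather than a single string, so whether it counts as a "string column" depends on the use case.

```scala
import org.apache.spark.sql.types._

object StringFieldPaths {
  // Hypothetical variant of extractNames that also looks inside arrays of structs.
  def extractNamesWithArrays(schema: StructType): Seq[String] =
    schema.fields.toSeq.flatMap { field =>
      field.dataType match {
        // Plain nested struct: same recursion as the original answer.
        case nested: StructType =>
          extractNamesWithArrays(nested).map(field.name + "." + _)
        // Array whose elements are structs: recurse into the element type.
        case ArrayType(element: StructType, _) =>
          extractNamesWithArrays(element).map(field.name + "." + _)
        // String leaf.
        case _: StringType =>
          Seq(field.name)
        // Everything else, including arrays of plain strings, is skipped.
        case _ =>
          Seq.empty[String]
      }
    }

  // Made-up schema used only to illustrate the array case.
  val example: StructType = StructType(Array(
    StructField("id", IntegerType),
    StructField("tags", ArrayType(StringType)),
    StructField("items", ArrayType(StructType(Array(
      StructField("sku", StringType),
      StructField("qty", IntegerType)
    ))))
  ))
}
```

Calling `StringFieldPaths.extractNamesWithArrays(StringFieldPaths.example)` should report only `items.sku`, since `tags` is an array of strings and is deliberately left out here.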

Test case:

  val schema = StructType(Array(
    StructField("id", IntegerType),
    StructField("nested", StructType(Array(
      StructField("value1", StringType),
      StructField("value2", StringType),
      StructField("struct", StructType(Array(
        StructField("value1", IntegerType),
        StructField("value2", StringType)
      )))
    ))),
    StructField("name", StringType)
  ))

val names = extractNames(schema)
println(names.mkString(", "))

Output:

nested.value1, nested.value2, nested.struct.value2, name
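
A likely next step is to use these dotted paths to select the columns from a DataFrame. The sketch below assumes a local SparkSession and a single made-up row; the object name `SelectStringColumns` and the underscore aliasing are illustrative choices, not part of the original answer. The alias matters because selecting `nested.value1` on its own produces a column named just `value1`, so two leaves with the same name under different parents would collide. Field names that themselves contain dots would need backtick quoting, which this sketch does not handle.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

object SelectStringColumns {
  // Same recursion as the answer, repeated so this sketch is self-contained.
  def extractNames(schema: StructType): Seq[String] =
    schema.fields.toSeq.flatMap { field =>
      field.dataType match {
        case nested: StructType => extractNames(nested).map(field.name + "." + _)
        case _: StringType      => Seq(field.name)
        case _                  => Seq.empty[String]
      }
    }

  // The schema from the question.
  val schema: StructType = StructType(Array(
    StructField("id", IntegerType),
    StructField("nested", StructType(Array(
      StructField("value1", StringType),
      StructField("value2", StringType)
    ))),
    StructField("name", StringType)
  ))

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session is acceptable for the demo.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("string-columns")
      .getOrCreate()

    // One illustrative row matching the schema.
    val rows = java.util.Arrays.asList(Row(1, Row("a", "b"), "c"))
    val df = spark.createDataFrame(rows, schema)

    // Select every string leaf, flattening "a.b" into the alias "a_b".
    val selected = df.select(
      extractNames(df.schema).map(n => col(n).as(n.replace(".", "_"))): _*
    )
    selected.printSchema()
    selected.show()

    spark.stop()
  }
}
```

With the question's schema, the selected DataFrame is expected to contain the columns `nested_value1`, `nested_value2`, and `name`.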

Regarding "scala - How to get the names of all string-typed columns in a nested Spark struct", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/67338334/
