
apache-spark - Spark : create a nested schema

Reposted. Author: 行者123. Updated: 2023-12-05 02:11:42

With Spark, the following code:

import spark.implicits._
val data = Seq(
  (1, ("value11", "value12")),
  (2, ("value21", "value22")),
  (3, ("value31", "value32"))
)

val df = data.toDF("id", "v1")
df.printSchema()

The result is as follows:

root
 |-- id: integer (nullable = false)
 |-- v1: struct (nullable = true)
 |    |-- _1: string (nullable = true)
 |    |-- _2: string (nullable = true)

Now, if I want to create the schema myself, how should I define the nested field?

val schema = StructType(Array(
  StructField("id", IntegerType),
  StructField("nested", ???)
))

Thanks.

Best answer

Following the example here: https://spark.apache.org/docs/2.4.0/api/java/org/apache/spark/sql/types/StructType.html

import org.apache.spark.sql._
import org.apache.spark.sql.types._

val innerStruct =
  StructType(
    StructField("f1", IntegerType, true) ::
    StructField("f2", LongType, false) ::
    StructField("f3", BooleanType, false) :: Nil)

val struct = StructType(
  StructField("a", innerStruct, true) :: Nil)

// Create a Row with the schema defined by struct
// (f2 is LongType, so the second value is written as a Long literal)
val row = Row(Row(1, 2L, true))
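
To make the nesting concrete, here is a minimal sketch of reading the inner values back out of such a row. It assumes only the spark-sql library on the classpath (for example inside spark-shell); the names inner, f1, f2 and f3 are introduced here for illustration and are not part of the linked example.

```scala
import org.apache.spark.sql.Row

// Same shape as the documentation example; 2L is a Long literal so the
// second field matches f2: LongType.
val row = Row(Row(1, 2L, true))

// The outer Row has a single field ("a"), which is itself a Row
val inner: Row = row.getStruct(0)

// Positional getters on the inner Row, in the order f1, f2, f3
val f1: Int = inner.getInt(0)
val f2: Long = inner.getLong(1)
val f3: Boolean = inner.getBoolean(2)

println(s"f1=$f1, f2=$f2, f3=$f3")
```

The point is that a struct column's value is represented as a nested Row, so the same pattern applies when building rows for a schema with a nested StructType.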

In your case, it would be:

import org.apache.spark.sql._
import org.apache.spark.sql.types._

val schema = StructType(Array(
  StructField("id", IntegerType),
  StructField("nested", StructType(Array(
    StructField("value1", StringType),
    StructField("value2", StringType)
  )))
))

Output:

StructType(
  StructField(id,IntegerType,true),
  StructField(nested,StructType(
    StructField(value1,StringType,true),
    StructField(value2,StringType,true)
  ),true)
)
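
As a possible next step, the hand-written schema can be applied to data. The following is a sketch rather than part of the original answer: it assumes a local SparkSession (in spark-shell, `spark` already exists and `getOrCreate()` simply returns it), and the app name and the names rows and df are made up for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

// Assumption: a local session, only for this illustration
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nested-schema-sketch")
  .getOrCreate()

// The schema from the answer: an int column plus a struct of two strings
val schema = StructType(Array(
  StructField("id", IntegerType),
  StructField("nested", StructType(Array(
    StructField("value1", StringType),
    StructField("value2", StringType)
  )))
))

// Each struct value is supplied as a nested Row
val rows = Seq(
  Row(1, Row("value11", "value12")),
  Row(2, Row("value21", "value22")),
  Row(3, Row("value31", "value32"))
)

// createDataFrame(RDD[Row], StructType) attaches the explicit schema
val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

df.printSchema()

// Nested fields can be selected with dot notation
df.select("id", "nested.value1").show()
```

Unlike the schema inferred by toDF in the question, the nested fields here carry the names value1 and value2 instead of _1 and _2, and id is nullable because StructField defaults to nullable = true.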

Regarding apache-spark - Spark : create a nested schema, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57079343/
