
scala - Spark SQL's Scala API - TimestampType - No Encoder found for org.apache.spark.sql.types.TimestampType

Reposted. Author: 行者123. Updated: 2023-12-04 17:07:19

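I am using Spark 2.1 with Scala 2.11 on a Databricks notebook.

What exactly is TimestampType?

According to SparkSQL's documentation, the official timestamp type is TimestampType, which is apparently an alias for java.sql.Timestamp:

TimestampType can be found in SparkSQL's Scala API.

We see a difference when using a schema versus the Dataset API.

Parsing {"time":1469501297,"action":"Open"} from the Databricks Scala Structured Streaming example

Using a JSON schema -> OK (although I would prefer to use the elegant Dataset API):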

val jsonSchema = new StructType().add("time", TimestampType).add("action", StringType)

val staticInputDF =
  spark
    .read
    .schema(jsonSchema)
    .json(inputPath)

Using the Dataset API -> KO: No Encoder found for TimestampType

Creating the Event class
import org.apache.spark.sql.types._
case class Event(action: String, time: TimestampType)
--> defined class Event

Reading the events from DBFS on Databricks fails.

Note: we do not get the error when java.sql.Timestamp is used as the type for "time".
val path = "/databricks-datasets/structured-streaming/events/"
val events = spark.read.json(path).as[Event]

Error message
java.lang.UnsupportedOperationException: No Encoder found for org.apache.spark.sql.types.TimestampType
- field (class: "org.apache.spark.sql.types.TimestampType", name: "time")
- root class:

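Accepted answer

Combining the schema-based read method .schema(jsonSchema) with as[Type], where the target type has a java.sql.Timestamp field, solves the problem. The idea came from reading the section "Creating streaming DataFrames and streaming Datasets" of the Structured Streaming documentation: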

These examples generate streaming DataFrames that are untyped, meaning that the schema of the DataFrame is not checked at compile time, only checked at runtime when the query is submitted. Some operations like map, flatMap, etc. need the type to be known at compile time. To do those, you can convert these untyped streaming DataFrames to typed streaming Datasets using the same methods as static DataFrame.
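The reason the case class field has to be java.sql.Timestamp can be checked without reading any data. The following is a minimal sketch, assuming Spark SQL is on the classpath; the object name EncoderSchemaSketch is made up for illustration. It derives the encoder for a case class with a java.sql.Timestamp field and inspects the schema Spark infers from it. TimestampType shows up as the Spark SQL type of the column, while java.sql.Timestamp is the JVM type of the values, which suggests the two are not interchangeable in a case class definition.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.{StringType, TimestampType}

// Case class fields use JVM value types, not Spark SQL DataType objects.
case class Event(action: String, time: Timestamp)

// Hypothetical helper object, not part of the original answer.
object EncoderSchemaSketch {
  // Deriving the encoder needs no SparkSession; it only uses reflection on Event.
  val schema = Encoders.product[Event].schema

  def main(args: Array[String]): Unit = {
    // The Timestamp field is described by TimestampType in the derived schema.
    println(schema("time").dataType == TimestampType)
    println(schema("action").dataType == StringType)
  }
}
```

Declaring the field as TimestampType instead asks Spark to derive an encoder for the schema descriptor class itself, which is what the error message in the question reports.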


val path = "/databricks-datasets/structured-streaming/events/"

val jsonSchema = new StructType().add("time", TimestampType).add("action", StringType)

case class Event(action: String, time: java.sql.Timestamp)

val staticInputDS =
  spark
    .read
    .schema(jsonSchema)
    .json(path)
    .as[Event]

staticInputDS.printSchema

This outputs:
root
|-- time: timestamp (nullable = true)
|-- action: string (nullable = true)
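
Once the Dataset is typed, the compile-time-checked operations mentioned in the quoted documentation become available. The sketch below is an illustration under assumptions, not part of the original answer: it runs against a local SparkSession instead of Databricks, feeds the single sample record from the question through an in-memory RDD instead of DBFS, and the names TypedEventsSketch and upperCaseActions are hypothetical. The json(RDD[String]) overload is used because it exists in Spark 2.1; later versions deprecate it in favour of json(Dataset[String]).

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

// JVM-side row type: the timestamp column is a java.sql.Timestamp field.
case class Event(action: String, time: Timestamp)

// Hypothetical helper object for illustration only.
object TypedEventsSketch {
  def upperCaseActions(spark: SparkSession): Array[String] = {
    import spark.implicits._ // brings the encoders for Event and String into scope

    // Spark-side schema: the same column is described with TimestampType.
    val jsonSchema = new StructType().add("time", TimestampType).add("action", StringType)

    // In-memory stand-in for the JSON files read from DBFS in the question.
    val lines = spark.sparkContext.parallelize(Seq("""{"time":1469501297,"action":"Open"}"""))

    val events = spark.read.schema(jsonSchema).json(lines).as[Event]

    // A typed map: e is an Event, so e.action is checked at compile time.
    events.map(e => e.action.toUpperCase).collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("typed-events-sketch").getOrCreate()
    try println(upperCaseActions(spark).mkString(","))
    finally spark.stop()
  }
}
```

Columns are matched to case class fields by name, so the schema may list "time" before "action" even though the case class declares them in the opposite order.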

Regarding scala - Spark SQL's Scala API - TimestampType - No Encoder found for org.apache.spark.sql.types.TimestampType, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44316974/
