
apache-spark - Continuous trigger not found in Structured Streaming

Reposted. Author: 行者123. Updated: 2023-12-03 23:15:49

Runtime: Spark 2.3.0, Scala 2.11 (Databricks 4.1 ML beta)



import org.apache.spark.sql.streaming.Trigger
import scala.concurrent.duration._

//kafka settings and df definition goes here

val query = df.writeStream.format("parquet")
.option("path", ...)
.option("checkpointLocation",...)
.trigger(continuous(30000))
.outputMode(OutputMode.Append)
.start

This throws the error: not found: value continuous

Other attempts that did not work:

.trigger(continuous = "30 seconds") //as per Databricks blog
// throws same error as above

.trigger(Trigger.Continuous("1 second")) //as per Spark docs
// throws java.lang.IllegalStateException: Unknown type of trigger: ContinuousTrigger(1000)

References:

(Databricks blog)
https://databricks.com/blog/2018/03/20/low-latency-continuous-processing-mode-in-structured-streaming-in-apache-spark-2-3-0.html

(Spark guide)
http://spark.apache.org/docs/2.3.0/structured-streaming-programming-guide.html#continuous-processing

(Scaladoc) https://spark.apache.org/docs/2.3.0/api/scala/index.html#org.apache.spark.sql.streaming.package

Best answer

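Spark 2.3.0 does not support the parquet sink in continuous processing mode. You have to write to a Kafka-based stream, the console, or the memory sink instead.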

Quoting the "continuous processing mode in structured streaming" blog post:

You can set the optional Continuous Trigger in queries that satisfy the following conditions: Read from supported sources like Kafka and write to supported sinks like Kafka, memory, console.
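
As a rough illustration of the quoted conditions, the sketch below reads from Kafka and writes to the console sink with Trigger.Continuous. The broker address, topic name, checkpoint path, and application name are hypothetical placeholders, not values from the question. It also assumes the spark-sql-kafka-0-10 package is on the classpath. Note that continuous is not a standalone function in the Scala API, which is why the bare continuous(30000) call fails to compile. The trigger is built through the Trigger object.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{OutputMode, Trigger}

object ContinuousTriggerSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("continuous-trigger-sketch") // hypothetical application name
      .getOrCreate()

    // Kafka is one of the sources supported in continuous mode.
    // The broker address and topic below are assumed placeholders.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "input-topic")
      .load()

    val query = df
      // Simple projections like this one are allowed in continuous mode.
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      // Console is a supported sink; parquet is not in Spark 2.3.0.
      .format("console")
      .option("checkpointLocation", "/tmp/continuous-checkpoint") // assumed path
      // The argument is the checkpoint interval, not a batch interval.
      .trigger(Trigger.Continuous("1 second"))
      .outputMode(OutputMode.Append)
      .start()

    query.awaitTermination()
  }
}

If parquet output is a hard requirement, one option is to stay with micro-batch processing, for example .trigger(Trigger.ProcessingTime("30 seconds")), at the cost of the lower latency that continuous mode is meant to provide.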

For more on "apache-spark - Continuous trigger not found in Structured Streaming", see the similar question on Stack Overflow: https://stackoverflow.com/questions/50952042/
