java.lang.UnsupportedOperationException: Data source org.apache.spark.sql.redis does not support streamed reading


I'm trying to read data from Redis with Spark 2.3 (Java). I can read non-streaming data from Redis, but reading streaming data fails with the errors below.

1) When I specify the format as:

Dataset<Row> RedisData = spark.readStream()
        .format("org.apache.spark.sql.redis")
        .option("stream.keys", "carsstream")
        .schema(UserSchema5)
        .load();

the error is:

java.lang.UnsupportedOperationException: Data source org.apache.spark.sql.redis does not support streamed reading

2) When I specify the format as:

Dataset<Row> RedisData = spark.readStream()
        .format("redis")
        .option("stream.keys", "carsstream")
        .schema(UserSchema5)
        .load();

the error is:

java.lang.ClassNotFoundException: Failed to find data source: redis. Please find packages at http://spark.apache.org/third-party-projects.html

I have included the jars for Jedis (version 3.1.0), spark-redis (2.3.1), and spark-core_2.11 (version 2.3.0).

Any suggestions would be helpful.

Best Answer

Based on the error you are seeing and the code you are trying to implement, I infer that you are using Spark Structured Streaming. See the snippets below for reference; I have also shared a link to a GitHub repository where you can find the complete code.

For the Redis side you need to create a regular DataFrame/Dataset, not a Streaming DataFrame/Dataset. So you have to do it like this:

val keysPattern = s"${topic}:*"

// SCHEMA FOR STATE DATA CACHED IN REDIS
val redisSchema = StructType(
  List(
    StructField("col1", StringType, true),
    StructField("col2", StringType, true),
    StructField("col3", StringType, true)
  )
)

val redisDf = spark.read
  .format("org.apache.spark.sql.redis")
  .schema(redisSchema)
  .option("keys.pattern", keysPattern)
  .load
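
The join in the next snippet assumes a streamingDf already exists. As a minimal sketch (assuming a Kafka source; the broker address is a placeholder), it could be created like this:

// MINIMAL SKETCH (ASSUMED KAFKA SOURCE) - BUILD THE STREAMING SIDE
// "localhost:9092" is a placeholder broker address
val streamingDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", topic)
  .load()
  .selectExpr("CAST(value AS STRING) AS raw")
  // Parsing "raw" into typed columns such as u_id is application-specific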

You can then join this regular DataFrame with your Streaming DataFrame like this:

// JOIN THE STREAMING DATA WITH THE REGULAR DATAFRAME
val joinedDf = streamingDf.joinWith(
  redisDf,
  trim(col("col1")) === trim(col("u_id")),
  "left"
).select("_1.*", "_2.*")

Joining a regular DataFrame with a Streaming DataFrame yields a Streaming DataFrame. To write that Streaming DataFrame back to Redis you also need to implement a foreach writer, which looks like this:

// REDIS CONNECTOR - FOREACHWRITER SINK
val redisForeachWriter: RedisForeachWriter = new RedisForeachWriter("localhost", "6379", topic)

// PUSH NEW USER DETAILS TO REDIS FOR STATE REFERENCE
val redisSinkQuery = joinedDf
  .select("col1", "col2", ... , "coln")
  .writeStream
  .outputMode("update")
  .foreach(redisForeachWriter)
  .start
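
start returns a StreamingQuery; to keep the driver alive until the query stops or fails, you would typically block on it:

// BLOCK THE DRIVER UNTIL THE STREAMING QUERY TERMINATES
redisSinkQuery.awaitTermination()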

A sample RedisForeachWriter looks like this:

import org.apache.spark.sql.ForeachWriter
import org.apache.spark.sql.Row
import redis.clients.jedis.Jedis

class RedisForeachWriter(val host: String, val port: String, val topic: String) extends ForeachWriter[Row] {

  // One Jedis connection per writer instance, created lazily on first use
  var jedis: Jedis = _

  def connect() = {
    jedis = new Jedis(host, port.toInt)
  }

  override def open(partitionId: Long, version: Long): Boolean = {
    true
  }

  override def process(record: Row) = {
    // Assumes u_id is the second column of the projected row
    val u_id = record.getString(1)

    if (!(u_id == null || u_id.isEmpty)) {
      val columns: Array[String] = record.schema.fieldNames

      if (jedis == null) {
        connect()
      }

      // Write every non-empty column as a field of the hash "<topic>:<u_id>"
      for (i <- 0 until columns.length) {
        if (!(record.getString(i) == null || record.getString(i).isEmpty))
          jedis.hset(s"${topic}:" + u_id, columns(i), record.getString(i))
      }
    }
  }

  override def close(errorOrNull: Throwable) = {
    // Release the connection when the task finishes
    if (jedis != null) jedis.close()
  }
}

You can refer to my GitHub, which has a similar use case, and come back for clarification: https://github.com/krohit-scala/MSStreamingStack

Edit: Please add these dependencies to your application POM.

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.3.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.3.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.3.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.3.0</version>
</dependency>
<dependency>
    <groupId>com.redislabs</groupId>
    <artifactId>spark-redis</artifactId>
    <version>2.3.1</version>
</dependency>
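
With these dependencies in place, spark-redis picks up its connection settings from the Spark configuration. A minimal sketch of creating the SparkSession (the host, port, and application name are placeholder assumptions):

import org.apache.spark.sql.SparkSession

// spark.redis.host / spark.redis.port are the connection options spark-redis reads
val spark = SparkSession.builder()
  .appName("RedisStateJoin")                 // placeholder application name
  .config("spark.redis.host", "localhost")   // assumed Redis host
  .config("spark.redis.port", "6379")        // assumed Redis port
  .getOrCreate()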

Regarding "java.lang.UnsupportedOperationException: Data source org.apache.spark.sql.redis does not support streamed reading", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58909914/
