
java - Problem encountered with Spark Structured Streaming


I wrote some code to read a CSV file and print it to the console using Spark Structured Streaming. The code is as follows:


import java.util.ArrayList;
import java.util.List;

import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.sql.*;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.types.StructType;
import com.cybernetix.models.BaseDataModel;

public class ReadCSVJob {

    static List<BaseDataModel> bdmList = new ArrayList<BaseDataModel>();

    public static void main(String args[]) {

        SparkSession spark = SparkSession
                .builder()
                .config("spark.eventLog.enabled", "false")
                .config("spark.driver.memory", "2g")
                .config("spark.executor.memory", "2g")
                .appName("StructuredStreamingAverage")
                .master("local")
                .getOrCreate();

        StructType userSchema = new StructType();
        userSchema.add("name", "string");
        userSchema.add("status", "String");
        userSchema.add("u_startDate", "String");
        userSchema.add("u_lastlogin", "string");
        userSchema.add("u_firstName", "string");
        userSchema.add("u_lastName", "string");
        userSchema.add("u_phone", "string");
        userSchema.add("u_email", "string");

        Dataset<Row> dataset = spark
                .readStream()
                .schema(userSchema)
                .csv("D:\\user\\sdata\\user-2019-10-03_20.csv");

        dataset.writeStream()
                .format("console")
                .option("truncate", "false")
                .start();
    }
}
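For reference, a minimal sketch of the same pipeline follows, assuming Spark's Java API as above. Two details are worth noting once the ANTLR problem is fixed: StructType.add returns a new StructType rather than modifying the one it is called on, so its result has to be kept, and the query returned by start() needs awaitTermination() to keep the driver running. The schema is the one from the question; the input directory is only a placeholder.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.types.StructType;

public class ReadCSVSketch {

    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("StructuredStreamingAverage")
                .master("local")
                .getOrCreate();

        // StructType.add(...) returns a new StructType, so chain the calls
        // (or reassign) instead of discarding the result.
        StructType userSchema = new StructType()
                .add("name", "string")
                .add("status", "string")
                .add("u_startDate", "string")
                .add("u_lastlogin", "string")
                .add("u_firstName", "string")
                .add("u_lastName", "string")
                .add("u_phone", "string")
                .add("u_email", "string");

        // The file source watches a directory for new CSV files;
        // the path below is a placeholder.
        Dataset<Row> dataset = spark.readStream()
                .schema(userSchema)
                .csv("D:\\user\\sdata\\");

        StreamingQuery query = dataset.writeStream()
                .format("console")
                .option("truncate", "false")
                .start();

        // Keep the driver alive while the stream runs.
        query.awaitTermination();
    }
}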

In the original code, the line userSchema.add("name", "string"); is where the program terminates. Below is the log trace.

ANTLR Tool version 4.7 used for code generation does not match the current runtime version 4.5.3
ANTLR Runtime version 4.7 used for parser compilation does not match the current runtime version 4.5.3
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:84)
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseDataType(ParseDriver.scala:39)
    at org.apache.spark.sql.types.StructType.add(StructType.scala:213)
    at com.cybernetix.sparks.jobs.ReadCSVJob.main(ReadCSVJob.java:45)
Caused by: java.lang.UnsupportedOperationException: java.io.InvalidClassException: org.antlr.v4.runtime.atn.ATN; Could not deserialize ATN with UUID 59627784-3be5-417a-b9eb-8131a7286089 (expected aadb8d7e-aeef-4415-ad2b-8204d6cf042e or a legacy UUID).
    at org.antlr.v4.runtime.atn.ATNDeserializer.deserialize(ATNDeserializer.java:153)
    at org.apache.spark.sql.catalyst.parser.SqlBaseLexer.<clinit>(SqlBaseLexer.java:1175)
    ... 4 more
Caused by: java.io.InvalidClassException: org.antlr.v4.runtime.atn.ATN; Could not deserialize ATN with UUID 59627784-3be5-417a-b9eb-8131a7286089 (expected aadb8d7e-aeef-4415-ad2b-8204d6cf042e or a legacy UUID).
    ... 6 more

I have already added the ANTLR Maven dependency to the pom.xml file, but I am still facing the same issue.

<!-- https://mvnrepository.com/artifact/org.antlr/antlr4 -->
<dependency>
    <groupId>org.antlr</groupId>
    <artifactId>antlr4</artifactId>
    <version>4.7</version>
</dependency>

I am not sure why antlr-runtime-4.5.3.jar still appears in the Maven dependency list after adding the antlr dependency. See the screenshot below.

(screenshot: the project's Maven dependency list, still showing antlr-runtime-4.5.3.jar)

Can anyone help me figure out what I am doing wrong here?

Best Answer

Update your artifactId to antlr4-runtime and try again. Make sure to do a clean build.

The dependency should look like this:

<dependency>
    <groupId>org.antlr</groupId>
    <artifactId>antlr4-runtime</artifactId>
    <version>4.7</version>
</dependency>
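Once the artifactId is corrected and the build is cleaned, a quick way to confirm that the 4.7 runtime actually won is to print the ANTLR runtime version and the jar it was loaded from; running mvn dependency:tree likewise shows which module still drags in antlr4-runtime 4.5.3 (with Spark 2.x it typically arrives transitively through spark-catalyst / spark-sql). The small check below is only a sketch; the class name is made up for illustration.

import org.antlr.v4.runtime.RuntimeMetaData;

public class AntlrVersionCheck {
    public static void main(String[] args) {
        // Version of the ANTLR runtime that is actually on the classpath;
        // after the fix this should print 4.7 instead of 4.5.3.
        System.out.println("ANTLR runtime version: " + RuntimeMetaData.VERSION);

        // Jar the runtime class was loaded from, useful when several
        // ANTLR artifacts end up on the classpath at once.
        System.out.println("Loaded from: "
                + RuntimeMetaData.class.getProtectionDomain()
                        .getCodeSource().getLocation());
    }
}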

Regarding "java - Problem encountered with Spark Structured Streaming", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58812586/
