
scala - How to read a compressed Spark eventLog?

Reposted · Author: 行者123 · Updated: 2023-12-04 04:11:03

When I try to read a Spark 2.4.4 eventLog compressed with lz4, I get an empty DataFrame:

cd /opt/spark-2.4.4-bin-hadoop2.7
bin/spark-shell --master=local --conf spark.eventLog.enabled=true --conf spark.eventLog.compress=true --conf spark.io.compression.codec=lz4 --driver-memory 4G --driver-library-path=/opt/hadoop-2.7.1/lib/native/

// Trying to read an event log from a previous session
spark.read.option("compression", "lz4").json(s"file:///tmp/spark-events/local-1589202668377.lz4")

// res0: org.apache.spark.sql.DataFrame = []

However, reading an uncompressed event log works fine:

bin/spark-shell --master=local --conf spark.eventLog.enabled=true --conf spark.eventLog.compress=false
spark.read.json(s"file:///tmp/spark-events/${sc.applicationId}.inprogress").printSchema

//root
// |-- App ID: string (nullable = true)
// |-- App Name: string (nullable = true)
// |-- Block Manager ID: struct (nullable = true)
// | |-- Executor ID: string (nullable = true)

I also tried reading an eventLog compressed with snappy, with the same result.
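One quick diagnostic (a sketch, using the example path from the question; substitute your own file): Spark compresses event logs with its own codec, which for lz4 is lz4-java's `LZ4BlockOutputStream`. That stream begins with the ASCII magic `LZ4Block`, a framing that Hadoop's `.lz4` reader does not understand, which would explain the empty DataFrame.

```shell
# Inspect the first 8 bytes of the compressed event log.
# Spark's LZ4 codec (lz4-java's LZ4BlockOutputStream) writes the
# ASCII magic "LZ4Block" at the start of the stream.
head -c 8 /tmp/spark-events/local-1589202668377.lz4
```

If the output is `LZ4Block`, the file is in lz4-java's block format rather than the framing `org.apache.hadoop.io.compress.Lz4Codec` expects.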

Best Answer

Try setting the codec first, then reading:

spark.conf.set("spark.io.compression.codec", "org.apache.spark.io.LZ4CompressionCodec")
spark.read.json("dbfs:/tmp/compress/part-00000.lz4")

If that doesn't work, your lz4 file is most likely incompatible with org.apache.hadoop.io.compress.Lz4Codec. Here is a link to an open issue on the same problem: lz4 incompatibility between OS and Hadoop
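If the Hadoop codec route fails, another option is to decompress the file yourself and hand the JSON lines to Spark. The sketch below assumes the event log was written by Spark's default LZ4 codec (lz4-java ships with Spark, so the import resolves inside spark-shell) and reuses the example path from the question; it is not the original answer's method:

```scala
// Sketch for spark-shell: decompress the event log with lz4-java's
// block reader (the format Spark's LZ4 codec writes), then parse the
// one-JSON-object-per-line event records with spark.read.json.
import java.io.{BufferedReader, FileInputStream, InputStreamReader}
import net.jpountz.lz4.LZ4BlockInputStream
import scala.collection.JavaConverters._

val in = new LZ4BlockInputStream(
  new FileInputStream("/tmp/spark-events/local-1589202668377.lz4"))
val reader = new BufferedReader(new InputStreamReader(in, "UTF-8"))
val events = reader.lines().iterator().asScala.toSeq  // force before close
reader.close()

import spark.implicits._
val df = spark.read.json(events.toDS)
df.printSchema
```

For a snappy-compressed log, swapping `LZ4BlockInputStream` for `org.xerial.snappy.SnappyInputStream` (also on Spark's classpath) should follow the same pattern.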

Regarding "scala - How to read a compressed Spark eventLog?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61732380/
