apache-spark - Module not found error when importing the PySpark Delta Lake module

Reposted. Author: 行者123. Updated: 2023-12-01 23:58:04
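I am running PySpark with Delta Lake, but when I try to import the delta module I get ModuleNotFoundError: No module named 'delta'. This is on a machine without an internet connection, so I had to download the delta-core jar manually from Maven and put it in the %SPARK_HOME%/jars folder.

My program runs without any problems and I can write to and read from Delta Lake, so I am confident I have the right jar. But when I try to import the delta module with from delta.tables import *, I get the error.

For reference, my code is: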

import os
from pyspark.sql import SparkSession
from pyspark.sql.types import TimestampType, FloatType, StructType, StructField
from pyspark.sql.functions import input_file_name
from Constants import Constants

if __name__ == "__main__":
    constants = Constants()
    spark = SparkSession.builder.master("local[*]")\
        .appName("Delta Lake Testing")\
        .getOrCreate()

    # have to start spark session before importing: https://docs.delta.io/latest/quick-start.html#python
    from delta.tables import *

    # set logging level to limit output
    spark.sparkContext.setLogLevel("ERROR")

    spark.conf.set("spark.sql.session.timeZone", "UTC")
    # push additional python files to the worker nodes
    base_path = os.path.abspath(os.path.dirname(__file__))
    spark.sparkContext.addPyFile(os.path.join(base_path, 'Constants.py'))

    # start pipeline
    schema = StructType([StructField("Timestamp", TimestampType(), False),
                         StructField("ParamOne", FloatType(), False),
                         StructField("ParamTwo", FloatType(), False),
                         StructField("ParamThree", FloatType(), False)])

    df = spark.readStream\
        .option("header", "true")\
        .option("timestampFormat", "yyyy-MM-dd HH:mm:ss")\
        .schema(schema)\
        .csv(constants.input_path)\
        .withColumn("input_file_name", input_file_name())

    df.writeStream\
        .format("delta")\
        .outputMode("append")\
        .option("checkpointLocation", constants.checkpoint_location)\
        .start("/tmp/bronze")

    # await on stream
    sqm = spark.streams
    sqm.awaitAnyTermination()

This is with Spark v2.4.4 and Python v3.6.1, and the job is submitted with spark-submit path/to/job.py

Best answer

%pyspark
sc.addPyFile("**LOCATION_OF_DELTA_LAKE_JAR_FILE**")
from delta.tables import *
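
The snippet above likely works because a jar is a zip archive, and the delta-core jar bundles the Python package (delta/tables.py) next to the JVM classes. Copying the jar into %SPARK_HOME%/jars only puts it on the JVM classpath, while addPyFile also puts it on the Python path. For a spark-submit job like the one in the question, the same idea could be sketched as follows. The helper names and the delta-core*.jar file name pattern are assumptions made for illustration, not part of the original answer:

```python
import glob
import os
import zipfile


def find_delta_jar(spark_home):
    """Return the first delta-core jar under <spark_home>/jars, or None.

    The 'delta-core*.jar' pattern is an assumption about the file name.
    """
    pattern = os.path.join(spark_home, "jars", "delta-core*.jar")
    matches = sorted(glob.glob(pattern))
    return matches[0] if matches else None


def jar_has_python_package(jar_path, member="delta/tables.py"):
    """A jar is a zip archive, so check whether it bundles the Python module."""
    with zipfile.ZipFile(jar_path) as jar:
        return member in jar.namelist()


def add_delta_python_module(spark, spark_home):
    """Add the delta-core jar to the Python path of the driver and workers."""
    jar_path = find_delta_jar(spark_home)
    if jar_path is None:
        raise FileNotFoundError("no delta-core jar under %s/jars" % spark_home)
    if not jar_has_python_package(jar_path):
        raise ImportError("%s does not contain delta/tables.py" % jar_path)
    # addPyFile accepts .py, .zip, .egg and .jar files
    spark.sparkContext.addPyFile(jar_path)
    return jar_path
```

In the job from the question this would be called once, right after getOrCreate() and before the import: add_delta_python_module(spark, os.environ["SPARK_HOME"]), followed by from delta.tables import *. The zip check is only there to fail with a clearer message if the downloaded jar turns out not to contain the Python files.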

Regarding apache-spark - Module not found error when importing the PySpark Delta Lake module, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62326402/
