
pyspark - Glue AWS: error occurred while calling o60.getDynamicFrame

Reposted. Author: 行者123. Updated: 2023-12-02 14:02:48

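I wrote a basic script that creates a DataFrame from the data in a Redshift table. When I run the job it fails, and I have been struggling for a while with an error message I cannot explain.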

The error shown in the log is:

"/mnt/yarn/usercache/root/appcache/application_1525803778049_0004/container_1525803778049_0004_01_000001/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o60.getDynamicFrame. : java.lang.UnsupportedOperationException: empty.reduceLeft at scala.collection.

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame, DynamicFrameReader, DynamicFrameWriter, DynamicFrameCollection
from pyspark.sql.functions import lit
from awsglue.job import Job

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)

table = glueContext.create_dynamic_frame.from_options(connection_type="redshift", connection_options =
{"url": "jdbc:redshift://xxxxx.yyyyy.us-east-1.redshift.amazonaws.com:5439/db",
"user": "yyyy",
"password": "yyyyy",
"dbtable": "schema.table_name",
"redshiftTmpDir": "s3://aws-glue-temporary-accountnumber-us-east-1/"},
format="orc",
transformation_ctx="table" )

table.show()

dfred = table.toDF().createOrReplaceTempView("table_df")

job.commit()

Thank you for any help you can give me. Much appreciated.

Best answer

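Well, after continuing to struggle with this problem, I read through the official code of the DynamicFrame class. As a result, I added an ApplyMapping transform to map the result of reading the table from Redshift, and in the call that extracts the table I dropped the transformation_ctx parameter, which was what failed with the o60 error (the final call also no longer passes format="orc").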
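A side note on the message itself: in Scala, `empty.reduceLeft` is the text of the `UnsupportedOperationException` thrown when `reduceLeft` is called on an empty collection. The log does not say which collection inside `getDynamicFrame` was empty, so the sketch below only illustrates that class of failure with a rough Python analogue. The function name `largest` is made up for this example.

```python
from functools import reduce


def largest(values):
    # No initial value is passed to reduce(), so an empty input leaves
    # nothing to start from and Python raises TypeError. Scala's
    # reduceLeft behaves similarly and reports "empty.reduceLeft".
    return reduce(lambda a, b: a if a >= b else b, values)


print(largest([3, 7, 5]))

try:
    largest([])
except TypeError as exc:
    print("reduce on an empty input failed:", exc)
```

In other words, the exception points at some internal list being empty rather than at the Redshift table itself.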

My final version of the code is:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame, DynamicFrameReader, DynamicFrameWriter, DynamicFrameCollection
from pyspark.sql.functions import lit
from awsglue.job import Job

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)

table = glueContext.create_dynamic_frame.from_options(connection_type="redshift", connection_options =
{"url": "jdbc:redshift://xxxxx.yyyyy.us-east-1.redshift.amazonaws.com:5439/db",
"user": "yyyy",
"password": "yyyyy",
"dbtable": "schema.table_name",
"redshiftTmpDir": "s3://aws-glue-temporary-accountnumber-us-east-1/"}
)

applyformat = ApplyMapping.apply(frame =table, mappings =
[("field1","string","field1","string"),
("field2","string","field2","string") ], transformation_ctx = "applyformat")


# createOrReplaceTempView() returns None, so there is nothing useful to assign.
table.toDF().createOrReplaceTempView("table_df")

sqlDF = spark.sql(
"SELECT COUNT(*) FROM table_df"
)


# show() prints the rows itself and returns None, so no print is needed.
sqlDF.show()

job.commit()
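Each entry in the `mappings` list above is a 4-tuple of source field name, source type, target field name, and target type. When every field keeps its name and type, as in the answer, the list can be generated instead of typed by hand. This is a minimal sketch. `identity_mappings` is a hypothetical helper, not part of awsglue, and the default type of "string" is an assumption taken from the two fields shown.

```python
def identity_mappings(field_names, field_type="string"):
    """Build (source, source_type, target, target_type) tuples that
    keep each field's name and type unchanged.

    Hypothetical helper for illustration; not an awsglue function.
    """
    return [(name, field_type, name, field_type) for name in field_names]


mappings = identity_mappings(["field1", "field2"])
for mapping in mappings:
    print(mapping)
```

The resulting list could then be passed as the `mappings` argument of `ApplyMapping.apply` in place of the hand-written tuples.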

Regarding pyspark - Glue AWS: error occurred while calling o60.getDynamicFrame, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50240834/
