
pyspark - How do I specify the join type in AWS Glue?

Reposted. Author: 行者123. Updated: 2023-12-03 16:04:06

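I am joining two tables with AWS Glue. By default it performs an INNER JOIN, but I want a left outer join. I consulted the AWS Glue documentation and could not find a way to pass the join type to the Join.apply() method. Is there a way to do this in AWS Glue?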

## @type: Join
## @args: [keys1 = id, keys2 = "user_id"]
## @return: cUser
## @inputs: [frame1 = cUser0, frame2 = cUserLogins]
#cUser = Join.apply(frame1 = cUser0, frame2 = cUserLogins, keys1 = "id", keys2 = "user_id", transformation_ctx = "<transformation_ctx>")


## @type: Join
## @args: [keys1 = id, keys2 = user_id]
## @return: datasource0
## @inputs: [frame1 = cUser, frame2 = cKKR]
datasource0 = Join.apply(frame1 = cUser0, frame2 = cKKR, keys1 = "id", keys2 = "user_id", transformation_ctx = "<transformation_ctx>")

## @type: Join
## @args: [keys1 = branch_id, keys2 = user_id]
## @return: datasource1
## @inputs: [frame1 = datasource0, frame2 = cBranch]
datasource1 = Join.apply(frame1 = datasource0, frame2 = cBranch, keys1 = "branch_id", keys2 = "user_id", transformation_ctx = "<transformation_ctx>")

Best answer

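Currently, the AWS Glue Join.apply() transform performs only an inner join and does not support LEFT or RIGHT joins. You can still get one by converting each DynamicFrame to a Spark DataFrame with toDF() and using the DataFrame's join method, which accepts a how argument.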

Here is an example:

cUser0 = glueContext.create_dynamic_frame.from_catalog(database = "captains", table_name = "cp_txn_winds_karyakarta_users", transformation_ctx = "cUser")

cUser0DF = cUser0.toDF()

cKKR = glueContext.create_dynamic_frame.from_catalog(database = "captains", table_name = "cp_txn_winds_karyakarta_karyakartas", redshift_tmp_dir = args["TempDir"], transformation_ctx = "cKKR")

cKKRDF = cKKR.toDF()

dataSource0 = cUser0DF.join(cKKRDF, cUser0DF.id == cKKRDF.user_id,how='left_outer')
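Note that the result of the join above is a Spark DataFrame, not a DynamicFrame. If later steps in the job expect a DynamicFrame (for example, other Glue transforms or a Glue sink), it can be converted back with DynamicFrame.fromDF. The following is a minimal sketch of the whole round trip, not a definitive implementation. The helper name `left_outer_join` and its parameter names are hypothetical and not part of the Glue API; the sketch assumes it runs inside a Glue job where the `awsglue` package and a GlueContext are available.

```python
def left_outer_join(left_dyf, right_dyf, left_key, right_key, glue_context, name):
    """Left-outer-join two DynamicFrames and return a DynamicFrame.

    Hypothetical helper for illustration; not part of the AWS Glue API.
    """
    # Imported here because awsglue is only available inside the Glue runtime.
    from awsglue.dynamicframe import DynamicFrame

    # DynamicFrame -> Spark DataFrame, whose join() accepts a `how` argument.
    left_df = left_dyf.toDF()
    right_df = right_dyf.toDF()

    # Keep every row of the left frame; columns from the right frame are
    # null where no matching key exists.
    joined_df = left_df.join(
        right_df,
        left_df[left_key] == right_df[right_key],
        how="left_outer",
    )

    # Spark DataFrame -> DynamicFrame, so Glue transforms and sinks still work.
    return DynamicFrame.fromDF(joined_df, glue_context, name)
```

With the frames from the example above, a call might look like `datasource0 = left_outer_join(cUser0, cKKR, "id", "user_id", glueContext, "datasource0")`. Both key columns (`id` and `user_id`) remain in the result, so drop one of them afterwards if it is not needed.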

This post on "pyspark - How do I specify the join type in AWS Glue?" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/54291775/
