
azure - How do you run a Spark SQL MERGE statement on an Iceberg table in Databricks?


I am trying to set up Apache Iceberg in a Databricks environment, but I hit an error when running a MERGE statement in Spark SQL.

This code:

CREATE TABLE iceberg.db.table (id bigint, data string) USING iceberg;

INSERT INTO iceberg.db.table VALUES (1, 'a'), (2, 'b'), (3, 'c');

INSERT INTO iceberg.db.table SELECT id, data FROM (select * from iceberg.db.table) t WHERE length(data) = 1;

MERGE INTO iceberg.db.table t USING (SELECT * FROM iceberg.db.table) u ON t.id = u.id
WHEN NOT MATCHED THEN INSERT *

produces this error:

Error in SQL statement: AnalysisException: MERGE destination only supports Delta sources.
Some(RelationV2[id#116L, data#117] iceberg.db.table

I think the root of the problem is that MERGE is also a keyword handled by the Delta Lake SQL engine. As far as I can tell, it comes down to the order in which Spark tries to apply its analysis rules: the MERGE statement triggers the Delta rules first, which then throw because the target is not a Delta table. I am able to read, append to, and overwrite Iceberg tables without any problem.

Main question: how can I get Spark to recognize this as an Iceberg query rather than a Delta one? Alternatively, is it possible to remove the Delta-related SQL rules entirely?

Environment

Spark version: 3.0.1

Databricks Runtime version: 7.6

Iceberg configuration

spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.iceberg=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.iceberg.type=hadoop
spark.sql.catalog.iceberg.warehouse=BLOB_STORAGE_CONTAINER
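
For reference, one quick way to sanity-check that the Iceberg catalog itself is wired up (independent of the MERGE issue) is to query one of Iceberg's metadata tables, which only resolve through the Iceberg catalog. A minimal sketch, assuming the table from the question already exists:

SELECT * FROM iceberg.db.table.snapshots;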

Stack trace:

com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.sql.AnalysisException: MERGE destination only supports Delta sources.
Some(RelationV2[id#116L, data#117] iceberg.db.table
);
at com.databricks.sql.transaction.tahoe.DeltaErrors$.notADeltaSourceException(DeltaErrors.scala:343)
at com.databricks.sql.transaction.tahoe.PreprocessTableMerge.apply(PreprocessTableMerge.scala:201)
at com.databricks.sql.transaction.tahoe.PreprocessTableMergeEdge$$anonfun$apply$1.applyOrElse(PreprocessTableMergeEdge.scala:39)
at com.databricks.sql.transaction.tahoe.PreprocessTableMergeEdge$$anonfun$apply$1.applyOrElse(PreprocessTableMergeEdge.scala:36)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:112)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:112)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:216)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:110)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:108)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:73)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:72)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:29)
at com.databricks.sql.transaction.tahoe.PreprocessTableMergeEdge.apply(PreprocessTableMergeEdge.scala:36)
at com.databricks.sql.transaction.tahoe.PreprocessTableMergeEdge.apply(PreprocessTableMergeEdge.scala:29)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:152)

Best Answer

I believe the issue here is that Databricks always takes precedence over other extensions added to the Spark session. This means the Iceberg code path is never reached, and only the Databricks extensions are used. I would ask your Databricks representative whether there is a way to let the Iceberg extensions run first, or whether they would consider allowing other MERGE implementations.
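
As a stopgap, since plain INSERT INTO works against the Iceberg table (the question notes that reads, appends, and overwrites all succeed), you could emulate the WHEN NOT MATCHED THEN INSERT branch with an anti-join instead of MERGE, so that no MERGE plan is ever produced for the Delta rules to intercept. A minimal sketch, assuming a staging table iceberg.db.updates (the name is illustrative, not from the original post) with the same (id, data) schema:

-- Emulates MERGE ... WHEN NOT MATCHED THEN INSERT * without a MERGE plan.
-- iceberg.db.updates is a hypothetical staging table with the same schema.
INSERT INTO iceberg.db.table
SELECT u.id, u.data
FROM iceberg.db.updates u
LEFT ANTI JOIN iceberg.db.table t
ON u.id = t.id;

This only covers the insert-if-absent case; WHEN MATCHED THEN UPDATE semantics would still need a real MERGE or a rewrite of the matched rows.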

Regarding "azure - How do you run a Spark SQL MERGE statement on an Iceberg table in Databricks?", there is a similar question on Stack Overflow: https://stackoverflow.com/questions/67893372/
