apache-spark - How to create a table using delta in Spark 2.4.4?


This is Spark 2.4.4 and Delta Lake 0.5.0.

I'm trying to create a table using the delta data source, and it seems I'm missing something. Although the CREATE TABLE USING delta command works fine, neither is the table directory created nor does insertInto work.

The following CREATE TABLE USING delta works fine, but insertInto fails.

scala> sql("""
create table t5
USING delta
LOCATION '/tmp/delta'
""").show

scala> spark.catalog.listTables.where('name === "t5").show
+----+--------+-----------+---------+-----------+
|name|database|description|tableType|isTemporary|
+----+--------+-----------+---------+-----------+
| t5| default| null| EXTERNAL| false|
+----+--------+-----------+---------+-----------+

scala> spark.range(5).write.option("mergeSchema", true).insertInto("t5")
org.apache.spark.sql.AnalysisException: `default`.`t5` requires that the data to be inserted have the same number of columns as the target table: target table has 0 column(s) but the inserted data has 1 column(s), including 0 partition column(s) having constant value(s).;
at org.apache.spark.sql.execution.datasources.PreprocessTableInsertion.org$apache$spark$sql$execution$datasources$PreprocessTableInsertion$$preprocess(rules.scala:341)
...

I thought I'd create the table with the columns defined, but that does not work either.
scala> sql("""
create table t6
(id LONG, name STRING)
USING delta
LOCATION '/tmp/delta'
""").show
org.apache.spark.sql.AnalysisException: delta does not allow user-specified schemas.;
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:325)
at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:78)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:194)
at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:3370)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3370)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
... 54 elided

Best Answer

The OSS version of Delta does not have SQL CREATE TABLE support yet; it is slated for a future release built on Spark 3.0. That is also why the CREATE TABLE in the question only registered a catalog entry with no columns (there was no data at /tmp/delta to infer a schema from), and insertInto then failed with the zero-column error.

To create a Delta table, you must write out a DataFrame in the Delta format. An example in Python is:

df.write.format("delta").save("/some/data/path")
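
For the Scala spark-shell session in the question, the equivalent is a plain DataFrame write in the delta format. A minimal sketch, assuming the same /tmp/delta location used above:

// Writing a DataFrame in the delta format creates both the data files
// and the _delta_log transaction log at the target path.
spark.range(5).toDF("id")
  .write
  .format("delta")
  .mode("overwrite") // replace any existing contents of the path
  .save("/tmp/delta")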

Here is a link to the create-table documentation for Python, Scala, and Java.
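
Once the path holds a Delta table, it can be loaded back by path to confirm the schema took effect; for example, in the same shell (again assuming the /tmp/delta path from above):

// Read the Delta table back by path and inspect it.
val df = spark.read.format("delta").load("/tmp/delta")
df.printSchema()
df.show()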

Regarding "apache-spark - How to create a table using delta in Spark 2.4.4?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59546516/
