
azure - Cannot perform any operation on a corrupted Lake database table


I have a corrupted Lake database table called api_endpoints in Azure Synapse. It is an external table, and I have already deleted the underlying data from the data lake. No matter what I try, I cannot drop the table. I assume a stale piece of metadata is left in the Hive metastore - when I query the database with:

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES order by TABLE_NAME

the corrupted table name still shows up. I have tried to get rid of the table in the following ways:

  • Simply dropping the table: spark.sql("DROP TABLE IF EXISTS api_endpoints")

  • Creating another database and trying to move the corrupted table into it:

    CREATE DATABASE Junk;

    USE raw_UtilityDB;

    ALTER TABLE api_endpoints RENAME TO Junk.MyMovedCorruptTable;

    DROP DATABASE Junk CASCADE;

  • Trying to overwrite the table with PySpark:

df = spark.read.csv('random_csv_file.csv')

df.write.saveAsTable("raw_UtilityDB.api_endpoints", mode='overwrite')

Every operation I attempt results in exactly the same error:

Error: Cannot recognize hive type string: varchar, column: endpoint_name, db: raw_utilitydb, table: api_endpoints
org.apache.spark.sql.errors.QueryExecutionErrors$.convertHiveTableToCatalogTableError(QueryExecutionErrors.scala:1285)
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree2$1(HiveClientImpl.scala:446)
org.apache.spark.sql.hive.client.HiveClientImpl.convertHiveTableToCatalogTable(HiveClientImpl.scala:441)
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:435)
scala.Option.map(Option.scala:230)
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:435)
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:306)
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:236)
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:235)
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:285)
org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:433)
org.apache.spark.sql.hive.client.HiveClient.getTable(HiveClient.scala:90)
org.apache.spark.sql.hive.client.HiveClient.getTable$(HiveClient.scala:89)
org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:91)
org.apache.spark.sql.hive.HiveExternalCatalog.getRawTable(HiveExternalCatalog.scala:123)
org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$getTable$1(HiveExternalCatalog.scala:722)
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102)
org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:722)
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.getTable(ExternalCatalogWithListener.scala:138)
org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableRawMetadata(SessionCatalog.scala:574)
org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:559)
org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.loadTable(V2SessionCatalog.scala:65)
org.apache.spark.sql.connector.catalog.DelegatingCatalogExtension.loadTable(DelegatingCatalogExtension.java:68)
org.apache.spark.sql.delta.catalog.DeltaCatalog.loadTable(DeltaCatalog.scala:173)
org.apache.spark.sql.connector.catalog.CatalogV2Util$.loadTable(CatalogV2Util.scala:281)
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableOrView(Analyzer.scala:1300)
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$15.applyOrElse(Analyzer.scala:1294)
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$15.applyOrElse(Analyzer.scala:1237)
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$3(AnalysisHelper.scala:138)
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:138)
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:30)
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$2(AnalysisHelper.scala:135)
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1130)
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1129)
org.apache.spark.sql.catalyst.plans.logical.DropTable.mapChildren(v2Commands.scala:532)
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:135)
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:30)
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:1237)
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:1203)
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
scala.collection.immutable.List.foldLeft(List.scala:91)
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
scala.collection.immutable.List.foreach(List.scala:431)
org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:222)
org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$1(Analyzer.scala:218)
org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:167)
org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:218)
org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:182)
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:93)
org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:203)
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:202)
org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:78)
org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:120)
org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:207)
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:207)
org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:78)
org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:76)
org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:68)
org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
org.apache.livy.repl.SQLInterpreter.execute(SQLInterpreter.scala:129)
org.apache.livy.repl.Session.$anonfun$executeCode$1(Session.scala:680)
scala.Option.map(Option.scala:230)
org.apache.livy.repl.Session.executeCode(Session.scala:677)
org.apache.livy.repl.Session.$anonfun$execute$4(Session.scala:483)
org.apache.livy.repl.Session.withRealtimeOutputSupport(Session.scala:866)
org.apache.livy.repl.Session.$anonfun$execute$1(Session.scala:483)
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
scala.util.Success.$anonfun$map$1(Try.scala:255)
scala.util.Success.map(Try.scala:213)
scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:750)

How can I get rid of this table?

Best Answer

I have run into a similar problem. As a last resort, you can connect directly to the Hive metastore database - not through Spark, but through the underlying database connection. This will let you get rid of the table for good!

To do this, you need to:

1. Using Hive or Spark, query the table's data location so that you can delete it later (just note it down; deleting the data will be the last step). This command prints the table details, including the location of the table data.

SHOW CREATE TABLE table_name;
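
In a Synapse notebook this can be run through PySpark, as in the minimal sketch below (assuming Spark can still read the table's metadata despite the error; the database and table names are the ones from the question):

# Minimal sketch: print the table DDL, which includes the LOCATION of the data.
spark.sql("SHOW CREATE TABLE raw_UtilityDB.api_endpoints").show(truncate=False)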

2. Connect to the metastore database. It can be any popular database, such as MySQL, PostgreSQL, etc.

You can find the Azure Synapse metastore details in the documentation. Here is a document for external storage.
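
If the workspace is backed by an external Hive metastore, the connection can also be made from Python. The sketch below is only an illustration, assuming an Azure Database for MySQL metastore and the mysql-connector-python package; the host, user, password, and database values are placeholders, not real settings.

import mysql.connector  # assumed package: mysql-connector-python

# Placeholder connection details -- replace with your metastore's settings.
conn = mysql.connector.connect(
    host="my-metastore.mysql.database.azure.com",
    user="metastore_admin",
    password="<password>",
    database="hivemetastore",
)
cursor = conn.cursor()

# Look up the internal TBL_ID of the corrupted table (join with DBS if the
# same table name exists in more than one database).
cursor.execute("SELECT TBL_ID FROM TBLS WHERE TBL_NAME = %s", ("api_endpoints",))
row = cursor.fetchone()
table_id = row[0] if row else None
print("TBL_ID:", table_id)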

3. Delete all partition, column, and other table-related rows with the following commands:

-- Find `table_id` with the query below
select TBL_ID as table_id from TBLS where tbl_name = 'table_name';

-- Remove all related rows, substituting the numeric table_id returned above.
-- ORDER IS IMPORTANT for the foreign keys!
delete from TABLE_PARAMS where TBL_ID=table_id;
delete from TBL_COL_PRIVS where TBL_ID=table_id;
delete from TBL_PRIVS where TBL_ID=table_id;
delete from PARTITION_KEY_VALS where PART_ID in (select PART_ID from PARTITIONS where TBL_ID=table_id);
delete from PARTITION_PARAMS where part_id in (select PART_ID from PARTITIONS where TBL_ID=table_id);
delete from PART_COL_STATS where PART_ID in (select PART_ID from PARTITIONS where TBL_ID=table_id);
delete from PARTITIONS where TBL_ID=table_id;
delete from PARTITION_KEYS where tbl_id = table_id;
delete from TBLS where TBL_ID=table_id;
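
If you prefer to script the clean-up, the same statements can be issued through the cursor from the earlier sketch, in the same foreign-key-safe order; `table_id` is the numeric value returned by the lookup. Again, this is only a sketch under the same assumptions (external MySQL metastore, placeholder credentials).

# Issue the deletes in order; the order matters because of the foreign keys.
statements = [
    "DELETE FROM TABLE_PARAMS WHERE TBL_ID = %s",
    "DELETE FROM TBL_COL_PRIVS WHERE TBL_ID = %s",
    "DELETE FROM TBL_PRIVS WHERE TBL_ID = %s",
    "DELETE FROM PARTITION_KEY_VALS WHERE PART_ID IN (SELECT PART_ID FROM PARTITIONS WHERE TBL_ID = %s)",
    "DELETE FROM PARTITION_PARAMS WHERE PART_ID IN (SELECT PART_ID FROM PARTITIONS WHERE TBL_ID = %s)",
    "DELETE FROM PART_COL_STATS WHERE PART_ID IN (SELECT PART_ID FROM PARTITIONS WHERE TBL_ID = %s)",
    "DELETE FROM PARTITIONS WHERE TBL_ID = %s",
    "DELETE FROM PARTITION_KEYS WHERE TBL_ID = %s",
    "DELETE FROM TBLS WHERE TBL_ID = %s",
]

for sql in statements:
    cursor.execute(sql, (table_id,))

conn.commit()  # persist the metadata clean-up
cursor.close()
conn.close()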

4. Finally, delete the table data from storage.
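
From a Synapse notebook, one way to do this is with mssparkutils, as in the minimal sketch below; the abfss path is a placeholder and should be replaced with the LOCATION printed by SHOW CREATE TABLE in step 1.

from notebookutils import mssparkutils

# Placeholder path -- use the LOCATION reported by SHOW CREATE TABLE.
data_location = "abfss://container@storageaccount.dfs.core.windows.net/path/to/api_endpoints"
mssparkutils.fs.rm(data_location, True)  # second argument: delete recursively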

Hope this helps!

Regarding "azure - Cannot perform any operation on a corrupted Lake database table", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/75457047/
