
java - Error creating a Hive table from a DataFrame: 'java.lang.IllegalArgumentException: Wrong FS: file:/tmp/spark expected: hdfs://nameservice1'


I am new to Spark. I am trying to develop an application that saves JSON data to a Hive table using Spark 1.6. Here is my code:

val rdd = sc.parallelize(Seq(arr.toString)) // arr is the JSON array
val dataframe = hiveContext.read.json(rdd)
dataframe.registerTempTable("RiskRecon_tmp")
hiveContext.sql("DROP TABLE IF EXISTS RiskRecon_TOES")
hiveContext.sql("CREATE TABLE RiskRecon_TOES as select * from RiskRecon_tmp")

When I run it, I get the following error:

Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: file:/tmp/spark-2c2e53f5-6b5f-462a-afa2-53b8cf5e53f1/scratch_hive_2017-07-12_07-41-07_146_1120449530614050587-1, expected: hdfs://nameservice1
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:660)
at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:480)
at org.apache.hadoop.hive.ql.Context.getStagingDir(Context.java:229)
at org.apache.hadoop.hive.ql.Context.getExternalScratchDir(Context.java:359)
at org.apache.hadoop.hive.ql.Context.getExternalTmpPath(Context.java:437)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:132)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:127)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:276)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
at org.apache.spark.sql.hive.execution.CreateTableAsSelect.run(CreateTableAsSelect.scala:89)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
at test$.main(test.scala:25)
at test.main(test.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

The error is thrown at the CREATE TABLE statement.

What does this error mean? Am I doing this the right way, or is there a better way to save a DataFrame to a table? Also, if this code works, will the created table be an internal (managed) table? Ideally, I need an external table to store my data.

Any help would be appreciated. Thanks.

Best Answer

Assuming df holds the data of your JSON file as a DataFrame:

val df = sqlContext.read.json(rdd)

You can then load it into your Hive table with saveAsTable. Note that the Hive table you are loading into should already exist at the desired location, so you can create it as an EXTERNAL table beforehand if needed, and your Spark user must have write permission on the corresponding folder.
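For example, a minimal sketch of creating such an external table through the HiveContext (the database name, column names, and HDFS path below are hypothetical placeholders; match them to your DataFrame's actual schema and cluster layout):

```scala
// Sketch only: mydb, the columns, and the LOCATION path are assumptions,
// not values from the original question.
hiveContext.sql(
  """CREATE EXTERNAL TABLE IF NOT EXISTS mydb.risk_recon_toes (
    |  id STRING,
    |  score INT
    |)
    |STORED AS PARQUET
    |LOCATION 'hdfs://nameservice1/user/hive/warehouse/risk_recon_toes'
    |""".stripMargin)
```

Because the table is EXTERNAL, dropping it later removes only the metadata; the files under the LOCATION path are left in place.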

df.write.mode("append").saveAsTable("database.table_name")

Depending on your requirements, you can use the other available write modes, such as append, overwrite, etc.
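Putting the pieces together, a minimal sketch of the whole flow (assuming a running Spark 1.6 application where sc and hiveContext already exist, arr is the JSON array from the question, and database.table_name is a placeholder for your pre-created table):

```scala
// Sketch only: assumes sc (SparkContext), hiveContext (HiveContext), and
// arr (the JSON array) are already defined, as in the original question.
val rdd = sc.parallelize(Seq(arr.toString))
val df = hiveContext.read.json(rdd)

// "append" adds rows to the existing table; "overwrite" replaces its contents.
df.write.mode("append").saveAsTable("database.table_name")
```

Writing through saveAsTable avoids the CREATE TABLE ... AS SELECT path that triggered the Wrong FS scratch-directory error in the question.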

Regarding java - Error creating a Hive table from a DataFrame: 'java.lang.IllegalArgumentException: Wrong FS: file:/tmp/spark expected: hdfs://nameservice1', we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45058246/
