java - 从数据帧 'java.lang.IllegalArgumentException: Wrong FS: file:/tmp/spark expected: hdfs://nameservice1' 创建 Hive 表时出错-6ren

java - 从数据帧 'java.lang.IllegalArgumentException: Wrong FS: file:/tmp/spark expected: hdfs://nameservice1' 创建 Hive 表时出错

转载作者：可可西里更新时间：2023-11-01 16:39:13

24

4

我是 spark 的新手。我正在尝试开发一个使用 Spark 1.6 将 json 数据保存到 Hive 表的应用程序。这是我的代码:

 val rdd = sc.parallelize(Seq(arr.toString)) //arr is the Json array
 val dataframe = hiveContext.read.json(rdd)
 dataframe.registerTempTable("RiskRecon_tmp")
 hiveContext.sql("DROP TABLE IF EXISTS RiskRecon_TOES")
 hiveContext.sql("CREATE TABLE RiskRecon_TOES as select * from RiskRecon_tmp")

当我运行它时，出现以下错误:

Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: file:/tmp/spark-2c2e53f5-6b5f-462a-afa2-53b8cf5e53f1/scratch_hive_2017-07-12_07-41-07_146_1120449530614050587-1, expected: hdfs://nameservice1
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:660)
        at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:480)
        at org.apache.hadoop.hive.ql.Context.getStagingDir(Context.java:229)
        at org.apache.hadoop.hive.ql.Context.getExternalScratchDir(Context.java:359)
        at org.apache.hadoop.hive.ql.Context.getExternalTmpPath(Context.java:437)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:132)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:127)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:276)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
        at org.apache.spark.sql.hive.execution.CreateTableAsSelect.run(CreateTableAsSelect.scala:89)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
        at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
        at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
        at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
        at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
        at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
        at test$.main(test.scala:25)
        at test.main(test.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

它给了我 create table 语句的错误。

这个错误是什么意思？我这样做是正确的方式还是有更好的方法将数据框保存到表中？另外，如果这段代码有效，创建的表将是一个内部表？理想情况下，我需要一个外部表来存储我的数据。

如有任何帮助，我们将不胜感激。谢谢。

最佳答案

假设 df 包含存储为 dataframe 的 JSON 文件的数据:

val df = sqlContext.read.json(rdd)

然后您可以使用 saveAsTable 将其加载到您的配置单元表中。请注意，您要加载到的配置单元表应该已经存在于所需位置，因此您可以根据需要创建一个 EXTERNAL 表。并且您的 spark 用户有权将数据写入相应的文件夹。

df.write.mode("append").saveAsTable("database.table_name")

根据您的要求，您可以使用其他几种可用的写入模式，如 append、overwrite 等。

关于java - 从数据帧 'java.lang.IllegalArgumentException: Wrong FS: file:/tmp/spark expected: hdfs://nameservice1' 创建 Hive 表时出错，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/45058246/

24

4

0

文章推荐： c++ - 是否有其他类型的 void_t 的标准概括？

文章推荐： c++ - 为什么删除默认参数会破坏此 constexpr 计数器？

file - access to file to files tomcat的conf文件夹下的一个文件
我想知道是否可以访问放在 tomcat 的 conf 文件夹中的文件。通常我会在这个文件中放置多个 webapp 的配置，在 war 之外。我想使用类路径独立于文件系统。我过去使用过 lib 文件
PowerShell ForEach $file in $Files 中的每个 $file
我有一个 PowerShell 脚本，它获取文件列表并移动满足特定条件的文件。为什么即使对象为空，foreach 循环也会运行？我假设如果 $i 不存在，它就不会运行。但是如果 $filePath
java - File file = new File () 的路径错误
我已将 BasicAccountRule.drl 放置在我的 Web 应用程序中，位置为:C:/workspace/exim_design/src/main/resources/rules/drl/i
ruby - File.open ('file.txt' ) 与 File.open ('file.txt' ).readlines
我使用 File.open('file.txt').class 和 File.open('file.txt').readlines.class 以及前者进行了检查一个返回 File，后者返回 Arra
java - 即使 file.exists()、file.canRead()、file.canWrite()、file.canExecute() 都返回 true，file.delete() 也会返回 false
我正在尝试使用 FileOutputStream 删除文件，在其中写入内容后。这是我用来编写的代码: private void writeContent(File file, String fileC
python - FileNotFoundException :File file:/path/to/file/in. txt不存在或者运行Flink的用户没有足够的权限访问它
我正在尝试使用 flink 和 python 批处理 api 测试 Wordcount 经典示例。我的问题是，将数据源从 env.from_elements() 修改为 env.read_text()
c - 通过函数 : FILE* or FILE**? 的 FILE* 数组
我正在尝试制作一个可以同时处理多个不同文件的程序。我的想法是制作一个包含 20 个 FILE* 的数组，以便在我达到此限制时能够关闭其中一个并打开请求的新文件。为此，我想到了一个函数，它选择一个选项
linux - 狂欢 : Search Contents of File A in File B and Print lines of File A in File C
我有两个文件A和B文件A: 976464 792992 文件B TimeStamp,Record1,976464,8383,ABCD 我想搜索文件 A 和文件 B 中的每条记录并打印匹配的记录。打印的
java - 使用 Java 8 流将 Map 转换为 Map>
我有一些保存在 map 中的属性文件。示例: Map map = new HashMap<>(); map.put("1", "One"); map.put("2", "Two"); map.put(
file - Unix/庆典 : Reading A List of Files and Merge Them To A File
我正在尝试找出一个脚本文件，该文件接受一个包含文件列表的文件(每一行都是一个文件路径，即 path/to/file)并将它们合并到一个文件中。例如: list.text -- path/to/fil
c# - File.CreateText/File.AppendText 与 File.AppendAllText
为了使用 File.CreateText() 和 File.AppendText() 你必须: 通过调用这些方法之一打开流写消息关闭流处理流为了使用 File.AppendAllText()
Using rsync to rename files during copying with --files-from?(在复制过程中使用rsync重命名文件--files-from？)
使用rsync时，如何在使用--files-from参数复制时重命名文件？我有大约190，000个文件，在从源复制到目标时，每个文件都需要重命名。我计划将文件列表放在一个文本文件中传递给--files
java - "file:d:\\dir1\file.xml"和 "file:/d:\\dir1\file.xml"作为 FileSystemXmlApplicationContext 参数
我在非服务器应用程序中使用 Spring(只需从 Eclipse 中某个类的 main() 编译并运行它)。我的问题是作为 new FileSystemXmlApplicationContext 的
ksh - "test -a file"和 "test file -ef file"的区别
QNX (Neutrino 6.5.0) 使用 ksh 的开源实现作为其 shell 。许多提供的脚本，包括系统启动脚本，都使用诸如 if ! test /dev/slog -ef /dev/slog
PHP : Excel cannot open the file because the file format or file extension is not valid
当我尝试打开从我的应用程序下载的 xls 文件时，出现此错误: excel cannot open the file because the file format or file extension
c - "file pointer"、 "stream"、 "file descriptor"和... "file"之间的区别？
有一些相关的概念，即文件指针、流和文件描述符。我知道文件指针是指向数据类型 FILE 的指针(在例如 FILE.h 和 struct_FILE.h 中声明)。我知道文件描述符是 int ，例如成员
file - Groovy(文件IO): find all files and return all files - the Groovy way
好吧，这应该很容易... 我是groovy的新手，我希望实现以下逻辑: def testFiles = findAllTestFiles(); 到目前为止，我想出了下面的代码，该代码可以成功打印所有文
PowerShell:为什么 "Get-Content | Out-File -Append "会进入循环？
我理解为什么以下内容会截断文件的内容: Get-Content | Out-File 这是因为 Out-File 首先运行，它会在 Get-Content 有机会读取文件之前清空文件。但是当我尝
file - 类型错误 : invalid file: When trying to make a file name a variable
您好，我正在尝试将文件位置表示为变量，因为最终脚本将在另一台机器上运行。这是我尝试过的代码，然后是我得到的错误。在我看来，python 是如何添加“\”的，这就是导致问题的原因。如果是这种情况，我如何
bash - 一行文件的 "$(cat file)"、 "$(
我有一个只包含一行的输入文件: $ cat input foo bar 我想在我的脚本中使用这一行，据我所知有 3 种方法: line=$(cat input) line=$( input"...,

首页

博学

6Ren·AI

商城

java - 从数据帧 'java.lang.IllegalArgumentException: Wrong FS: file:/tmp/spark expected: hdfs://nameservice1' 创建 Hive 表时出错