gpt4 book ai didi

amazon-s3 - 无法在 aws EMR 集群中使用 hive 创建外部表,其中位置指向某个 S3 位置

转载 作者:行者123 更新时间:2023-12-03 16:51:11 28 4
gpt4 key购买 nike

我正在尝试使用 AWS EMR 集群的 hive 服务创建一个外部表。在这里,此外部表指向某个 S3 位置。下面是我的创建表定义:

EXTERNAL TABLE if not exists Myschema.MyTable
(
columnA INT,
columnB INT,
columnC String,
)
partitioned BY ( columnD INT )
STORED AS PARQUET
LOCATION 's3://{bucket-locaiton}/{key-path}/';

以下是我得到的异常:
2019-04-11T14:44:59,449 INFO  [6a95bad7-18e7-49de-856d-43219b7c5069 main([])]: util.PlatformInfo (PlatformInfo.java:getJobFlowId(54)) - Unable to read clusterId from http://localhost:8321/configuration, trying extra instance data file: /var/lib/instance-controller/extraInstanceData.json
2019-04-11T14:44:59,450 INFO [6a95bad7-18e7-49de-856d-43219b7c5069 main([])]: util.PlatformInfo (PlatformInfo.java:getJobFlowId(61)) - Unable to read clusterId from /var/lib/instance-controller/extraInstanceData.json, trying EMR job-flow data file: /var/lib/info/job-flow.json
2019-04-11T14:44:59,450 INFO [6a95bad7-18e7-49de-856d-43219b7c5069 main([])]: util.PlatformInfo (PlatformInfo.java:getJobFlowId(69)) - Unable to read clusterId from /var/lib/info/job-flow.json, out of places to look
2019-04-11T14:45:01,073 INFO [6a95bad7-18e7-49de-856d-43219b7c5069 main([])]: conf.HiveConf (HiveConf.java:getLogIdVar(3956)) - Using the default value passed in for log id: 6a95bad7-18e7-49de-856d-43219b7c5069
2019-04-11T14:45:01,073 INFO [6a95bad7-18e7-49de-856d-43219b7c5069 main([])]: session.SessionState (SessionState.java:resetThreadName(432)) - Resetting thread name to main
2019-04-11T14:45:01,072 ERROR [6a95bad7-18e7-49de-856d-43219b7c5069 main([])]: ql.Driver (SessionState.java:printError(1126)) - FAILED: $ComputationException java.lang.ArrayIndexOutOfBoundsException: 16227
com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$ComputationException: java.lang.ArrayIndexOutOfBoundsException: 16227
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$MapMaker$StrategyImpl.compute(MapMaker.java:553)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$MapMaker$StrategyImpl.compute(MapMaker.java:419)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$CustomConcurrentHashMap$ComputingImpl.get(CustomConcurrentHashMap.java:2041)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$StackTraceElements.forMember(StackTraceElements.java:53)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.Errors.formatSource(Errors.java:690)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.Errors.format(Errors.java:555)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.ProvisionException.getMessage(ProvisionException.java:59)
at java.lang.Throwable.getLocalizedMessage(Throwable.java:391)
at java.lang.Throwable.toString(Throwable.java:480)
at java.lang.Throwable.<init>(Throwable.java:311)
at java.lang.Exception.<init>(Exception.java:102)
at org.apache.hadoop.hive.ql.metadata.HiveException.<init>(HiveException.java:41)
at org.apache.hadoop.hive.ql.parse.SemanticException.<init>(SemanticException.java:41)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.toReadEntity(BaseSemanticAnalyzer.java:1659)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.toReadEntity(BaseSemanticAnalyzer.java:1651)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.toReadEntity(BaseSemanticAnalyzer.java:1647)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:11968)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:11020)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11133)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 16227
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.asm.$ClassReader.readClass(Unknown Source)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.asm.$ClassReader.accept(Unknown Source)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.asm.$ClassReader.accept(Unknown Source)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$LineNumbers.<init>(LineNumbers.java:62)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$StackTraceElements$1.apply(StackTraceElements.java:36)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$StackTraceElements$1.apply(StackTraceElements.java:33)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$MapMaker$StrategyImpl.compute(MapMaker.java:549)
... 37 more

注意:当我使用 HDFS 位置创建时,同一张表。我能够成功地创建它。

最佳答案

调试进Hadoop和AWS的代码后,发现java.lang.ArrayIndexOutOfBoundsException与后面真正的错误无关。
事实上,EMR/Hadoop 已经产生了另一个错误(取决于你的情况),但是当它格式化这个错误信息时,它触发了另一个异常:java.lang.ArrayIndexOutOfBoundsException。有一个与此相关的问题:
https://github.com/google/guice/issues/757
为了找到背后的真正原因,你有一些选择:

  • 使用命令模拟您正在执行的操作并启用 Debug模式。例如,我在使用 EMRFS 从/向 S3 读取/写入数据时出错,因此我使用命令“hdfs dfs -ls s3://xxxxx/xxx”。在此命令之前,我使用变量启用了 Debug模式:export HADOOP_ROOT_LOGGER=DEBUG,console
    它可以显示一些有趣的错误
  • 如果第一个选项仍然没有显示任何内容,那么您可以像我一样做:
    2.1 导出HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005"
    2.2 启动命令“hdfs dfs -ls s3://xxxx/xxx”。它将等待远程客户端连接到 JVM 进行调试(我声明 suspend=y)
    2.3 使用IDE工具连接JVM。当然,在此之前,您需要将相关的 jar 文件导入或下载到您的 IDE 中。

  • 亚马逊确实需要通过升级版本来纠正 Google Guice 库错误。

    关于amazon-s3 - 无法在 aws EMR 集群中使用 hive 创建外部表,其中位置指向某个 S3 位置,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/55636244/

    28 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com