
amazon-web-services - Hive query throws exception - Error while compiling statement: FAILED: ArrayIndexOutOfBoundsException null

Reposted · Author: 可可西里 · Updated: 2023-11-01 16:42:29

I just upgraded the Hive version of hive-exec and hive-jdbc to 2.1.0.

But because of this, some queries that previously worked fine have started failing.

The exception -

Exception in thread "main" org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ArrayIndexOutOfBoundsException null
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:264)
at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:250)
at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:309)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:250)
at com.XXX.YYY.executors.HiveQueryExecutor.executeQueriesInternal(HiveQueryExecutor.java:234)
at com.XXX.YYY.executors.HiveQueryExecutor.executeQueriesMetricsEnabled(HiveQueryExecutor.java:184)
at com.XXX.YYY.executors.HiveQueryExecutor.main(HiveQueryExecutor.java:500)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ArrayIndexOutOfBoundsException null
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:387)
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:186)
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:269)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:460)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:447)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
at com.sun.proxy.$Proxy33.executeStatementAsync(Unknown Source)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:294)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:497)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: null

The query I am running -

INSERT OVERWRITE TABLE base_performance_order_20160916
SELECT
*
FROM
(
select
coalesce(traffic_feed.sku,commerce_feed.sku) AS sku,
concat(coalesce(traffic_feed.feed_date,commerce_feed.feed_date),' ','00:00:00') AS transaction_date,
commerce_feed.units AS gross_units,
commerce_feed.orders AS gross_orders,
commerce_feed.revenue AS gross_revenue,
NULL AS gross_cost,
NULL AS gross_subsidized_cost,
NULL AS gross_shipping_cost,
NULL AS gross_variable_cost,
NULL AS gross_shipping_charges,
traffic_feed.pageViews AS page_views,
traffic_feed.uniqueVisitors AS unique_visits,
0 AS channel_id,
concat(coalesce(traffic_feed.feed_date,commerce_feed.feed_date),' ','00:00:00') AS feed_date,
from_unixtime(unix_timestamp()) AS creation_date
from traffic_feed
full outer join commerce_feed on coalesce(traffic_feed.sku)=commerce_feed.sku AND coalesce(traffic_feed.feed_date)=commerce_feed.feed_date
) tb
WHERE sku is not NULL and transaction_date is not NULL and channel_id is not NULL and feed_date is not NULL and creation_date is not NULL

When I run this query without setting any Hive variables, it works fine.

But when I set the Hive configuration properties below -

"set hivevar:hive.mapjoin.smalltable.filesize=2000000000",
"set hivevar:mapreduce.map.speculative=false",
"set hivevar:mapreduce.output.fileoutputformat.compress=true",
"set hivevar:hive.exec.compress.output=true",
"set hivevar:mapreduce.task.timeout=6000000",
"set hivevar:hive.optimize.bucketmapjoin.sortedmerge=true",
"set hivevar:io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec",
"set hivevar:hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat",
"set hivevar:hive.auto.convert.sortmerge.join.noconditionaltask=true",
"set hivevar:FEED_DATE=20160916",
"set hivevar:hive.optimize.bucketmapjoin=true",
"set hivevar:hive.exec.compress.intermediate=true",
"set hivevar:hive.enforce.bucketmapjoin=true",
"set hivevar:mapred.output.compress=true",
"set hivevar:mapreduce.map.output.compress=true",
"set hivevar:hive.auto.convert.sortmerge.join=true",
"set hivevar:hive.auto.convert.join=false",
"set hivevar:mapreduce.reduce.speculative=false",
"set hivevar:PD_KEY=vijay-test-mail@XXXcommerce.pagerduty.com",
"set hivevar:mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec",
"set hive.mapjoin.smalltable.filesize=2000000000",
"set mapreduce.map.speculative=false",
"set mapreduce.output.fileoutputformat.compress=true",
"set hive.exec.compress.output=true",
"set mapreduce.task.timeout=6000000",
"set hive.optimize.bucketmapjoin.sortedmerge=true",
"set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec",
"set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat",
"set hive.auto.convert.sortmerge.join.noconditionaltask=true",
"set FEED_DATE=20160916",
"set hive.optimize.bucketmapjoin=true",
"set hive.exec.compress.intermediate=true",
"set hive.enforce.bucketmapjoin=true",
"set mapred.output.compress=true",
"set mapreduce.map.output.compress=true",
"set hive.auto.convert.sortmerge.join=true",
"set hive.auto.convert.join=false",
"set mapreduce.reduce.speculative=false",
"set PD_KEY=vijay-test-mail@XXXcommerce.pagerduty.com",
"set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec"

it starts failing with the exception above.

Question -

  1. Which of the Hive configuration properties I set is causing the problem (I only upgraded the versions of Hive and Hadoop)?

Best Answer

Try disabling the sort-merge join properties; this is a temporary workaround.

Because you have enabled the sort-merge join properties, Hive will by default treat io.sort.mb as 2047 MB, which can cause the ArrayIndexOutOfBoundsException. So whenever you enable the sort-merge join properties, it is advisable to also set io.sort.mb to an appropriate value based on the size of the datasets your query works with.
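A minimal sketch of the workaround described above, using the property names already present in the question's own settings list (the io.sort.mb value shown is only an illustrative starting point, not a verified fix):

```sql
-- Temporary workaround: disable the sort-merge join conversion
set hive.auto.convert.sortmerge.join=false;
set hive.auto.convert.sortmerge.join.noconditionaltask=false;
set hive.optimize.bucketmapjoin.sortedmerge=false;

-- Alternatively, keep sort-merge join enabled but size the sort buffer
-- explicitly instead of relying on the default (value is an assumption,
-- tune it to your dataset):
-- set io.sort.mb=512;
```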

To find out how much data your query needs, you can run EXPLAIN on it: the explain output shows how much data is considered in each subquery and stage.
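For example, prefixing the join from the question with EXPLAIN shows the plan and per-stage statistics without executing anything (table and column names taken from the query above):

```sql
-- Inspect the plan and estimated data volume per stage; no data is read
EXPLAIN
SELECT coalesce(t.sku, c.sku) AS sku
FROM traffic_feed t
FULL OUTER JOIN commerce_feed c
  ON t.sku = c.sku AND t.feed_date = c.feed_date;
```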

Hope this helps.

Regarding "amazon-web-services - Hive query throws exception - Error while compiling statement: FAILED: ArrayIndexOutOfBoundsException null", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/39533750/
