
amazon-web-services - Hive query throws exception - Error while compiling statement: FAILED: ArrayIndexOutOfBoundsException null

Reposted · Author: 可可西里 · Updated: 2023-11-01 16:42:29

I just upgraded the Hive version of hive-exec and hive-jdbc to 2.1.0.

But because of this, some queries that previously worked fine have started failing.

The exception -

Exception in thread "main" org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ArrayIndexOutOfBoundsException null
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:264)
at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:250)
at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:309)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:250)
at com.XXX.YYY.executors.HiveQueryExecutor.executeQueriesInternal(HiveQueryExecutor.java:234)
at com.XXX.YYY.executors.HiveQueryExecutor.executeQueriesMetricsEnabled(HiveQueryExecutor.java:184)
at com.XXX.YYY.executors.HiveQueryExecutor.main(HiveQueryExecutor.java:500)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ArrayIndexOutOfBoundsException null
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:387)
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:186)
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:269)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:460)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:447)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
at com.sun.proxy.$Proxy33.executeStatementAsync(Unknown Source)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:294)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:497)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: null

The query I am running -

INSERT OVERWRITE TABLE base_performance_order_20160916
SELECT
*
FROM
(
select
coalesce(traffic_feed.sku,commerce_feed.sku) AS sku,
concat(coalesce(traffic_feed.feed_date,commerce_feed.feed_date),' ','00:00:00') AS transaction_date,
commerce_feed.units AS gross_units,
commerce_feed.orders AS gross_orders,
commerce_feed.revenue AS gross_revenue,
NULL AS gross_cost,
NULL AS gross_subsidized_cost,
NULL AS gross_shipping_cost,
NULL AS gross_variable_cost,
NULL AS gross_shipping_charges,
traffic_feed.pageViews AS page_views,
traffic_feed.uniqueVisitors AS unique_visits,
0 AS channel_id,
concat(coalesce(traffic_feed.feed_date,commerce_feed.feed_date),' ','00:00:00') AS feed_date,
from_unixtime(unix_timestamp()) AS creation_date
from traffic_feed
full outer join commerce_feed on coalesce(traffic_feed.sku)=commerce_feed.sku AND coalesce(traffic_feed.feed_date)=commerce_feed.feed_date
) tb
WHERE sku is not NULL and transaction_date is not NULL and channel_id is not NULL and feed_date is not NULL and creation_date is not NULL

When I run this query without setting any Hive variables, it works fine.

But when I set the Hive configuration properties below -

"set hivevar:hive.mapjoin.smalltable.filesize=2000000000",
"set hivevar:mapreduce.map.speculative=false",
"set hivevar:mapreduce.output.fileoutputformat.compress=true",
"set hivevar:hive.exec.compress.output=true",
"set hivevar:mapreduce.task.timeout=6000000",
"set hivevar:hive.optimize.bucketmapjoin.sortedmerge=true",
"set hivevar:io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec",
"set hivevar:hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat",
"set hivevar:hive.auto.convert.sortmerge.join.noconditionaltask=true",
"set hivevar:FEED_DATE=20160916",
"set hivevar:hive.optimize.bucketmapjoin=true",
"set hivevar:hive.exec.compress.intermediate=true",
"set hivevar:hive.enforce.bucketmapjoin=true",
"set hivevar:mapred.output.compress=true",
"set hivevar:mapreduce.map.output.compress=true",
"set hivevar:hive.auto.convert.sortmerge.join=true",
"set hivevar:hive.auto.convert.join=false",
"set hivevar:mapreduce.reduce.speculative=false",
"set hivevar:PD_KEY=vijay-test-mail@XXXcommerce.pagerduty.com",
"set hivevar:mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec",
"set hive.mapjoin.smalltable.filesize=2000000000",
"set mapreduce.map.speculative=false",
"set mapreduce.output.fileoutputformat.compress=true",
"set hive.exec.compress.output=true",
"set mapreduce.task.timeout=6000000",
"set hive.optimize.bucketmapjoin.sortedmerge=true",
"set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec",
"set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat",
"set hive.auto.convert.sortmerge.join.noconditionaltask=true",
"set FEED_DATE=20160916",
"set hive.optimize.bucketmapjoin=true",
"set hive.exec.compress.intermediate=true",
"set hive.enforce.bucketmapjoin=true",
"set mapred.output.compress=true",
"set mapreduce.map.output.compress=true",
"set hive.auto.convert.sortmerge.join=true",
"set hive.auto.convert.join=false",
"set mapreduce.reduce.speculative=false",
"set PD_KEY=vijay-test-mail@XXXcommerce.pagerduty.com",
"set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec"

it starts failing with the exception above.

Question -

  1. Which of the Hive configuration properties I set is causing the problem (I only upgraded the versions of Hive and Hadoop)?

Best Answer

Try disabling the sort-merge join properties; this is a temporary workaround.

Because you have enabled the sort-merge join properties, Hive will by default treat io.sort.mb as 2047 MB, which can cause the ArrayIndexOutOfBoundsException. So whenever you enable the sort-merge join properties, it is advisable to also set io.sort.mb to an appropriate value based on the size of the datasets your query works with.
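A minimal sketch of the workaround described above, using the property names already present in the question's own settings list (the io.sort.mb value shown is only an illustrative starting point, not a verified fix):

```sql
-- Temporary workaround: disable the sort-merge join conversion
set hive.auto.convert.sortmerge.join=false;
set hive.auto.convert.sortmerge.join.noconditionaltask=false;
set hive.optimize.bucketmapjoin.sortedmerge=false;

-- Alternatively, keep sort-merge join enabled but size the sort buffer
-- explicitly instead of relying on the default (value is an assumption,
-- tune it to your dataset):
-- set io.sort.mb=512;
```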

To find out how much data your query needs, you can run EXPLAIN on it: the explain output shows how much data is considered in each subquery and stage.
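For example, prefixing the join from the question with EXPLAIN shows the plan and per-stage statistics without executing anything (table and column names taken from the query above):

```sql
-- Inspect the plan and estimated data volume per stage; no data is read
EXPLAIN
SELECT coalesce(t.sku, c.sku) AS sku
FROM traffic_feed t
FULL OUTER JOIN commerce_feed c
  ON t.sku = c.sku AND t.feed_date = c.feed_date;
```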

Hope this helps.

Regarding "amazon-web-services - Hive query throws exception - Error while compiling statement: FAILED: ArrayIndexOutOfBoundsException null", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/39533750/
