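I just upgraded Hive and hive-jdbc to version 2.1.0. Hive queries now throw an exception - Error while compiling statement: FAILED: ArrayIndexOutOfBoundsException null

Since the upgrade, some queries that previously worked fine have started failing.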

Exception -

Exception in thread "main" org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ArrayIndexOutOfBoundsException null 
    at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:264) 
    at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:250) 
    at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:309) 
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:250) 
    at com.XXX.YYY.executors.HiveQueryExecutor.executeQueriesInternal(HiveQueryExecutor.java:234) 
    at com.XXX.YYY.executors.HiveQueryExecutor.executeQueriesMetricsEnabled(HiveQueryExecutor.java:184) 
    at com.XXX.YYY.executors.HiveQueryExecutor.main(HiveQueryExecutor.java:500) 
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ArrayIndexOutOfBoundsException null 
    at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:387) 
    at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:186) 
    at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:269) 
    at org.apache.hive.service.cli.operation.Operation.run(Operation.java:324) 
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:460) 
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:447) 
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:498) 
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) 
    at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) 
    at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:422) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) 
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) 
    at com.sun.proxy.$Proxy33.executeStatementAsync(Unknown Source) 
    at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:294) 
    at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:497) 
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437) 
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422) 
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) 
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
Caused by: java.lang.ArrayIndexOutOfBoundsException: null 

The query I ran -

INSERT OVERWRITE TABLE base_performance_order_20160916 
SELECT 
* 
FROM 
(
select 
coalesce(traffic_feed.sku,commerce_feed.sku) AS sku, 
concat(coalesce(traffic_feed.feed_date,commerce_feed.feed_date),' ','00:00:00') AS transaction_date, 
commerce_feed.units AS gross_units, 
commerce_feed.orders AS gross_orders, 
commerce_feed.revenue AS gross_revenue, 
NULL AS gross_cost, 
NULL AS gross_subsidized_cost, 
NULL AS gross_shipping_cost, 
NULL AS gross_variable_cost, 
NULL AS gross_shipping_charges, 
traffic_feed.pageViews AS page_views, 
traffic_feed.uniqueVisitors AS unique_visits, 
0 AS channel_id, 
concat(coalesce(traffic_feed.feed_date,commerce_feed.feed_date),' ','00:00:00') AS feed_date, 
from_unixtime(unix_timestamp()) AS creation_date 
from traffic_feed 
full outer join commerce_feed on coalesce(traffic_feed.sku)=commerce_feed.sku AND coalesce(traffic_feed.feed_date)=commerce_feed.feed_date 
) tb 
WHERE sku is not NULL and transaction_date is not NULL and channel_id is not NULL and feed_date is not NULL and creation_date is not NULL 

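It works fine when I run this query without setting any Hive variables.

But when I set the Hive configuration properties below -

it starts failing with the exception above.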

Question -

  1. Which of the Hive configuration properties I set is causing the problem? (I upgraded both the Hive and Hadoop versions.)

Can you try disabling the sort-merge join property? –

@KSNidhin I tried that too, and it works. – devsda

@KSNidhin Does disabling it have any consequences? What is this property used for? – devsda

Answers

Try disabling the sort-merge join property as a temporary workaround.

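Because you have set the sort-merge join property to true, io.sort.mb is taken to be 2047 MB by default, which can cause the ArrayIndexOutOfBoundsException. So when you enable the sort-merge join property, it is advisable to also set io.sort.mb to a suitable value based on the size of the datasets used in the query.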
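As a rough sketch of the two options above, the session settings might look like the following. The answer does not name the exact properties, so `hive.auto.convert.sortmerge.join` (Hive's switch for automatic sort-merge join conversion) and `mapreduce.task.io.sort.mb` (the Hadoop 2 name for `io.sort.mb`) are assumptions about what is meant, and the 256 MB value is purely illustrative, not a recommendation.

```sql
-- Option 1 (temporary workaround): turn off automatic sort-merge join conversion.
-- Assumption: this is the "sort-merge join property" the answer refers to.
SET hive.auto.convert.sortmerge.join=false;

-- Option 2: keep sort-merge join enabled, but set the sort buffer explicitly.
-- Assumption: 256 is a placeholder; choose a value that fits your data and task heap.
SET hive.auto.convert.sortmerge.join=true;
SET mapreduce.task.io.sort.mb=256;
```

These are session-level settings, so they would need to be issued on the same connection before the INSERT OVERWRITE statement runs.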

To find out how much data the query needs, you can run EXPLAIN on it; the output shows how much data is considered in each subquery and stage.
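For illustration, EXPLAIN can be prefixed to a trimmed-down version of the join from the question. This is only a sketch: the column list is shortened, and the single-argument coalesce calls in the join condition are replaced by the bare columns, which is equivalent.

```sql
-- Prints the query plan without executing the query.
EXPLAIN
SELECT
  coalesce(traffic_feed.sku, commerce_feed.sku) AS sku,
  commerce_feed.units AS gross_units,
  traffic_feed.pageViews AS page_views
FROM traffic_feed
FULL OUTER JOIN commerce_feed
  ON traffic_feed.sku = commerce_feed.sku
 AND traffic_feed.feed_date = commerce_feed.feed_date;
```

The plan lists the stages and the operators within each stage. When table statistics are available, the operators typically carry estimated row counts and data sizes, which can help when choosing a buffer size.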

Hope this helps.

I'm facing another problem. Can you help me? http://stackoverflow.com/questions/39547001/why-hive-staging-file-is-missing-in-aws-emr – devsda

If possible, can we chat? – devsda