
I just upgraded Hive to version 2.1.0 for both hive-exec and hive-jdbc.

After the upgrade, some queries that were previously working fine started failing.

Exception:

Exception in thread "main" org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ArrayIndexOutOfBoundsException null
    at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:264)
    at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:250)
    at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:309)
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:250)
    at com.XXX.YYY.executors.HiveQueryExecutor.executeQueriesInternal(HiveQueryExecutor.java:234)
    at com.XXX.YYY.executors.HiveQueryExecutor.executeQueriesMetricsEnabled(HiveQueryExecutor.java:184)
    at com.XXX.YYY.executors.HiveQueryExecutor.main(HiveQueryExecutor.java:500)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ArrayIndexOutOfBoundsException null
    at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:387)
    at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:186)
    at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:269)
    at org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:460)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:447)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
    at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
    at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
    at com.sun.proxy.$Proxy33.executeStatementAsync(Unknown Source)
    at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:294)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:497)
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: null

The query I ran:

INSERT OVERWRITE TABLE base_performance_order_20160916
SELECT 
*
 FROM 
(
select
coalesce(traffic_feed.sku,commerce_feed.sku) AS sku,
concat(coalesce(traffic_feed.feed_date,commerce_feed.feed_date),' ','00:00:00') AS transaction_date,
commerce_feed.units AS gross_units,
commerce_feed.orders AS gross_orders,
commerce_feed.revenue AS gross_revenue,
NULL AS gross_cost,
NULL AS gross_subsidized_cost,
NULL AS gross_shipping_cost,
NULL AS gross_variable_cost,
NULL AS gross_shipping_charges,
traffic_feed.pageViews AS page_views,
traffic_feed.uniqueVisitors AS unique_visits,
0 AS channel_id,
concat(coalesce(traffic_feed.feed_date,commerce_feed.feed_date),' ','00:00:00') AS feed_date,
from_unixtime(unix_timestamp()) AS creation_date
from traffic_feed
full outer join commerce_feed on coalesce(traffic_feed.sku)=commerce_feed.sku AND coalesce(traffic_feed.feed_date)=commerce_feed.feed_date
) tb
WHERE sku is not NULL and transaction_date is not NULL and channel_id is not NULL and feed_date is not NULL and creation_date is not NULL

The query works fine when I run it without setting any Hive variables.

But when I set the Hive configuration properties below:

"set hivevar:hive.mapjoin.smalltable.filesize=2000000000",
                "set hivevar:mapreduce.map.speculative=false",
                "set hivevar:mapreduce.output.fileoutputformat.compress=true",
                "set hivevar:hive.exec.compress.output=true",
                "set hivevar:mapreduce.task.timeout=6000000",
                "set hivevar:hive.optimize.bucketmapjoin.sortedmerge=true",
                "set hivevar:io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec",
                "set hivevar:hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat",
                "set hivevar:hive.auto.convert.sortmerge.join.noconditionaltask=true",
                "set hivevar:FEED_DATE=20160916",
                "set hivevar:hive.optimize.bucketmapjoin=true",
                "set hivevar:hive.exec.compress.intermediate=true",
                "set hivevar:hive.enforce.bucketmapjoin=true",
                "set hivevar:mapred.output.compress=true",
                "set hivevar:mapreduce.map.output.compress=true",
                "set hivevar:hive.auto.convert.sortmerge.join=true",
                "set hivevar:hive.auto.convert.join=false",
                "set hivevar:mapreduce.reduce.speculative=false",
                "set hivevar:[email protected]",
                "set hivevar:mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec",
                "set hive.mapjoin.smalltable.filesize=2000000000",
                "set mapreduce.map.speculative=false",
                "set mapreduce.output.fileoutputformat.compress=true",
                "set hive.exec.compress.output=true",
                "set mapreduce.task.timeout=6000000",
                "set hive.optimize.bucketmapjoin.sortedmerge=true",
                "set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec",
                "set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat",
                "set hive.auto.convert.sortmerge.join.noconditionaltask=true",
                "set FEED_DATE=20160916",
                "set hive.optimize.bucketmapjoin=true",
                "set hive.exec.compress.intermediate=true",
                "set hive.enforce.bucketmapjoin=true",
                "set mapred.output.compress=true",
                "set mapreduce.map.output.compress=true",
                "set hive.auto.convert.sortmerge.join=true",
                "set hive.auto.convert.join=false",
                "set mapreduce.reduce.speculative=false",
                "set [email protected]",
                "set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec"
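As an aside, `set hivevar:name=value` only defines a substitution variable (referenced in queries as `${hivevar:name}`); it does not change a configuration property. Only the plain `set property=value` lines actually alter the session configuration. A minimal illustration, using the table name from the question:

```sql
-- Defines a substitution variable; by itself it has no effect on Hive's behavior
set hivevar:FEED_DATE=20160916;

-- The variable is expanded wherever it is referenced in a query
SELECT * FROM base_performance_order_${hivevar:FEED_DATE} LIMIT 10;

-- A plain set changes an actual session configuration property
set hive.auto.convert.sortmerge.join=true;
```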

it starts failing with the exception shown above.

Questions:

  1. Which of the Hive configuration properties I set is causing the problem? (I only upgraded the Hive and Hadoop versions.)
  • Can you try disabling the sort merge join property? Commented Sep 16, 2016 at 18:21
  • @KSNidhin This is what I also tried, and it worked. Commented Sep 17, 2016 at 10:04
  • @KSNidhin Are there any consequences of this? What is this property used for? Commented Sep 17, 2016 at 10:05
  • @KSNidhin Also, add this as an answer and I will accept it. Commented Sep 17, 2016 at 11:21

1 Answer

Try disabling the sort merge join property as an interim solution.

Since you have set the sort merge join property to true, Hive will by default consider io.sort.mb to be 2047 MB, and this might lead to the ArrayIndexOutOfBoundsException. So when you enable the sort merge join property, it is advisable to also set io.sort.mb to an optimal value based on the size of the dataset used in the query.
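Concretely, the interim fix amounts to something like the following session settings (the value for io.sort.mb is illustrative and should be tuned to your data):

```sql
-- Interim fix: turn off sort-merge join conversion for this session
set hive.auto.convert.sortmerge.join=false;
set hive.auto.convert.sortmerge.join.noconditionaltask=false;

-- Alternatively, keep sort-merge join enabled but size the sort buffer explicitly
set hive.auto.convert.sortmerge.join=true;
set io.sort.mb=512;
```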

To see how much data the query processes, you can run EXPLAIN on it, which shows the data volume considered in each sub-query and stage.
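For example, prefixing the failing statement with EXPLAIN (or EXPLAIN EXTENDED for more detail) prints the plan without executing it:

```sql
EXPLAIN EXTENDED
INSERT OVERWRITE TABLE base_performance_order_20160916
SELECT ...;   -- the rest of the statement unchanged
```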

Hope this helps.


2 Comments

I faced another issue. Can you please help me with that? stackoverflow.com/questions/39547001/…
If possible, can we chat?
