
I just upgraded Hive to version 2.1.0 for both hive-exec and hive-jdbc.

After the upgrade, some queries that were previously working fine started failing.

Exception:

Exception in thread "main" org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ArrayIndexOutOfBoundsException null
    at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:264)
    at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:250)
    at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:309)
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:250)
    at com.XXX.YYY.executors.HiveQueryExecutor.executeQueriesInternal(HiveQueryExecutor.java:234)
    at com.XXX.YYY.executors.HiveQueryExecutor.executeQueriesMetricsEnabled(HiveQueryExecutor.java:184)
    at com.XXX.YYY.executors.HiveQueryExecutor.main(HiveQueryExecutor.java:500)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ArrayIndexOutOfBoundsException null
    at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:387)
    at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:186)
    at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:269)
    at org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:460)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:447)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
    at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
    at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
    at com.sun.proxy.$Proxy33.executeStatementAsync(Unknown Source)
    at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:294)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:497)
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: null

The query I ran:

INSERT OVERWRITE TABLE base_performance_order_20160916
SELECT 
*
 FROM 
(
select
coalesce(traffic_feed.sku,commerce_feed.sku) AS sku,
concat(coalesce(traffic_feed.feed_date,commerce_feed.feed_date),' ','00:00:00') AS transaction_date,
commerce_feed.units AS gross_units,
commerce_feed.orders AS gross_orders,
commerce_feed.revenue AS gross_revenue,
NULL AS gross_cost,
NULL AS gross_subsidized_cost,
NULL AS gross_shipping_cost,
NULL AS gross_variable_cost,
NULL AS gross_shipping_charges,
traffic_feed.pageViews AS page_views,
traffic_feed.uniqueVisitors AS unique_visits,
0 AS channel_id,
concat(coalesce(traffic_feed.feed_date,commerce_feed.feed_date),' ','00:00:00') AS feed_date,
from_unixtime(unix_timestamp()) AS creation_date
from traffic_feed
full outer join commerce_feed on coalesce(traffic_feed.sku)=commerce_feed.sku AND coalesce(traffic_feed.feed_date)=commerce_feed.feed_date
) tb
WHERE sku is not NULL and transaction_date is not NULL and channel_id is not NULL and feed_date is not NULL and creation_date is not NULL

The query works fine when I run it without setting any Hive variables.

But when I set the Hive configuration properties below:

"set hivevar:hive.mapjoin.smalltable.filesize=2000000000",
                "set hivevar:mapreduce.map.speculative=false",
                "set hivevar:mapreduce.output.fileoutputformat.compress=true",
                "set hivevar:hive.exec.compress.output=true",
                "set hivevar:mapreduce.task.timeout=6000000",
                "set hivevar:hive.optimize.bucketmapjoin.sortedmerge=true",
                "set hivevar:io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec",
                "set hivevar:hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat",
                "set hivevar:hive.auto.convert.sortmerge.join.noconditionaltask=true",
                "set hivevar:FEED_DATE=20160916",
                "set hivevar:hive.optimize.bucketmapjoin=true",
                "set hivevar:hive.exec.compress.intermediate=true",
                "set hivevar:hive.enforce.bucketmapjoin=true",
                "set hivevar:mapred.output.compress=true",
                "set hivevar:mapreduce.map.output.compress=true",
                "set hivevar:hive.auto.convert.sortmerge.join=true",
                "set hivevar:hive.auto.convert.join=false",
                "set hivevar:mapreduce.reduce.speculative=false",
                "set hivevar:[email protected]",
                "set hivevar:mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec",
                "set hive.mapjoin.smalltable.filesize=2000000000",
                "set mapreduce.map.speculative=false",
                "set mapreduce.output.fileoutputformat.compress=true",
                "set hive.exec.compress.output=true",
                "set mapreduce.task.timeout=6000000",
                "set hive.optimize.bucketmapjoin.sortedmerge=true",
                "set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec",
                "set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat",
                "set hive.auto.convert.sortmerge.join.noconditionaltask=true",
                "set FEED_DATE=20160916",
                "set hive.optimize.bucketmapjoin=true",
                "set hive.exec.compress.intermediate=true",
                "set hive.enforce.bucketmapjoin=true",
                "set mapred.output.compress=true",
                "set mapreduce.map.output.compress=true",
                "set hive.auto.convert.sortmerge.join=true",
                "set hive.auto.convert.join=false",
                "set mapreduce.reduce.speculative=false",
                "set [email protected]",
                "set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec"
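As an aside, `set hivevar:name=value` only defines a substitution variable (referenced in queries as `${hivevar:name}`); it does not change a configuration property. Only the plain `set property=value` lines actually alter the session configuration. A minimal illustration, using the table name from the question:

```sql
-- Defines a substitution variable; by itself it has no effect on Hive's behavior
set hivevar:FEED_DATE=20160916;

-- The variable is expanded wherever it is referenced in a query
SELECT * FROM base_performance_order_${hivevar:FEED_DATE} LIMIT 10;

-- A plain set changes an actual session configuration property
set hive.auto.convert.sortmerge.join=true;
```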

it starts failing with the exception shown above.

Questions:

  1. Which of the Hive configuration properties I set is causing the problem? (I only upgraded the Hive and Hadoop versions.)
  • Can you try disabling the sort merge join property? Commented Sep 16, 2016 at 18:21
  • @KSNidhin This is what I also tried, and it worked. Commented Sep 17, 2016 at 10:04
  • @KSNidhin Are there any consequences of this? What is this property used for? Commented Sep 17, 2016 at 10:05
  • @KSNidhin Also, add this as an answer and I will accept it. Commented Sep 17, 2016 at 11:21

1 Answer

Try disabling the sort merge join property as an interim solution.

Since you have set the sort merge join property to true, Hive will by default consider io.sort.mb to be 2047 MB, and this might lead to the ArrayIndexOutOfBoundsException. So when you enable the sort merge join property, it is advisable to also set io.sort.mb to an optimal value based on the size of the dataset used in the query.
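Concretely, the interim fix amounts to something like the following session settings (the value for io.sort.mb is illustrative and should be tuned to your data):

```sql
-- Interim fix: turn off sort-merge join conversion for this session
set hive.auto.convert.sortmerge.join=false;
set hive.auto.convert.sortmerge.join.noconditionaltask=false;

-- Alternatively, keep sort-merge join enabled but size the sort buffer explicitly
set hive.auto.convert.sortmerge.join=true;
set io.sort.mb=512;
```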

To see how much data the query processes, you can run EXPLAIN on it, which shows the data volume considered in each sub-query and stage.
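For example, prefixing the failing statement with EXPLAIN (or EXPLAIN EXTENDED for more detail) prints the plan without executing it:

```sql
EXPLAIN EXTENDED
INSERT OVERWRITE TABLE base_performance_order_20160916
SELECT ...;   -- the rest of the statement unchanged
```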

Hope this helps.


2 Comments

I faced another issue. Can you please help me with that? stackoverflow.com/questions/39547001/…
If possible, can we chat?
