
I have created a table in Spark using the commands below:

case class trip(trip_id: String, duration: String, start_date: String, 
        start_station: String, start_terminal: String, end_date: String, 
        end_station: String, end_terminal: String, bike: String, 
        subscriber_type: String, zipcode: String)

    val trip_data = sc.textFile("/user/sankha087_gmail_com/trip_data.csv")

    val tripDF = trip_data
        .map(x=> x.split(","))
        .filter(x=> (x(1)!= "Duration"))
        .map(x=> trip(x(0),x(1),x(2),x(3),x(4),x(5),x(6),x(7),x(8),x(9),x(10)))
        .toDF() 

    tripDF.registerTempTable("tripdatas")

    sqlContext.sql("select * from tripdatas").show()

If I run the above query (i.e. select *), I get the desired result, but if I run the query below, I get the following exception:

sqlContext.sql("select count(1) from tripdatas").show() 

18/03/07 17:59:55 ERROR scheduler.TaskSetManager: Task 1 in stage 2.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 2.0 failed 4 times, most recent failure: Lost task 1.3 in stage 2.0 (TID 6, datanode1-cloudera.mettl.com, executor 1): java.lang.ArrayIndexOutOfBoundsException: 10
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$3.apply(:31)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$3.apply(:31)

  • If the error is thrown by sqlContext.sql("select count(1) from tripdatas").show(), then it should appear with sqlContext.sql("select * from tripdatas").show() too. Commented Mar 8, 2018 at 6:14

1 Answer


Check your data. If any of the lines in your data has fewer than 11 elements, you'll see that error.

You can try this to see how many columns Spark reads from the file:

val trip_data = spark.read.csv("/user/sankha087_gmail_com/trip_data.csv")
println(trip_data.columns.length)
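
As a complementary check, here is a minimal sketch using the same sc and file path as the question (the lines/badLines names are just illustrative): it counts and prints the lines that split into fewer than 11 fields, which are the ones that raise ArrayIndexOutOfBoundsException: 10. This would also explain why select * appears to work while count(1) fails: show() only materialises the first 20 rows, whereas count(1) has to scan every line.

    val lines = sc.textFile("/user/sankha087_gmail_com/trip_data.csv")

    // Rows that do not produce all 11 fields when split.
    // Note: split(",") drops trailing empty strings, so a row whose last
    // column is empty (e.g. a missing zipcode) also comes out short; use
    // split(",", -1) if trailing empty fields should be preserved.
    val badLines = lines
        .map(x => x.split(","))
        .filter(x => x.length < 11)

    println(badLines.count())                               // how many short rows
    badLines.take(5).map(_.mkString(",")).foreach(println)  // peek at a few of them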

2 Comments

I am trying your suggested command, but I am getting the exception below: `scala> val trip_data_sample = spark.read.csv("/user/sankha087_gmail_com/trip_data.csv")` `<console>:25: error: not found: value spark`. Please help.
'spark' is just the SparkSession that is automatically created when you use the Spark shell. If you are writing a Spark app, just substitute it with whatever SparkSession you're creating. That was just a handy way of looking at the number of columns in your data; you can always do a manual inspection if the file isn't too large.
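
For reference, a minimal sketch of the same column check in a standalone Spark 2.x application, where the SparkSession is created explicitly (the app name below is only an illustrative placeholder):

    import org.apache.spark.sql.SparkSession

    // Build (or reuse) a SparkSession; in the 2.x spark-shell this is the
    // pre-created `spark` value mentioned in the comment above.
    val spark = SparkSession.builder()
        .appName("TripDataColumnCheck")
        .getOrCreate()

    val trip_data = spark.read.csv("/user/sankha087_gmail_com/trip_data.csv")
    println(trip_data.columns.length)

If you are on Spark 1.x (which the use of sqlContext and registerTempTable in the question suggests), there is no spark value and the built-in reader has no csv method; CSV reading there typically goes through the external spark-csv package (read.format("com.databricks.spark.csv")) or the sc.textFile approach already used in the question.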
