I created a table in Spark using the commands below:
case class trip(trip_id: String, duration: String, start_date: String,
start_station: String, start_terminal: String, end_date: String,
end_station: String, end_terminal: String, bike: String,
subscriber_type: String, zipcode: String)
val trip_data = sc.textFile("/user/sankha087_gmail_com/trip_data.csv")
val tripDF = trip_data
  .map(x => x.split(","))
  .filter(x => x(1) != "Duration")
  .map(x => trip(x(0), x(1), x(2), x(3), x(4), x(5), x(6), x(7), x(8), x(9), x(10)))
  .toDF()
tripDF.registerTempTable("tripdatas")
sqlContext.sql("select * from tripdatas").show()
If I run the above query (i.e. select *), I get the desired result, but if I run the query below, I get the following exception:
sqlContext.sql("select count(1) from tripdatas").show()
18/03/07 17:59:55 ERROR scheduler.TaskSetManager: Task 1 in stage 2.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 2.0 failed 4 times, most recent failure: Lost task 1.3 in stage 2.0 (TID 6, datanode1-cloudera.mettl.com, executor 1): java.lang.ArrayIndexOutOfBoundsException: 10
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$3.apply(:31)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$3.apply(:31)
If the error appears with sqlContext.sql("select count(1) from tripdatas").show(), then it should appear with sqlContext.sql("select * from tripdatas").show() too.
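A likely cause (an assumption, since the CSV itself is not shown): some rows in trip_data.csv have empty trailing fields, e.g. a missing zipcode. Scala's String.split(",") (inherited from Java) drops trailing empty strings, so such a row yields fewer than 11 elements and x(10) throws ArrayIndexOutOfBoundsException: 10. This also explains why select * appears to work: show() only materializes the first 20 rows, which may all be well-formed, while count(1) scans the entire file and eventually hits a bad row. A minimal sketch of the split behavior:

```scala
object SplitDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical CSV row whose last field (zipcode) is empty:
    val row = "4576,63,8/29/2013,Station A,66,8/29/2013,Station A,66,520,Subscriber,"

    // Default split drops trailing empty strings: only 10 fields survive,
    // so indexing x(10) would throw ArrayIndexOutOfBoundsException: 10.
    println(row.split(",").length)      // 10

    // Passing limit = -1 keeps trailing empty fields: all 11 fields survive.
    println(row.split(",", -1).length)  // 11
  }
}
```

One way to guard the pipeline is to split with `x.split(",", -1)` (or to add a `.filter(x => x.length == 11)` before building `trip`), so that short or malformed rows no longer cause the out-of-bounds access.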