
I have created a table in Spark using the commands below:

case class trip(trip_id: String, duration: String, start_date: String, 
        start_station: String, start_terminal: String, end_date: String, 
        end_station: String, end_terminal: String, bike: String, 
        subscriber_type: String, zipcode: String)

    val trip_data = sc.textFile("/user/sankha087_gmail_com/trip_data.csv")

    val tripDF = trip_data
        .map(x=> x.split(","))
        .filter(x=> (x(1)!= "Duration"))
        .map(x=> trip(x(0),x(1),x(2),x(3),x(4),x(5),x(6),x(7),x(8),x(9),x(10)))
        .toDF() 

    tripDF.registerTempTable("tripdatas")

    sqlContext.sql("select * from tripdatas").show()

If I run the above query (i.e. select *), I get the desired result, but if I run the query below, I get the following exception:

sqlContext.sql("select count(1) from tripdatas").show() 

18/03/07 17:59:55 ERROR scheduler.TaskSetManager: Task 1 in stage 2.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 2.0 failed 4 times, most recent failure: Lost task 1.3 in stage 2.0 (TID 6, datanode1-cloudera.mettl.com, executor 1): java.lang.ArrayIndexOutOfBoundsException: 10
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$3.apply(:31)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$3.apply(:31)

  • If the error is thrown by sqlContext.sql("select count(1) from tripdatas").show(), then it should appear with sqlContext.sql("select * from tripdatas").show() too. Commented Mar 8, 2018 at 6:14

1 Answer


Check your data. If any of the lines in your data has fewer than 11 elements, you'll see that error.

You can try this to see how many columns Spark reads from the file:

val trip_data = spark.read.csv("/user/sankha087_gmail_com/trip_data.csv")
println(trip_data.columns.length)
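
As a complementary check, here is a minimal sketch using the same sc and file path as the question (the lines/badLines names are just illustrative): it counts and prints the lines that split into fewer than 11 fields, which are the ones that raise ArrayIndexOutOfBoundsException: 10. This would also explain why select * appears to work while count(1) fails: show() only materialises the first 20 rows, whereas count(1) has to scan every line.

    val lines = sc.textFile("/user/sankha087_gmail_com/trip_data.csv")

    // Rows that do not produce all 11 fields when split.
    // Note: split(",") drops trailing empty strings, so a row whose last
    // column is empty (e.g. a missing zipcode) also comes out short; use
    // split(",", -1) if trailing empty fields should be preserved.
    val badLines = lines
        .map(x => x.split(","))
        .filter(x => x.length < 11)

    println(badLines.count())                               // how many short rows
    badLines.take(5).map(_.mkString(",")).foreach(println)  // peek at a few of them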

2 Comments

I am trying your suggested command, but I am getting the exception below: `scala> val trip_data_sample = spark.read.csv("/user/sankha087_gmail_com/trip_data.csv")` `<console>:25: error: not found: value spark`. Please help.
'spark' is just the SparkSession that is automatically created when you use the Spark shell. If you are writing a Spark app, just substitute it with whatever SparkSession you're creating. That was just a handy way of looking at the number of columns in your data; you can always do a manual inspection if the file isn't too large.
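
For reference, a minimal sketch of the same column check in a standalone Spark 2.x application, where the SparkSession is created explicitly (the app name below is only an illustrative placeholder):

    import org.apache.spark.sql.SparkSession

    // Build (or reuse) a SparkSession; in the 2.x spark-shell this is the
    // pre-created `spark` value mentioned in the comment above.
    val spark = SparkSession.builder()
        .appName("TripDataColumnCheck")
        .getOrCreate()

    val trip_data = spark.read.csv("/user/sankha087_gmail_com/trip_data.csv")
    println(trip_data.columns.length)

If you are on Spark 1.x (which the use of sqlContext and registerTempTable in the question suggests), there is no spark value and the built-in reader has no csv method; CSV reading there typically goes through the external spark-csv package (read.format("com.databricks.spark.csv")) or the sc.textFile approach already used in the question.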
