
I am new to Spark and machine learning. I am working on a data set with the details below:

scala> val inp=sc.textFile("Telephone.txt")
inp: org.apache.spark.rdd.RDD[String] = Telephone.txt MapPartitionsRDD[1] at textFile at <console>:35

scala> inp.first()
res0: String = 2014-03-15:10:10:20,Sorrento,8cc3b47e-bd01-4482-b500-28f2342679af,33.6894754264,-117.543308253

scala> case class Telephone(dt:String,ct:String,s:String,lat:Double,lon:Double)

defined class Telephone

scala> val inp_split=inp.map(x=>x.split(","))
inp_split: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[2] at map at <console>:37

scala> val telrdd=inp_split.map(x=>Telephone(x(0),x(1),x(2),x(3).toDouble,x(4).toDouble))
telrdd: org.apache.spark.rdd.RDD[Telephone] = MapPartitionsRDD[3] at map at <console>:41

scala> val telDF=telrdd.toDF()
telDF: org.apache.spark.sql.DataFrame = [dt: string, ct: string, s: string, lat: double, lon: double]

But when I perform a count operation on telDF, I get the error below:

scala> telDF.count()
[Stage 31:=============================>                            (1 + 1) / 2]18/01/22 20:16:19 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 31.0 (TID 53, cloudera-slavenode2.cloudlab.com, executor 16): java.lang.ArrayIndexOutOfBoundsException: 1

Can someone please help me with this error?

  • How big is Telephone.txt? Could you post it? Commented Jan 23, 2018 at 8:42

1 Answer


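I think you should check your Telephone.txt. Most probably some line in it is malformed, for example an empty line or a line without commas. For such a line, x.split(",") returns an array with a single element, so x(1) in the following code throws java.lang.ArrayIndexOutOfBoundsException: 1. The error only shows up at count() because Spark transformations such as map are lazy: the parsing code does not run until an action like count() forces it.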

val inp_split=inp.map(x=>x.split(","))
val telrdd=inp_split.map(x=>Telephone(x(0),x(1),x(2),x(3).toDouble,x(4).toDouble))
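A minimal sketch of a defensive alternative, assuming every valid record has exactly five comma-separated fields as in the sample line from the question: parse each line into an Option and drop the lines that do not fit. The object TelephoneParsing, the helper parseLine and the constant ExpectedFields are hypothetical names introduced for illustration; the block redefines the Telephone case class so that it compiles on its own. The Spark calls at the bottom are shown as comments only and assume a spark-shell session with the question's inp RDD.

```scala
import scala.util.Try

// Same shape as the case class in the question; redefined so this block is self-contained.
case class Telephone(dt: String, ct: String, s: String, lat: Double, lon: Double)

// Hypothetical helper object, not part of Spark.
object TelephoneParsing {
  // Assumption: a valid record has exactly 5 comma-separated fields, as in the sample line.
  val ExpectedFields = 5

  // Returns Some(Telephone) for a well-formed line and None otherwise.
  def parseLine(line: String): Option[Telephone] = {
    // "".split(",") is Array(""), i.e. length 1, which is why x(1) failed in the question.
    val x = line.split(",")
    // Try also catches NumberFormatException from non-numeric lat/lon values.
    if (x.length != ExpectedFields) None
    else Try(Telephone(x(0), x(1), x(2), x(3).toDouble, x(4).toDouble)).toOption
  }
}

// Hypothetical usage in spark-shell, where inp is the RDD[String] from the question:
//   val telrdd = inp.flatMap(line => TelephoneParsing.parseLine(line).toList)
//   val telDF  = telrdd.toDF()
// To look at some of the offending lines first:
//   inp.filter(_.split(",").length != TelephoneParsing.ExpectedFields).take(10).foreach(println)
```

Silently dropping records can hide data-quality problems, so it may be worth comparing inp.count() with the count of the parsed RDD, or printing a few rejected lines, before relying on the result.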