I have an RDD called name.
scala> name
res6: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[24] at map at <console>:37
I can inspect it using name.foreach(println):
name5000005125651330
name5000005125651331
name5000005125651332
name5000005125651333
I wish to create a new RDD that strips the leading "name" prefix from each record and returns the remaining digits as a Long.
Desired outcome:
5000005125651330
5000005125651331
5000005125651332
5000005125651333
I have tried the following:
val name_clean = name.filter(_ != "name")
However, this returns the records unchanged:
name5000005125651330
name5000005125651331
name5000005125651332
name5000005125651333
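name.map(_.drop(4).toLong) should do it. Note that this drops the first four characters unconditionally; it doesn't check that they are actually "name". Your filter attempt changes nothing because filter only keeps or discards whole records, and no record is equal to the string "name".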
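If some records might lack the prefix or contain non-numeric text, a more defensive variant could look like the sketch below. It is only an illustration: parseId is a hypothetical helper name, and sample is a local Seq standing in for the RDD so the snippet can be pasted into the Scala or Spark shell as-is.

```scala
import scala.util.Try

// Hypothetical helper (not part of Spark): strip the prefix only when it is
// actually present, then try to parse the remainder as a Long.
// Records without the prefix, or with a non-numeric remainder, yield None.
def parseId(record: String, prefix: String = "name"): Option[Long] =
  if (record.startsWith(prefix)) Try(record.stripPrefix(prefix).toLong).toOption
  else None

// Local stand-in for the RDD contents; "bad-record" is an invented example
// of a malformed line.
val sample = Seq("name5000005125651330", "name5000005125651331", "bad-record")

// flatMap discards the None results and keeps only the parsed values.
val cleaned = sample.flatMap(r => parseId(r))
```

On the RDD itself the equivalent call would be name.flatMap(r => parseId(r)), which gives an RDD[Long] and silently skips malformed records instead of failing the job with a NumberFormatException.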