I have an RDD[String] which contains the following data:

data format : ('Movie Name','Actress Name')

('Night of the Demons (2009)  (uncredited)', '"Steff", Stefanie Oxmann Mcgaha')
('The Bad Lieutenant: Port of Call - New Orleans (2009)  (uncredited)', '"Steff", Stefanie Oxmann Mcgaha') 
('"Please Like Me" (2013) {All You Can Eat (#1.4)}', '$haniqua') 
('"Please Like Me" (2013) {French Toast (#1.2)}', '$haniqua') 
('"Please Like Me" (2013) {Horrible Sandwiches (#1.6)}', '$haniqua')

I want to convert this to an RDD[(String, String)], so that the first element within ' ' becomes the first String of each pair and the second element within ' ' becomes the second String.

I tried this:

val rdd1 = sc.textFile("/home/user1/Documents/TestingScala/actress")
val splitRdd = rdd1.map( line => line.split(",") )
splitRdd.foreach(println)

but it gives me what looks like an error:

[Ljava.lang.String;@7741fb9
[Ljava.lang.String;@225f63a5
[Ljava.lang.String;@63640bc4
[Ljava.lang.String;@1354c1de
  • That isn't an error message, that's the object-ids for a bunch of strings. Commented Oct 8, 2016 at 1:41
  • @Malvolio Can you please tell me how can I remove that error Commented Oct 8, 2016 at 1:50
  • Call toList on the result from split. Commented Oct 8, 2016 at 18:22

4 Answers

5

[Ljava.lang.String;@7741fb9 is not an error. This is what is printed when you try to print an array directly, because arrays use the default toString. The parts mean:

[ - a single-dimensional array

L - the array contains a class or interface

java.lang.String - the type of the objects in the array

@ - separates the type name from the hash code

7741fb9 - the hash code of the array object, in hexadecimal

To print the contents of a String array, you can try this code:

import scala.runtime.ScalaRunTime._
splitRdd.foreach(array => println(stringOf(array)))
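
As a possible alternative, here is a minimal sketch that uses mkString from the standard collections library instead of ScalaRunTime. The object name ArrayPrinting and the method name render are made up for this illustration, and splitRdd is assumed to be the RDD[Array[String]] from the question.

```scala
object ArrayPrinting {
  // Joins the elements of the array into one readable string,
  // e.g. Array("a", "b") becomes "[a, b]".
  def render(fields: Array[String]): String =
    fields.mkString("[", ", ", "]")
}

// Hypothetical usage with the RDD from the question:
// splitRdd.foreach(array => println(ArrayPrinting.render(array)))
```

Calling toString on the array itself would still give the [Ljava.lang.String;@... form, since mkString only changes how the elements are rendered.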

Source

0

It's not an error. We could also use flatMap() here to avoid the confusion:

val rdd1 = sc.textFile("/home/user1/Documents/TestingScala/actress")
rdd1.flatMap( line => line.split(",")).foreach(println)

Here, the function passed to map returns a single element (an array) per line, while the function passed to flatMap returns a collection of zero or more elements per line. The output of flatMap is then flattened into a single RDD of strings.
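
The difference can be sketched with plain Scala collections, which behave the same way as an RDD for this purpose. The object name MapVsFlatMap and the sample lines are made up for this illustration.

```scala
object MapVsFlatMap {
  // Two made-up input lines standing in for the contents of the file.
  val lines: Seq[String] = Seq("a,b", "c,d")

  // map produces exactly one output per input line: here, one array per line.
  val mapped: Seq[Array[String]] = lines.map(line => line.split(","))

  // flatMap concatenates the per-line results into one flat sequence.
  val flattened: Seq[String] = lines.flatMap(line => line.split(",").toSeq)
}
```

Note that after flatMap the fields of one line are no longer grouped together, so this prints readable strings but does not by itself produce (movie, actress) pairs.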

0

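Since this is a CSV-like file whose fields are enclosed in quotes and whose rows are enclosed in parentheses, you need to parse each line with a regular expression. A simple split(",") doesn't work, because some fields themselves contain commas.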
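
A minimal sketch of that idea, assuming every line has exactly the shape ('<movie>', '<actress>') shown in the question and that the sequence ', ' (quote, comma, space, quote) never occurs inside the movie name. The names ActressLineParser, LinePattern and parse are made up for this illustration.

```scala
object ActressLineParser {
  // Assumed line shape: ('<movie>', '<actress>'), optionally followed by whitespace.
  // The first group is lazy so that it stops at the first "', '" separator.
  private val LinePattern = """^\('(.*?)', '(.*)'\)\s*$""".r

  // Returns Some((movie, actress)) for a line of the assumed shape, None otherwise.
  def parse(line: String): Option[(String, String)] = line match {
    case LinePattern(movie, actress) => Some((movie, actress))
    case _                           => None
  }
}

// Hypothetical usage, assuming an existing SparkContext named sc as in the question:
// val pairs = sc.textFile("/home/user1/Documents/TestingScala/actress")
//   .flatMap(line => ActressLineParser.parse(line))
```

Returning an Option and using flatMap means lines that do not match the assumed shape are silently dropped. If malformed lines should be reported instead, the None case would need different handling.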

0

Try this to convert the RDD[String] to an RDD[(String, String)]:

val rdd1 = sc.textFile("/home/user1/Documents/TestingScala/actress")
// Split each line once and keep the first two fields as a pair
val splitRdd = rdd1.map { line =>
  val fields = line.split(",")
  (fields(0), fields(1))
}

The map above returns an RDD of key-value pairs (tuples), i.e. an RDD[(String, String)].
