I have a file like this:

1,<note><from>Messi</from><body>Don't forget me this weekend!</body></note>
2,<note><from>Ronaldo</from><body>Don't forget Laliga</body></note>
3,<note><from>Neymar</from><body>I am the best </body></note>
4,<note><from>Suarez</from><body>Don't forget me this weekend!</body></note>

The first field is the id and the second field is the XML data. I need to load this file into an RDD, parse the XML string, extract the fields, and create another RDD like this:

1,Messi,Don't forget me this weekend!
2,Ronaldo,Don't forget Laliga
3,Neymar,I am the best 
4,Suarez,Don't forget me this weekend!

Since the XML in my actual scenario is more complex, I would like to use an XML parser. How can I do this?

1 Answer

You can use Scala's own XML library. First, though, you need to parse the string into an Elem object:

import scala.xml._

val str = "<note><from>Messi</from><body>Don't forget me this weekend!</body></note>"

val xml = XML.loadString(str)
xml: scala.xml.Elem = <note><from>Messi</from><body>Don't forget me this weekend!</body></note>

To extract a single element, use:

xml \\ "note" \\ "from"
res19: scala.xml.NodeSeq = NodeSeq(<from>Messi</from>)

This returns a NodeSeq. To get its content as a String, call .text on it:

(xml \\ "note" \\ "from").text
res20: String = Messi
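
As a side note on the two selectors: `\` matches only direct children of the node it is applied to, while `\\` matches the node itself and all of its descendants. The following is a minimal sketch of the difference using the same sample string; the value names (`note`, `fromDirect`, `fromDeep`, `noteHasNoteChild`) are only illustrative.

```scala
import scala.xml.XML

val note = XML.loadString(
  "<note><from>Messi</from><body>Don't forget me this weekend!</body></note>")

// `\` looks only at the direct children of <note>.
val fromDirect = (note \ "from").text

// `\\` searches <note> itself and everything nested inside it.
val fromDeep = (note \\ "from").text

// The root element is <note>, so it has no *child* called "note".
// This is why the examples in this answer use `\\ "note"` rather than `\ "note"`.
val noteHasNoteChild = (note \ "note").nonEmpty
```

For deeply nested documents, `\` with an explicit path is the stricter choice, since `\\` would also pick up identically named elements at other depths.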

Applying this to your question:

val rdd = sc.parallelize(Array(
(1,"<note><from>Messi</from><body>Don't forget me this weekend!</body></note>"),
(2,"<note><from>Ronaldo</from><body>Don't forget Laliga</body></note>"),
(3,"<note><from>Neymar</from><body>I am the best </body></note>"),
(4,"<note><from>Suarez</from><body>Don't forget me this weekend!</body></note>")
)) 

rdd.map { case (id, xml) =>
    // Parse each record once and reuse the resulting Elem for both fields
    val note = XML.loadString(xml)
    (id,
    (note \\ "note" \\ "from").text,
    (note \\ "note" \\ "body").text)
}.collect.foreach(println)

(1,Messi,Don't forget me this weekend!)
(2,Ronaldo,Don't forget Laliga)
(3,Neymar,I am the best )
(4,Suarez,Don't forget me this weekend!)
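
The example above starts from (id, xml) tuples, whereas the file in the question holds raw lines of the form `id,<xml>`. Below is a minimal sketch of one way to bridge that gap. It assumes every line contains at least one comma and that the id is an integer. `parseLine` is a hypothetical helper name, not part of any library.

```scala
import scala.xml.XML

// Hypothetical helper: split "id,<xml>" at the FIRST comma only,
// because the XML payload may itself contain commas.
// Assumption: a line without a comma is treated as malformed (MatchError).
def parseLine(line: String): (Int, String, String) = {
  val Array(id, payload) = line.split(",", 2)
  val note = XML.loadString(payload) // parse once per record
  (id.trim.toInt, (note \ "from").text, (note \ "body").text)
}

// Plain Scala collection standing in for the lines of the file.
val lines = Seq(
  "1,<note><from>Messi</from><body>Don't forget me this weekend!</body></note>",
  "3,<note><from>Neymar</from><body>I am the best </body></note>"
)

val parsed = lines.map(parseLine)
```

With Spark, the same function could be passed to an RDD of lines, for example `sc.textFile(path).map(parseLine)`, where `path` is a placeholder for the location of your file. To get comma-separated strings like those in the question, a further `.map { case (id, from, body) => s"$id,$from,$body" }` should do.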