I have an XML file that I'm trying to process in spark-shell using Scala. I am stuck at the point where I need to read the elements of an Array[String] as XML:

scala> val fileRead = sc.textFile("source_file")
fileRead: org.apache.spark.rdd.RDD[String] = source_file MapPartitionsRDD[8] at textFile at <console>:21

scala> val strLines = fileRead.map(x => x.toString)
strLines: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[9] at map at <console>:23

scala> val fltrLines = strLines.filter(_.contains("<record column1="))
fltrLines: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[10] at filter at <console>:25

scala> fltrLines.take(5)
res1: Array[String] = Array("<record column1="1" column2="1" column3="5" column4="2010-11-02T18:59:01.140" />", "<record column2=....

I need to read this value of the Array[String]:

"<record column1="1" column2="1" column3="5" column4="2010-11-02T18:59:01.140" />"

as XML so that I can use Scala's Elem and NodeSeq classes to extract the data. So I want to do something like:

val xmlLines = fltrLines.....somehow get the value at the first index of the Array[String]

And then use xmlLines.attributes, etc.

1 Answer

You can do fltrLines.map { scala.xml.XML.loadString _ }, which should build an Elem out of each String. Check the docs, but note that the linked Scaladoc is an old one, from when the Scala standard library still contained XML support. Since Scala 2.11 it lives in a separate scala-xml jar, so if you are using a newer version, make sure that jar is on your classpath.
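
As a minimal sketch of the above, assuming the scala-xml module is on the classpath: the snippet parses one line shaped like the sample in the question and joins its attribute values with commas. The names `line`, `toCsv`, and `names` are hypothetical and only for illustration.

```scala
import scala.xml.{Elem, XML}

// A sample line in the same shape as the output shown in the question.
val line = """<record column1="1" column2="1" column3="5" column4="2010-11-02T18:59:01.140" />"""

// Parse the String into an Elem.
val elem: Elem = XML.loadString(line)

// `elem \ "@name"` selects an attribute as a NodeSeq; `.text` yields its value,
// or an empty string if the attribute is missing.
val column1: String = (elem \ "@column1").text

// Hypothetical helper: join the chosen attribute values with commas.
def toCsv(e: Elem, names: Seq[String]): String =
  names.map(n => (e \ s"@$n").text).mkString(",")

val names = Seq("column1", "column2", "column3", "column4")
val csv: String = toCsv(elem, names)
```

In spark-shell the same helper could presumably be applied per line with something like fltrLines.map(l => toCsv(XML.loadString(l), names)), which keeps only Strings in the resulting RDD. Note that XML.loadString throws on a line that is not well-formed XML, so lines that do not hold a complete element would need to be filtered out or handled first.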

2 Comments

Perfect, I ended up doing this: scala> val element = fltrLines.map { scala.xml.XML.loadString _ }. Now I am trying to concatenate the attributes with a comma separator: scala> val elementAttributes = element.map(_.attributes("column1")), then append a comma, then column2, and so on. I'm still sifting through the documentation you provided. Thank you!
I have accepted the answer, but apparently I don't have the 15 reputation needed to upvote it.