
I am using the validator.nu HTML parser (nu.validator.htmlparser) to parse an HTML string:

import scala.io.Source
import nu.validator.htmlparser.{sax,common}
import sax.HtmlParser
import common.XmlViolationPolicy

val source = Source.fromString(response)
val html = new models.HTML5Parser
val htmlObject = html.loadXML(source)

How do I pull values for specific elements out of the resulting object? I can get a child and its label like this:

val child = htmlObject.child(1).label

But I don't know how to get the content of the child, or how to iterate through the child nodes.


1 Answer


It's unclear where your HTML5Parser class comes from, but I'm going to assume it's the one in this example (or something similar). In that case your htmlObject is just a scala.xml.Node. First, some setup:

val source = Source.fromString(
  "<html><head/><body><div class='main'><span>test</span></div></body></html>"
)

val html = new models.HTML5Parser
val htmlObject = html.loadXML(source)

Now you can do the following, for example:

scala> htmlObject.child(1).label
res0: String = body

scala> htmlObject.child(1).child(0).child(0).text
res1: String = test

scala> (htmlObject \\ "span").text
res2: String = test

scala> (htmlObject \ "body" \ "div" \ "span").text
res3: String = test

scala> (htmlObject \\ "div").head.attributes.asAttrMap
res4: Map[String,String] = Map(class -> main)

Etcetera.
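For the second part of the question (iterating through the children), here is a minimal sketch. It assumes the scala-xml module is on the classpath. So that it runs without the validator.nu parser, it builds the tree with scala.xml.XML.loadString from the same sample string instead of calling html.loadXML(source); the two should give the same element structure for this input, but that is an assumption. ChildIteration, elementChildren, labels and allLabels are hypothetical names used only for illustration.

```scala
import scala.xml.{Elem, Node, XML}

object ChildIteration {
  // Stand-in for html.loadXML(source): the sample string is already
  // well-formed XML, so scala-xml's own loader can parse it directly.
  val htmlObject: Node = XML.loadString(
    "<html><head/><body><div class='main'><span>test</span></div></body></html>"
  )

  // `child` returns every child node, including Text and whitespace nodes,
  // so keep only the elements.
  def elementChildren(n: Node): Seq[Elem] =
    n.child.collect { case e: Elem => e }

  // Labels of the direct element children of <html>.
  val labels: Seq[String] = elementChildren(htmlObject).map(_.label)

  // `\\ "_"` selects the node itself (if it is an element) and every
  // descendant element, in document order.
  val allLabels: Seq[String] = (htmlObject \\ "_").map(_.label)

  def main(args: Array[String]): Unit = {
    // `text` concatenates all text beneath a node, so it gives a child's content.
    for (c <- elementChildren(htmlObject))
      println(s"${c.label}: '${c.text}'")
    println(allLabels.mkString(", "))
  }
}
```

With this input, the loop prints the head element with empty text and the body element with the text test, and allLabels lists html, head, body, div, span. Filtering on Elem matters with real-world HTML, where whitespace between tags usually shows up as extra Text children.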
