
Remark: please consider XPath syntax off-limits here, thank you.

I have an XML node (HTML, actually), and I would like to get one of its attributes.

In C# (HTMLAgilityPack) I can get an attribute object by name. For example, given an "a" node, I can ask for its "href" attribute.

In Scala there is an "attribute" method on xml.Node, but it returns a sequence of... nodes. An attribute is a node? How is it possible to have several attributes with the same name? I am completely puzzled.

Moreover, there is an xml.Attribute class, but I don't see it used in xml.Node.
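For reference, here is where xml.Attribute does show up. This is a minimal sketch that assumes the scala-xml module is on the classpath. `Node.attributes` returns a `MetaData` chain whose links (apart from the terminating `Null`) are `Attribute` objects, so an HtmlAgilityPack-style lookup by name can be written over that chain. The helper `attributeByName` is hypothetical and not part of scala.xml.

```scala
import scala.xml._

object AttributeLookup {
  // Hypothetical helper: find the Attribute object (not just its value) by key.
  // Node.attributes is a MetaData linked list; every link except Null is an Attribute.
  def attributeByName(n: Node, name: String): Option[Attribute] =
    n.attributes.iterator.collectFirst { case a: Attribute if a.key == name => a }

  val link: Elem = <a href="http://example.com/" title="Example">link</a>

  val href: Option[Attribute] = attributeByName(link, "href")

  // Attribute.value is still a Seq[Node]; join the nodes' text to get a String.
  val hrefText: Option[String] = href.map(_.value.map(_.text).mkString)
}
```

Even when you hold the `Attribute` object, its `value` is a `Seq[Node]`, so the "sequence of nodes" question does not go away.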

I have the PiS (Programming in Scala) book, but its XML chapter is very shallow.

The question

How should I understand asking for an attribute and getting a collection of nodes?

In other words: what is the sense of returning an option of a collection of nodes instead of returning an attribute?

  • option -- if there is no attribute, the collection could simply be empty; the Option doubles the semantics
  • collection -- this implies that multiple attributes are possible, so I am curious in what scenario I would get a collection of size > 1
  • node -- an attribute is a pretty simple entity; why such overkill, suggesting that an attribute can have a tree structure?
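A minimal sketch of what the collection and the Option each stand for, assuming the scala-xml module is on the classpath. The `Seq[Node]` is the value of one single attribute, which scala.xml models as text possibly mixed with entity references. The `Option` is `None` only when the attribute is absent. The attribute and element are built by hand with `UnprefixedAttribute` and `Elem` so that the `EntityRef` node is explicit. The object and value names are illustrative.

```scala
import scala.xml._

object AttributeValueNodes {
  // One attribute whose value is three nodes: text, an entity reference, text.
  // "nbsp" is not one of XML's predefined entities, so it is kept as an EntityRef node.
  val title = new UnprefixedAttribute(
    "title", Seq(Text("Tom"), EntityRef("nbsp"), Text("Jones")), Null)

  // Roughly <a title="Tom&nbsp;Jones"/>, with no children
  val elem: Elem = Elem(null, "a", title, TopScope, minimizeEmpty = true)

  // Some(three nodes): a collection of size 3 for a single attribute
  val present: Option[Seq[Node]] = elem.attribute("title")

  // None: the attribute is absent, which is distinct from an empty value
  val absent: Option[Seq[Node]] = elem.attribute("href")
}
```

So a collection of size > 1 does not mean several attributes with the same name. It means one attribute whose value spans several nodes.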

2 Answers


You just want to get the value of an attribute, yes? In that case it's pretty easy:

scala> val x = <foo this="xx" that="yy" />
x: scala.xml.Elem = <foo this="xx" that="yy"></foo>

scala> x.attribute("this")
res0: Option[Seq[scala.xml.Node]] = Some(xx)

scala> x.attribute("this").get.toString
res1: String = xx
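One caveat: `.get` on the Option throws `NoSuchElementException` when the attribute is missing. Here is a small sketch, assuming scala-xml, that keeps the Option and joins the value's nodes into a String. The helper name `attrText` is only for illustration.

```scala
import scala.xml._

object AttrText {
  // Keep None for a missing attribute; otherwise join the value's nodes into one String.
  def attrText(n: Node, name: String): Option[String] =
    n.attribute(name).map(_.map(_.text).mkString)

  val x: Elem = <foo this="xx" that="yy" />

  val thisValue: Option[String] = attrText(x, "this")  // Some("xx")
  val missing: Option[String] = attrText(x, "other")   // None
}
```

Add `.getOrElse("")` at the call site if a default string is more convenient than an Option.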

I know that you said that you explicitly aren't interested in XPath syntax, but in this instance it really is rather neater:

scala> x \ "@this"
res2: scala.xml.NodeSeq = xx
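The `\ "@name"` form also shows where several results can come from. With `\\`, the query runs over the element and all its descendants, so it can match the same attribute name on many elements. On a missing attribute, `\` returns an empty NodeSeq rather than None. This is a short sketch assuming scala-xml, and the `page` element is made up for the example.

```scala
import scala.xml._

object AttrQueries {
  val page: Elem = <ul><li id="a">one</li><li id="b">two</li><li>three</li></ul>

  // <ul> itself has no id: an empty NodeSeq, whose text is ""
  val ulId: String = (page \ "@id").text

  // \\ searches the element and all its descendants: one result per matching <li>
  val allIds: List[String] = (page \\ "@id").map(_.text).toList
}
```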

Having said all of this, you should be aware that there are many problems with attribute handling in Scala's built-in XML handling. See, for example, this, this and this.


3 Comments

Thank you, but I am not asking a how-to-do-it question; I am asking how to understand it. For me it is like looking at int + int => String. See update.
Ah - sorry. So the first thing to realise is that the design of the built-in XML processing is bad. It's difficult to understand and difficult to use. Don't be surprised if things don't make sense - it's not that you're misunderstanding, it really is a bad design. Read the links in my answer for examples of how it's badly designed. But to try to be a bit more helpful - the library chooses to represent everything as a Node, even attributes, even though an attribute can't have children.
The links you provided are superb (meanwhile, thanks to them, I checked AntiXML; it is bad as well: there is no descendants method on node). It is good to know I didn't go crazy ;-D and that it is actually this particular library. Thank you again.

I realise that Paul's follow-up answer pretty much covers your question, but I'd just like to add a few more points:

  1. I personally don't like the design of Scala XML, to the extent that I wrote an alternative library, Scales Xml, but I wouldn't call it badly designed. Elements of its design are apparently good enough to form the basis of Anti-Xml's approach (elements owning their children, a concept of grouping nodes, etc.), but there are many quirks, attributes and text as containers being a large one.
  2. I've only recently committed the descendant axis to Scales. Its greedy nature works differently from descendant-or-self: as per the spec, //para[1] does not mean the same as the location path /descendant::para[1].
  3. I'm not sure you can attribute bad design to Anti-Xml for that absence either; it's a young project (just over seven months old?) and they may simply not have gotten round to adding descendant yet.
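The spec's distinction can be reproduced with Scala XML's own `\\`, which behaves like descendant-or-self. This is a rough sketch, not Scales' API, and it assumes scala-xml plus a made-up document. `//para[1]` picks the first `para` child of every parent, while `/descendant::para[1]` picks only the first `para` in document order.

```scala
import scala.xml._

object ParaAxes {
  val doc: Elem = <doc><sec><para>a</para><para>b</para></sec><sec><para>c</para></sec></doc>

  // //para[1]: for every element (descendant-or-self), its first <para> child
  val firstParaOfEachParent: List[String] =
    (doc \\ "_").toList.flatMap(n => (n \ "para").headOption).map(_.text)

  // /descendant::para[1]: the first <para> anywhere, in document order
  val firstParaOverall: Option[String] =
    (doc \\ "para").headOption.map(_.text)
}
```

Here the first query yields "a" and "c" (one per `sec`), while the second yields only "a".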

The direct answer to the attribute question in Scales is:

val pre = Namespace("uri:test").prefixed("pre")

val elem = Elem("fred"l, emptyAttributes + 
        ("attr", "value") +
        Attribute(pre("attr"), "value"))

println("attributes are a map " + elem.attributes("attr"))

println("attributes are a set " + (
  elem.attributes + ("attr", "new value")))

val xpath = top(elem) \@ pre("attr")

xpath foreach{ap => println(ap.name)}

giving

[info] attributes are a map Some(Attribute({}attr,value))
[info] attributes are a set ListSet(Attribute({}attr,new value), Attribute({uri:test}attr,value))
[info] {uri:test}attr

The XPath syntax must return a collection, as any number of paths could reach a matching attribute. Element attributes themselves are matched by QName: "attr" means no namespace and a local name of attr. For additional sanity, an attribute QName is:

type AttributeQName = EitherLike[PrefixedQName, NoNamespaceQName]

The compiler makes sure that no local-name-only QNames creep in.
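To illustrate the idea behind that type, here is a hypothetical model that is not Scales' actual definitions. It uses the standard library's `Either` in place of Scales' `EitherLike`. Because an attribute name must be either a prefixed QName or a no-namespace QName, any other kind of QName does not type-check where an attribute name is expected. All type, value, and function names below are made up for the sketch.

```scala
object AttributeNames {
  // Hypothetical QName shapes (illustrative only, not Scales' API)
  final case class NoNamespaceQName(local: String)
  final case class PrefixedQName(prefix: String, uri: String, local: String)
  // A namespaced name without a prefix: usable for elements (default namespace),
  // but the default namespace never applies to attributes.
  final case class UnprefixedQName(uri: String, local: String)

  // Only the two legal shapes are accepted for attribute names.
  type AttributeQName = Either[PrefixedQName, NoNamespaceQName]

  // Render in the {uri}local style used in the printed output above
  def render(name: AttributeQName): String = name match {
    case Left(p)  => s"{${p.uri}}${p.local}"
    case Right(n) => s"{}${n.local}"
  }

  val plain: AttributeQName = Right(NoNamespaceQName("attr"))
  val prefixed: AttributeQName = Left(PrefixedQName("pre", "uri:test", "attr"))
  // val bad: AttributeQName = Right(UnprefixedQName("uri:test", "attr")) // does not compile
}
```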

As an aside, whilst I understand why the Scala XML XPath-like syntax is probably uninteresting to you, you should have a look at Scales for XPath-based querying.

There is both XPath 1.0 string-based querying (not yet pushed into a non-snapshot version) and an internal DSL that lets the compiler / IDE help you out (plus the bonus of being far quicker and working with Scala code directly).
