Input

XML input in data

val data = <Doc><Title>Doc</Title><Type><Type level="0">A</Type><Type level="1">B</Type></Type><Type><Type level="0">C</Type><Type level="1">D</Type><Type level="2">E</Type></Type></Doc>

Desired output

Title : Doc
Type_1 : A | B
Type_2 : C | D | E

What I have tried

For the title: (data \\ "Title").text

The problem is the nested "Type" tags in the XML: the inner Type elements under each outer Type need to be grouped together.

I tried several commands to extract the Type tags and group them as in the desired result (screenshot: "Extracting type tags").

I need some guidance on the logic for grouping the Type tags as shown in the desired output.

1 Answer

Initial data:

scala> val data = <Doc><Title>Doc</Title><Type><Type level="0">A</Type><Type level="1">B</Type></Type><Type><Type level="0">C</Type><Type level="1">D</Type><Type level="2">E</Type></Type></Doc>
data: scala.xml.Elem = <Doc><Title>Doc</Title><Type><Type level="0">A</Type><Type level="1">B</Type></Type><Type><Type level="0">C</Type><Type level="1">D</Type><Type level="2">E</Type></Type></Doc>

Formatted for readability, the XML looks like this:

<Doc>
    <Title>Doc</Title>
    <Type>
        <Type level="0">A</Type>
        <Type level="1">B</Type>
    </Type>
    <Type>
        <Type level="0">C</Type>
        <Type level="1">D</Type>
        <Type level="2">E</Type>
    </Type>
</Doc>

All inner Type nodes, each mapped to a pair of its level attribute and its text:

scala> val types = (data \ "Type" \ "Type") map (x => (x \ "@level").text -> x.text)
types: scala.collection.immutable.Seq[(String, String)] =
List((0,A), (1,B), (0,C), (1,D), (2,E))

Grouped by level:

scala> types.groupBy(_._1).map { case (level, elems) => level -> elems.map(_._2) }
res3: scala.collection.immutable.Map[String,scala.collection.immutable.Seq[String]] =
Map(2 -> List(E), 1 -> List(B, D), 0 -> List(A, C))
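Note that the Map above does not guarantee any key order, and the levels are still strings. If a predictable order matters, here is a minimal sketch that converts the keys to Int and sorts them. It assumes the level attribute is always an integer and that scala-xml is on the classpath; the name byLevel is only illustrative.

```scala
import scala.xml.Elem

val data: Elem = <Doc><Title>Doc</Title><Type><Type level="0">A</Type><Type level="1">B</Type></Type><Type><Type level="0">C</Type><Type level="1">D</Type><Type level="2">E</Type></Type></Doc>

// Pair each inner Type node's level attribute with its text.
val types = (data \ "Type" \ "Type").map(x => (x \ "@level").text -> x.text)

// Group by level, convert the level to Int (assumption: always numeric),
// and sort so the result no longer depends on Map iteration order.
val byLevel: Seq[(Int, Seq[String])] =
  types
    .groupBy(_._1)
    .toSeq
    .map { case (level, elems) => level.toInt -> elems.map(_._2) }
    .sortBy(_._1)

byLevel.foreach { case (level, values) => println(s"$level -> ${values.mkString(", ")}") }
```

Sorting on the Int rather than the original String avoids "10" sorting before "2" if the document ever has more than ten levels.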

If you want the grouping as requested:

Type_1 : A | B
Type_2 : C | D | E

then:

scala> (data \ "Type").zipWithIndex.map {case (s, idx) => idx -> (s \ "Type").map(_.text) }
res4: scala.collection.immutable.Seq[(Int, scala.collection.immutable.Seq[String])] =
List((0,List(A, B)), (1,List(C, D, E)))

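But it feels wrong to me, because the numbering then depends purely on the order of the outer Type elements in the document, and in XML the order of elements/nodes usually should not carry meaning like that.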
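To get from that result to the exact text asked for in the question, the index can be shifted by one and each group joined with " | ". The following is only a sketch under the same assumptions as above (scala-xml available, groups numbered by document order); typeLines and report are illustrative names.

```scala
import scala.xml.Elem

val data: Elem = <Doc><Title>Doc</Title><Type><Type level="0">A</Type><Type level="1">B</Type></Type><Type><Type level="0">C</Type><Type level="1">D</Type><Type level="2">E</Type></Type></Doc>

// Direct child Title of the root element.
val title = (data \ "Title").text

// One line per outer Type element; idx is 0-based, so add 1 for the label.
val typeLines = (data \ "Type").zipWithIndex.map { case (group, idx) =>
  val values = (group \ "Type").map(_.text).mkString(" | ")
  s"Type_${idx + 1} : $values"
}

val report = (s"Title : $title" +: typeLines).mkString("\n")
println(report)
```

Using `\` rather than `\\` matters here: `data \\ "Type"` would return the outer and the inner Type elements together, which makes the grouping harder.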

2 Comments

Thank you so much! I will explore this further. And exactly, but we cannot change the XML, because this source file is used by other application teams.
I split it into two steps to make it easier to read. You can replace the 'types' variable with the expression that defines it above.