1

Objective:

Given a string of the form val strang = "ab(cde(fgh)ijk)lmn)opq)rs"

First I want to construct a vector of the string split by the "(" and ")" characters. This is accomplished with,

val regexPattern = "((?<=\\()|(?=\\())|((?<=\\))|(?=\\)))"
val data: Vector[String] = strang .split(regexPattern ).map(_.toString).toVector

Note that the actual string will be much more varied, but how it's split stands.

Secondly and this is the hard part that I need help with. I want to iterate over the data vector and construct a nested vector with the following logic (below is psudo-codish)

val newVector =
for(row <- data) {
    if(row == "(") // opening angle bracket
        start constructing a nested vector inside of newVector
        keep appending to the nested vector until we encounter a ")"
        character (closing parenthesis).
        IF before encountering a ")" we encounter another "(" then
        create another nestedVector inside the previous nested vector
        and keep appending to it until a closing parenthesis ")" is
        encountered

    else if(row == ")")
        if we encounter a closing parenthesis, go up one level of the nested 
        vectors
    else
        simply append this row to the new vector
        newVector :+ row
}

I have made quite a few attempts at this with limited success. Using a recursive function I manage just one level of nesting but to go beyond that I end up using while loops which seems like overkill. The nested vectors can be of a special type. I have tried for example case class Rowy(row: String, rows: Vector[Vector[String]]) or case class Rowy(row: String, rows: Vector(Rowy))

I'm relatively new to scala and am wondering if there is an elegant way to approach this using scanLeft or foldLeft methods. Any help would be appreciated.

PS: I don't use an actual for loop to iterate over the "data" vector but rather a recursive function. The for loop is just to convey that an iteration is taking place. Any help would be appreciated.

3
  • 1
    Is this just an academic research question or a practical one? In the latter case, how are you going to use the parsed data structure? There is no trivial standard type that can contain such a data-structure except for not very useful Vector[Any], so you'll need a custom one and it might sense to tailor it to your usage scenario. Commented Jan 9, 2019 at 3:15
  • Terminology note: those are parentheses, not angle brackets (angle brackets are < and >). Commented Jan 9, 2019 at 6:10
  • Your regex pattern can be simplified: "((?<=[()])|(?=[()]))" Commented Jan 9, 2019 at 11:02

1 Answer 1

1

In your example, parentheses in the string are not balanced, and it is not really clear from your algorithm definition how it should be handled. If we assume that the input is actually well-balanced, then your native data structure would be a tree, with branches being one level of parentheses and leaves being the "words". Then the algorithm you need is just a simple stack-based tree construction. In the following example I use an explicit stack, but you can rewrite the algorithm to be recursive, and then the stack will be implicit.

// Tree structure definition
sealed trait Data
object Data {
  case class Leaf(data: String) extends Data
  case class Branch(parts: Vector[Data]) extends Data {
    // a convenience method to append data to a branch more easily
    def :+(data: Data): Branch = copy(parts = parts :+ data)
  }
}

object Main extends App {
  val regexPattern = "((?<=\\()|(?=\\())|((?<=\\))|(?=\\)))"
  val string = "ab(cde(fgh)ijk)lmn"  // now the input is balanced
  val data: Vector[String] = string.split(regexPattern).toVector

  println(data)

  var stack: List[Data.Branch] = Data.Branch(Vector.empty) :: Nil
  for (part <- data) {
    part match {
      case "(" =>
        stack = Data.Branch(Vector.empty) :: stack
      case ")" =>
        val top :: parent :: rest = stack
        stack = (parent :+ top) :: rest
      case _ =>
        stack = (stack.head :+ Data.Leaf(part)) :: stack.tail
    }
  }
  val result = stack.head

  println(result)
}

It outputs this:

Vector(ab, (, cde, (, fgh, ), ijk, ), lmn)
Branch(Vector(Leaf(ab), Branch(Vector(Leaf(cde), Branch(Vector(Leaf(fgh))), Leaf(ijk))), Leaf(lmn)))

If we reformat that a bit, we get

Branch(Vector(
  Leaf(ab),
  Branch(Vector(
    Leaf(cde),
    Branch(Vector(
      Leaf(fgh)
    )),
    Leaf(ijk)
  )),
  Leaf(lmn)
))

which, as you can see, mirrors the structure inside your string.

If your input could have unbalanced parentheses, then you might need to add additional checks to the loop in the case ")" part. Right now, if the input is not balanced, a MatchError would be thrown because the stack variable would not have the required number of elements.

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks Vladimir, seems like a tree is the way to go. The parentheses will not be balanced but that shouldn't be a problem as a simple end of daData check will solve the issue. About the angle brackets, the original code is split by angle brackets but stack wasn't displaying them so I switched to parentheses but must have not swapped the words out somewhere.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.