A text file should be parsed line by line, using Scala pattern matching and regular expressions. If a line starts with "names:\t" the subsequent tab-separated names should be provided as a Seq[String] (or something similar).

Here is a non-working code example:

val Names = "^names:(?:\t([a-zA-Z0-9_]+))+$".r

"names:\taaa\tbbb\tccc" match {
  case Names(names @ _*) => println(names)
  // […] other cases
  case _ => println("no match")
}

Output: List(ccc)
Wanted output: List(aaa, bbb, ccc)
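
The behaviour above can be reproduced in isolation. The sketch below is only an illustration (the object name `RepeatedGroupDemo` and its helpers are made up for this example): the pattern contains a single capturing group, so the extractor can only ever bind one value, and a repeated group keeps its last match.

```scala
object RepeatedGroupDemo {
  // Same pattern as in the question: one capturing group inside a repeated non-capturing group.
  val Names = "^names:(?:\t([a-zA-Z0-9_]+))+$".r

  // Number of capturing groups in the pattern (independent of the input).
  val groupCount: Int = Names.pattern.matcher("").groupCount()

  // Returns whatever the extractor binds, or None if the line does not match.
  def groups(line: String): Option[List[String]] = line match {
    case Names(gs @ _*) => Some(gs.toList)
    case _              => None
  }
}
```

Here `RepeatedGroupDemo.groupCount` is 1, and `RepeatedGroupDemo.groups("names:\taaa\tbbb\tccc")` yields only the last repetition, `ccc`.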

The following code works as desired…

object NamesObject {
  private val NamesLine = "^names:\t([a-zA-Z0-9_]+(?:\t[a-zA-Z0-9_]+)*)$".r

  def unapplySeq(s: String): Option[Seq[String]] = s match {
    case NamesLine(nameString) => Some(nameString.split("\t"))
    case _ => None
  }
}

"names:\taaa\tbbb\tccc" match {
  case NamesObject(names @ _*) => println(names)
  // […] other cases
  case _ => println("no match")
}

Output (as wanted): WrappedArray(aaa, bbb, ccc)

I would like to know: Is this possible in a simpler manner without creating an object, just like in the first but non-working code example?

  • The Scala RegEx extractor is a match, not a find, so the ^ and $ are redundant. If you want find semantics, you need .* at the beginning and / or end. (Not relevant to the problem you're trying to solve, though.) Commented Mar 26, 2013 at 15:23
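
A minimal sketch of the point made in this comment, assuming a simplified one-name pattern (the names `AnchoringDemo`, `Anchored`, `Unanchored`, `extract` and `find` are hypothetical): in a pattern match the regex must cover the whole input, so the anchored and unanchored patterns behave the same, while `findFirstMatchIn` gives find semantics.

```scala
import scala.util.matching.Regex

object AnchoringDemo {
  val Anchored: Regex   = "^names:\t(\\w+)$".r
  val Unanchored: Regex = "names:\t(\\w+)".r

  // Pattern matching with a Regex extractor requires the whole input to match.
  def extract(r: Regex, s: String): Option[String] = s match {
    case r(name) => Some(name)
    case _       => None
  }

  // Find semantics: look for the pattern anywhere in the input.
  def find(r: Regex, s: String): Option[String] =
    r.findFirstMatchIn(s).map(_.group(1))
}
```

With this sketch, `extract` returns the same result for both patterns on `"names:\taaa"`, and returns `None` for `"xx names:\taaa yy"`, whereas `find(Unanchored, "xx names:\taaa yy")` still locates `aaa`.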

2 Answers

Use your working regex. (\w is a predefined character class for [a-zA-Z0-9_].)

  val Names = """names:\t(\w+(?:\t\w+)*)""".r
  "names:\taaa\tbbb\tccc" match {
    case Names(names) => println(names.split("\t").toSeq)
    case _ => println("no match")
  }

With first, second and tail bindings:

  val Names = """names:\t(\w+)?\t?(\w+)?\t?((?:\w+?\t?)*)""".r
  "names:\taaa\tbbb\tccc\tddd" match {
    case Names(first, second, tail) =>
      println(first + ", " + second + ", " + tail.split("\t").toSeq)
    case _ => println("no match")
  }

4 Comments

You beat me to it. I believe this is the only solution, namely to parse it in two parts (or use the low-level RegEx API, which is less nice than what the OP attempted and this solution).
This solution is short, but something like Names(first, second, tail @ _*) is not possible with it; that would be great for flexibility.
@Radon, the answer is updated with first, second and tail bindings.
@PrinceJohnWesley: Thank you for your efforts, but I want it a bit more flexible, e.g. Names(a), Names(a,b), Names(a,b,c,d) and Names(all @ _*) should be possible, too.
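
For comparison, the "parse it in two parts" idea from the comments can also be written with a second regex instead of split. This is only a sketch under assumptions: the object `LowLevelNames` and the helper patterns `Line` and `Token` are hypothetical names, and `findAllIn` stands in for the lower-level matcher loop mentioned above.

```scala
object LowLevelNames {
  // First step: check the line shape and capture everything after the prefix.
  private val Line  = """names:((?:\t\w+)+)""".r
  // Second step: pull the individual names out of the captured remainder.
  private val Token = """\w+""".r

  def names(line: String): Option[List[String]] = line match {
    case Line(rest) => Some(Token.findAllIn(rest).toList)
    case _          => None
  }
}
```

Calling `LowLevelNames.names("names:\taaa\tbbb\tccc")` gives all three names, but it still needs a helper object, so it is not simpler than the extractor in the question.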

As Randall Schulz said, it seems not to be possible just using regular expressions. Therefore the short answer to my question would be no.

My current solution is the following: I use this class…

import scala.util.matching.Regex

class SeparatedLinePattern(Pattern: Regex, separator: String = "\t") {
  def unapplySeq(s: String): Option[Seq[String]] = s match {
    case Pattern(nameString) => Some(nameString.split(separator).toSeq)
    case _ => None
  }
}

…to create the patterns:

val Names = new SeparatedLinePattern("""names:\t([A-Za-z]+(?:\t[A-Za-z]+)*)""".r)
val Ints = new SeparatedLinePattern("""ints:\t(\d+(?:\t\d+)*)""".r)
val ValuesWithID = new SeparatedLinePattern("""id-value:\t(\d+\t\w+(?:\t\d+\t\w+)*)""".r)

Here is an example of how they can be used in a quite flexible manner:

val testStrings = List("names:\taaa", "names:\tbbb\tccc", "names:\tddd\teee\tfff\tggg\thhh",
                       "ints:\t123", "ints:\t456\t789", "ints:\t100\t200\t300",
                       "id-value:\t42\tbaz", "id-value:\t2\tfoo\t5\tbar\t23\tbla")

for (s <- testStrings) s match {
  case Names(name) => println(s"The name is '$name'")
  case Names(a, b) => println(s"The two names are '$a' and '$b'")
  case Names(names @ _*) => println("Many names: " + names.mkString(", "))

  case Ints(a) => println(s"Just $a")
  case Ints(a, b) => println(s"$a + $b == ${a.toInt + b.toInt}")
  case Ints(nums @ _*) => println("Sum is " + (nums map (_.toInt)).sum)

  case ValuesWithID(id, value) => println(s"ID of '$value' is $id")
  case ValuesWithID(values @ _*) => println("As map: " + (values.grouped(2) map (x => x(0).toInt -> x(1))).toMap)

  case _ => println("No match")
}

Output:

The name is 'aaa'
The two names are 'bbb' and 'ccc'
Many names: ddd, eee, fff, ggg, hhh
Just 123
456 + 789 == 1245
Sum is 600
ID of 'baz' is 42
As map: Map(2 -> foo, 5 -> bar, 23 -> bla)
