
In the following code, the same pattern matches when the Java API is used, but not when Scala pattern matching is used.

import java.util.regex.Pattern

object Main extends App {
  val text = "/oAuth.html?state=abcde&code=hfjksdhfrufhjjfkdjfkds"

  val statePatternString = """\/.*\?.*state=([^&\?]*)"""
  val statePattern = statePatternString.r
  val statePatternJ = Pattern.compile(statePatternString)

  val sj = statePatternJ.matcher(text)
  val sjMatch = if (sj.find()) sj.group(1) else ""
  println(s"Java match $sjMatch")

  val ss = statePattern.unapplySeq(text)
  println(s"Scala unapplySeq $ss")
  val sm = statePattern.findFirstIn(text)
  println(s"Scala findFirstIn $sm")

  text match {
    case statePattern(s) =>
      println(s"Scala matching $s")
    case _ =>
      println("Scala not matching")
  }

}

The app output is:

Java match abcde

Scala unapplySeq None

Scala findFirstIn Some(/oAuth.html?state=abcde)

Scala not matching

With the extractor syntax val statePattern(se) = text, a scala.MatchError is thrown.

What is causing the Scala regex unapplySeq to fail?

  • Can you please explain what you are trying to match? Commented Mar 22, 2016 at 13:11
  • It would be helpful if either the question and/or accepted answer could include "MatchError" somewhere in the content. The references to Java combined with no reference to the scala.MatchError exception generated when this happens makes it difficult to search out this entry when coming from a Scala only context. Here's the question I ended up generating as a result of failing to find this. stackoverflow.com/q/66392638/501113 Commented Mar 1, 2021 at 0:23

1 Answer


When a Scala Regex is used as an extractor (in pattern matching or via unapplySeq), it is anchored by default, that is, it requires the entire string to match. Your Java sj.find(), like Scala's findFirstIn, looks for a match anywhere inside the string. Add .unanchored to the Scala regex to allow partial matches in extractor patterns as well:

val statePattern = statePatternString.r.unanchored
                                       ^^^^^^^^^^^

See IDEONE demo

From the Scaladoc for unanchored:

def unanchored: UnanchoredRegex

Create a new Regex with the same pattern, but no requirement that the entire String matches in extractor patterns.

Normally, matching on date [the example Regex used in the Scaladoc] behaves as though the pattern were enclosed in anchors, ^pattern$.

The unanchored Regex behaves as though those anchors were removed.

Note that this method does not actually strip any matchers from the pattern.
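
To make the difference concrete, here is a minimal sketch that runs the question's pattern against the question's input in both modes. The object name UnanchoredDemo and the names of the result values are illustrative only and do not come from the original code.

```scala
object UnanchoredDemo {
  val text = "/oAuth.html?state=abcde&code=hfjksdhfrufhjjfkdjfkds"

  // Same pattern string as in the question.
  val anchored = """\/.*\?.*state=([^&\?]*)""".r
  // Same pattern, but extractor patterns no longer require a full-string match.
  val unanchored = anchored.unanchored

  // Anchored extractor: the trailing "&code=..." is not covered by the
  // pattern, so the full-string match fails.
  val anchoredResult: Option[String] = text match {
    case anchored(state) => Some(state)
    case _               => None
  }

  // Unanchored extractor: behaves like find(), so the first group is captured.
  val unanchoredResult: Option[String] = text match {
    case unanchored(state) => Some(state)
    case _                 => None
  }

  def main(args: Array[String]): Unit = {
    println(s"anchored:   $anchoredResult")   // None
    println(s"unanchored: $unanchoredResult") // Some(abcde)
  }
}
```

The same unanchored value also works with the val unanchored(se) = text syntax, which would otherwise throw scala.MatchError for this input.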

An alternative solution is to add .* at the end of the pattern, but remember that a dot does not match a newline by default. For a generic solution, specify the (?s) DOTALL modifier at the beginning of the pattern so that the whole string, including any newline sequences, is matched.
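
A rough sketch of this alternative, assuming the question's pattern with .* appended. The object name TrailingWildcardDemo, the extract helper, and the multi-line input are hypothetical and only serve to show the effect of (?s).

```scala
import scala.util.matching.Regex

object TrailingWildcardDemo {
  // Question's pattern plus a trailing .* so the anchored match can
  // consume the rest of the string.
  val withoutDotAll: Regex = """\/.*\?.*state=([^&\?]*).*""".r
  // Same, with (?s) so that . also matches newline characters.
  val withDotAll: Regex = """(?s)\/.*\?.*state=([^&\?]*).*""".r

  val singleLine = "/oAuth.html?state=abcde&code=hfjksdhfrufhjjfkdjfkds"
  // Hypothetical input containing a newline after the query string.
  val multiLine: String = singleLine + "\nextra"

  // Helper (illustrative): run an anchored extractor and return group 1.
  def extract(pattern: Regex, s: String): Option[String] = s match {
    case pattern(state) => Some(state)
    case _              => None
  }

  def main(args: Array[String]): Unit = {
    println(extract(withoutDotAll, singleLine)) // Some(abcde)
    println(extract(withoutDotAll, multiLine))  // None: . stops at the newline
    println(extract(withDotAll, multiLine))     // Some(abcde)
  }
}
```

For input that cannot contain newlines, such as a URL, the version without (?s) is sufficient.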


3 Comments

Thanks a lot, that makes sense. An alternative solution (which I prefer in my case) is to provide a .* at the end to match the rest of the string.
Yes, it is almost equivalent. If you use .*, do not forget to add the (?s) DOTALL modifier at the start of the pattern, so that . can match a newline (in case the string contains newline characters). Then it will be a generic solution. I added this note to the answer.
In my case I am matching against a URL, so there are no newlines for me, but it is good to mention it for completeness.
