11

Is it possible to split a string into lexemes, somehow like this:

"user@domain.com" match {
    case name :: "@" :: domain :: "." :: zone => doSmth(name, domain, zone)
}

In other words, in the same manner as lists...

1
  • I'm not sure if you can do it, but I can explain why your example doesn't work. Essentially what you have is a matcher for a list of Strings because the :: case class, aka "cons" operator, builds a list out of elements. What you need is a case class which accepts two lists and concatenates them, much like the ::: operator (but unfortunately there is not a ::: case class as with cons). Commented Jan 16, 2014 at 20:48

3 Answers

20

Yes, you can do this with Scala's Regex functionality.

I found an email regex on this site; feel free to use another one if this doesn't suit you:

[-0-9a-zA-Z.+_]+@[-0-9a-zA-Z.+_]+\.[a-zA-Z]{2,4}

The first thing we have to do is add parentheses around groups:

([-0-9a-zA-Z.+_]+)@([-0-9a-zA-Z.+_]+)\.([a-zA-Z]{2,4})

With this we have three groups: the part before the @, between @ and ., and finally the TLD.

Now we can create a Scala Regex from it and use it as an extractor in a pattern match, which binds the groups to variables:

val Email = """([-0-9a-zA-Z.+_]+)@([-0-9a-zA-Z.+_]+)\.([a-zA-Z]{2,4})""".r
// Email: scala.util.matching.Regex = ([-0-9a-zA-Z.+_]+)@([-0-9a-zA-Z.+_]+)\.([a-zA-Z]{2,4})


"user@domain.com" match {
    case Email(name, domain, zone) =>
       println(name)
       println(domain)
       println(zone)
}

// user
// domain
// com
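
The match above throws a MatchError when the input does not fit the pattern. A minimal sketch of a safer variant, assuming you want an Option back; `EmailRegexSketch` and `parseEmail` are hypothetical names introduced here, not part of the answer:

```scala
import scala.util.matching.Regex

object EmailRegexSketch {
  // Same pattern as in the answer; compiled once and reused for every call.
  val Email: Regex = """([-0-9a-zA-Z.+_]+)@([-0-9a-zA-Z.+_]+)\.([a-zA-Z]{2,4})""".r

  // Hypothetical helper: returns the three groups, or None when the
  // whole input does not match the pattern.
  def parseEmail(input: String): Option[(String, String, String)] = input match {
    case Email(name, domain, zone) => Some((name, domain, zone))
    case _                         => None
  }
}
```

With this, `EmailRegexSketch.parseEmail("user@domain.com")` yields `Some(("user", "domain", "com"))`, and an input without an `@` yields `None`.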

1 Comment

+1 This is easily one of the nicest ways I've seen a language handle regexes. Far nicer than the alternative in a lot of languages, forcing you to manually access a groups object and find the right match by index or name. Good answer.
6

Starting with Scala 2.13, it's possible to pattern match a String by unapplying a string interpolator:

val s"$user@$domain.$zone" = "user@domain.com"
// user: String = "user"
// domain: String = "domain"
// zone: String = "com"

If you are expecting malformed inputs, you can also use a match statement:

"user@domain.com" match {
  case s"$user@$domain.$zone" => Some((user, domain, zone))
  case _                      => None
}
// Option[(String, String, String)] = Some(("user", "domain", "com"))

Comments

3

In general, regex is horribly inefficient, so I wouldn't advise it.

You CAN do it using Scala pattern matching by calling .toList on your string to turn it into a List[Char]. Then your parts name, domain and zone will also be List[Char]; to turn them back into Strings, use .mkString. I'm not sure how efficient this is, though.
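
A rough sketch of the List[Char] idea, under some assumptions. A `::` pattern binds one element at a time, so a pattern like `name :: '@' :: domain` would bind single characters rather than runs of them; this sketch therefore uses `span` to cut the list at each separator. `CharListSketch` and `splitEmail` are hypothetical names, the domain is cut at the first '.', and nothing is validated beyond the presence of the separators:

```scala
object CharListSketch {
  // Hypothetical helper: splits "name@domain.zone" by walking a List[Char].
  def splitEmail(input: String): Option[(String, String, String)] =
    input.toList.span(_ != '@') match {
      // `'@' :: rest` only matches when an '@' was actually found
      case (name, '@' :: rest) =>
        rest.span(_ != '.') match {
          // `'.' :: zone` only matches when a '.' follows the '@'
          case (domain, '.' :: zone) =>
            Some((name.mkString, domain.mkString, zone.mkString))
          case _ => None
        }
      case _ => None
    }
}
```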

I have benchmarked basic string operations (like substring, indexOf, etc.) against regex for various use cases, and regex is usually an order of magnitude or two slower. And of course regex is hideously unreadable.
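
For illustration, the basic-string-operations approach could look roughly like the sketch below. It is not the benchmarked code and makes no performance claim; `StringOpsSketch` and `splitEmail` are hypothetical names, and the choice to split at the first '@' and the last '.' is an assumption:

```scala
object StringOpsSketch {
  // Hypothetical helper: splits at the first '@' and the last '.'.
  def splitEmail(input: String): Option[(String, String, String)] = {
    val at  = input.indexOf('@')
    val dot = input.lastIndexOf('.')
    // Require a non-empty name, domain and zone, with the '.' after the '@'.
    if (at > 0 && dot > at + 1 && dot < input.length - 1)
      Some((input.substring(0, at), input.substring(at + 1, dot), input.substring(dot + 1)))
    else None
  }
}
```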

UPDATE: The best thing to do is to use parsers, either the native Scala ones or parboiled2.

5 Comments

Whaaat? Are you rebuilding the regex each time? The point of regexes is that they create 'machines' that can be matched against strings very efficiently. Also, using regexes for very small checks doesn't make sense, but for more complex matches, I would expect great efficiency savings, especially as the number of inputs passed through grows.
@KenoguLabz No, the regex construction is outside the benchmark. When I last went to the Scala eXchange conference, I saw a talk that claimed using parsers (or native Scala StringOps) is generally 100 to 1000 times faster. Obviously, if a regex is built badly, the speed difference can be even greater (look up catastrophic backtracking: regular-expressions.info/catastrophic.html). My own benchmarks are generally performed on real data consisting of billions of records.
@samthebest I am very much interested in this, as I find regex unmaintainable as a language. How would Scala parsers be superior to compiled regex automata? How certain are you of this, and can you elaborate on which scenarios it applies to? Bare-bones StringOps is of course too low-level to write maintainable code with; it is not likely a good option beyond cases where a specific method fits a small ad hoc need. Care to elaborate here and/or in the answer?
@matt IMO the three main benefits of using parsers are: (a) the option to use either regex-style syntax or the full-name version, e.g. zeroOrMore or +, which improves readability; (b) full static typing, e.g. + gives a List which you can then map, transform, etc., which also means IDEs will syntax highlight it; (c) parboiled2 is partly macro-based and compiled, so it can be substantially faster than regex. github.com/sirthias/parboiled2#example
Thanks, those advantages are clear to begin with, to me. But I find these to address the speed aspect: github.com/sirthias/… github.com/sirthias/…. If only they could be trusted, as I seem to recall seeing people say parboiled is slow.
