Is it possible to split a string into lexemes, somehow like this:
"user@domain.com" match {
  case name :: "@" :: domain :: "." :: zone => doSmth(name, domain, zone)
}
In other words, in the same manner as lists...
Yes, you can do this with Scala's Regex functionality.
I found an email regex on this site; feel free to use another one if this doesn't suit you:
[-0-9a-zA-Z.+_]+@[-0-9a-zA-Z.+_]+\.[a-zA-Z]{2,4}
The first thing we have to do is add parentheses around groups:
([-0-9a-zA-Z.+_]+)@([-0-9a-zA-Z.+_]+)\.([a-zA-Z]{2,4})
With this we have three groups: the part before the @, between @ and ., and finally the TLD.
Now we can create a Scala Regex from it and use its unapply in a pattern match to bind the groups to variables:
val Email = """([-0-9a-zA-Z.+_]+)@([-0-9a-zA-Z.+_]+)\.([a-zA-Z]{2,4})""".r
// Email: scala.util.matching.Regex = ([-0-9a-zA-Z.+_]+)@([-0-9a-zA-Z.+_]+)\.([a-zA-Z]{2,4})
"user@domain.com" match {
  case Email(name, domain, zone) =>
    println(name)
    println(domain)
    println(zone)
}
// user
// domain
// com
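Note that a match with a single case throws a MatchError when the input does not fit the pattern. A minimal sketch of a safer variant, assuming you want an Option back; the `EmailRegex` object and its `parse` helper are hypothetical names used only for illustration:

```scala
object EmailRegex {
  // Same pattern as above, with one capture group per part.
  private val Email =
    """([-0-9a-zA-Z.+_]+)@([-0-9a-zA-Z.+_]+)\.([a-zA-Z]{2,4})""".r

  // Hypothetical helper: returns the three captured parts,
  // or None when the whole input does not match the regex.
  def parse(input: String): Option[(String, String, String)] =
    input match {
      case Email(name, domain, zone) => Some((name, domain, zone))
      case _                         => None
    }
}
```

For example, `EmailRegex.parse("user@domain.com")` yields the three parts, while a string without an @ falls through to the default case.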
[...] groups object and find the right match by index or name. Good answer.
Starting with Scala 2.13, it's possible to pattern match on a String by unapplying a string interpolator:
val s"$user@$domain.$zone" = "user@domain.com"
// user: String = "user"
// domain: String = "domain"
// zone: String = "com"
If you are expecting malformed inputs, you can also use a match statement:
"user@domain.com" match {
  case s"$user@$domain.$zone" => Some((user, domain, zone))
  case _                      => None
}
// Option[(String, String, String)] = Some(("user", "domain", "com"))
In general regex is horribly inefficient, so I wouldn't advise it.
You CAN do it using Scala pattern matching by calling .toList on your string to turn it into List[Char]. Then your parts name, domain and zone will also be List[Char], to turn them back into Strings use .mkString. Though I'm not sure how efficient this is.
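A rough sketch of the List[Char] idea, under the assumption that splitting at the first '@' and then at the first '.' is good enough. Since `::` only peels off a single head element, the sketch uses `span` to find each separator and then matches on the result; `CharListSplit`, `splitOn` and `parse` are hypothetical names:

```scala
object CharListSplit {
  // Hypothetical helper: splits a List[Char] at the first `sep`,
  // dropping the separator itself. None if `sep` is absent.
  def splitOn(chars: List[Char], sep: Char): Option[(List[Char], List[Char])] =
    chars.span(_ != sep) match {
      case (before, _ :: after) => Some((before, after))
      case _                    => None
    }

  // Turns the string into List[Char], splits twice,
  // then converts each part back to a String with mkString.
  def parse(email: String): Option[(String, String, String)] =
    for {
      (name, rest)   <- splitOn(email.toList, '@')
      (domain, zone) <- splitOn(rest, '.')
    } yield (name.mkString, domain.mkString, zone.mkString)
}
```

Because the split happens at the first '.', an input with several dots after the @ puts everything after the first one into the zone.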
I have benchmarked basic string operations (like substring, indexOf, etc.) against regex for various use cases, and regex is usually an order of magnitude or two slower. And of course regex is hideously unreadable.
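For comparison, the basic-string-operations approach might look like the sketch below. It is not the benchmarked code, just an illustration using `indexOf` and `substring`; the `IndexOfSplit` object and `parse` helper are hypothetical names:

```scala
object IndexOfSplit {
  // Hypothetical helper: locates the first '@' and the first '.'
  // after it, then slices the string with substring.
  def parse(email: String): Option[(String, String, String)] = {
    val at  = email.indexOf('@')
    val dot = if (at >= 0) email.indexOf('.', at + 1) else -1
    if (at < 0 || dot < 0) None
    else
      Some(
        (
          email.substring(0, at),       // name
          email.substring(at + 1, dot), // domain
          email.substring(dot + 1)      // zone
        )
      )
  }
}
```

Unlike the regex, this does no validation of the characters in each part; it only checks that both separators are present.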
UPDATE: The best thing to do is to use parsers, either the native Scala ones or Parboiled2.
[...] StringOps) is generally 100 to 1000 times faster. Obviously, if a regex is built badly the speed difference can be even greater (look up backtracking: regular-expressions.info/catastrophic.html). My own benchmarks are generally performed on real data consisting of billions of records.
StringOps of course is too low level to maintain code with... not likely a good option for more than those cases where a specific method fits your small ad-hoc need. Care to elaborate here and/or in the answer?
[...] zeroOrMore or +, which improves readability, (b) static full typing, e.g. + gives a List which you can then map, transform, etc., which also means IDEs will syntax highlight it, (c) parboiled2 is partly macro-based and compiled, so it can be substantially faster than regex. github.com/sirthias/parboiled2#example
The :: case class, aka the "cons" operator, builds a list out of elements. What you need is a case class which accepts two lists and concatenates them, much like the ::: operator (but unfortunately there is no ::: case class as there is with cons).
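Since no such case class exists, one way to approximate it is a custom extractor. The sketch below swaps the case class for plain objects with an `unapply` method that splits a String at a separator; `SeparatorExtractors`, `SplitOn`, `At`, `Dot` and `parse` are all hypothetical names, and splitting at the first occurrence is an assumption:

```scala
object SeparatorExtractors {
  // Hypothetical extractor base: splits a string at the first
  // occurrence of `sep` into the text before and after it.
  class SplitOn(sep: Char) {
    def unapply(s: String): Option[(String, String)] = {
      val i = s.indexOf(sep)
      if (i < 0) None else Some((s.substring(0, i), s.substring(i + 1)))
    }
  }

  object At  extends SplitOn('@')
  object Dot extends SplitOn('.')

  // Nested extractor patterns: first split on '@',
  // then split the right-hand side on '.'.
  def parse(email: String): Option[(String, String, String)] =
    email match {
      case At(name, Dot(domain, zone)) => Some((name, domain, zone))
      case _                           => None
    }
}
```

This keeps the shape of the match close to the one in the question, at the cost of defining one extractor per separator.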