I have tried the regex from this question: how to get domain name from URL

But the domain name is not being found. Here is my implementation:

    val Names = """.*([^\.]+)(com|net|org|info|coop|int|co\.uk|org\.uk|ac\.uk|uk|__and so on__)$""".r
    val s = Names.findFirstIn("www.google.com")
    s match {
    case Some(name) =>
        println(name)
    case None =>
        println("No name value")
    }

"No name value" is consistently printed to stdout. Is there an issue with the regex or with my Scala implementation?

3 Answers

2

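I fixed the regex by adding a literal dot (\.) before the extension. Without it, [^.]+ can never sit directly against com, because the character just before the extension is always a dot. Also, since you want the capture group (group 1) rather than the whole match, use findFirstMatchIn instead of findFirstIn.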

    val Names = """([^.]+)\.(com|net|org|info|coop|int|co\.uk|org\.uk|ac\.uk|uk|__and so on__)$""".r
    val s = Names.findFirstMatchIn("www.google.com")
    s match {
      case Some(name) =>
        println(name)
        println(name.group(1))
      case None =>
        println("No name value")
    }

Prints:

    google.com
    google
    Names: scala.util.matching.Regex = ([^.]+)\.(com|net|org|info|coop|int|co\.uk|org\.uk|ac\.uk|uk|__and so on__)$
    s: Option[scala.util.matching.Regex.Match] = Some(google.com)

EDITED: Sorry, I misread your question. I rewrote the answer.
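
To make the difference between the two methods concrete, here is a minimal sketch. The object name DomainDemo, the helper names, and the shortened TLD list are illustrative assumptions, not part of the original answer. findFirstIn yields the whole matched text, while findFirstMatchIn yields a Match whose groups can be read individually.

```scala
import scala.util.matching.Regex

object DomainDemo {
  // Shortened TLD list, for illustration only.
  val Names: Regex = """([^.]+)\.(com|net|org|co\.uk)$""".r

  // findFirstIn returns the whole matched text, e.g. "google.com".
  def wholeMatch(host: String): Option[String] =
    Names.findFirstIn(host)

  // findFirstMatchIn returns a Match, so capture group 1 can be extracted.
  def domainOf(host: String): Option[String] =
    Names.findFirstMatchIn(host).map(_.group(1))
}
```

With this sketch, DomainDemo.wholeMatch("www.google.com") gives Some("google.com"), and DomainDemo.domainOf("www.google.com") gives Some("google").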

2 Comments

That regex doesn't work for me: 'www.google.com' is printed, but 'google' should be printed.
Edited the answer (I read your question too quickly). This one should be OK :)
2

I would use Scala 2.10's string interpolation feature:

implicit class Regex(sc: StringContext) {
  def r = new util.matching.Regex(sc.parts.mkString, sc.parts.tail.map(_ => "x"): _*)
}

scala> "www.google.co.uk" match {
      case  r"(.*?)$sld([^.]+)$domain\.(com|net|org|co\.uk)$tld" => (sld,domain,tld)
      case _ => ???
    }
res61: (String, String, String) = (www.,google,co.uk)

The drawback of this approach is that every capturing group must be bound to a variable. To skip a group, make it explicitly non-capturing (it starts with ?:):

r".*?([^.]+)$domain\.(?:com|net|org|co\.uk)"

The first group can be left out entirely, since .*? needs no parentheses.

If you are sure that the input always matches, you can also drop the fallback case and use the pattern directly in a val definition (an input that does not match then throws a MatchError):

scala> val r".*?([^.]+)$domain\.(?:com|net|org|co\.uk)" = "www.google.com"
domain: String = google
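
When the input is not guaranteed to match, a fallback case avoids the MatchError. The following is a minimal sketch using a plain Regex instead of the interpolator so that it stays self-contained. The object name SafeExtract, the method name, and the shortened TLD list are assumptions made for illustration.

```scala
import scala.util.matching.Regex

object SafeExtract {
  // Same shape as the interpolated pattern above; TLD list shortened for illustration.
  private val Domain: Regex = """.*?([^.]+)\.(?:com|net|org|co\.uk)""".r

  // Used as an extractor, the Regex must match the whole input.
  // The fallback case returns None instead of throwing a MatchError.
  def domain(host: String): Option[String] = host match {
    case Domain(name) => Some(name)
    case _            => None
  }
}
```

Returning an Option leaves it to the caller to decide what a missing domain means, which is usually easier to handle than an exception.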

1
scala> val Names = """.*?([^\.]+)\.(?:com|net|org|info|coop|int|co\.uk|org\.uk|ac\.uk|uk|__and so on__)""".r
Names: scala.util.matching.Regex = .*?([^\.]+)\.(?:com|net|org|info|coop|int|co\.uk|org\.uk|ac\.uk|uk|__and so on__)

scala> val Names( primary ) = "www.google.com"
primary: String = google

Changes:

  • Note the ? after the initial .* which makes it lazy. A greedy .* swallows as much as it can, leaving only e.com for the rest of the pattern, so turn greediness off!
  • Add a \. between the group you want and the (com|net...) section. You expect the dot to be a boundary there.
  • You don't want the (com|net...) section to define a capturing group, so use (?:...) rather than just (...).
  • I removed the $ at the end. It is redundant here, because a Regex used as an extractor must match the whole input anyway.
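
The effect of the first change can be checked with a small sketch. The object name GreedyDemo, the value names, and the shortened TLD list are hypothetical and exist only for this comparison.

```scala
import scala.util.matching.Regex

object GreedyDemo {
  // Identical patterns except for the ? after the leading .*
  val GreedyNames: Regex = """.*([^.]+)\.(?:com|net|org)""".r
  val LazyNames: Regex   = """.*?([^.]+)\.(?:com|net|org)""".r

  // Applies the regex as an extractor (whole-input match) and returns group 1.
  def extract(re: Regex, host: String): Option[String] = host match {
    case re(name) => Some(name)
    case _        => None
  }
}
```

For "www.google.com", the greedy version captures only "e", because .* consumes "www.googl" before handing over to the group, while the lazy version captures "google".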

Good luck!
