0

I would like to use a Java regex to match part of a URL's domain. For example, for www.table.google.com I would like to get 'google' out of the URL, namely the second-to-last word in the URL string.

Any help will be appreciated !!!

4
  • What about google.co.nz, google.com.au, goo.gl? Commented Aug 14, 2017 at 22:43
  • If you already know that you need the second-to-last word, you can use the String utility methods: call lastIndexOf(".") to cut off the .com part, then call it again to isolate google. Commented Aug 14, 2017 at 22:44
  • It depends on the complexity of your inputs... Here is a pretty simple regex: .+\\.(.+)\\..+, and here are some examples for that pattern: regex101.com/r/L52oz6/1. But why reinvent the wheel? There are plenty of really good libraries that correctly parse any complex URL. Sure, for simple inputs a small regex is easily built. Commented Aug 14, 2017 at 22:48
  • Actually, I am trying to get the second-to-last word no matter what it is, so for the example google.co.nz that would be 'co'. I can only supply a Java regex; I can't use any code, because this is for a plug-in that accepts only Java regex. Commented Aug 14, 2017 at 22:49

3 Answers

1

It really depends on the complexity of your inputs...

Here is a pretty simple regex:

.+\\.(.+)\\..+

It captures whatever sits between the last two dots (\\. matches a literal dot; the doubled backslash is the Java string-literal form).

And here are some examples for that pattern: https://regex101.com/r/L52oz6/1. As you can see, it works for simple inputs but not for complex urls.
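To make the pattern concrete, here is a minimal sketch of how it could be applied with java.util.regex. The class and method names (SecondToLastRegexDemo, secondToLast) are made up for illustration, and returning null for inputs with fewer than two dots is an assumption about the desired behaviour.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SecondToLastRegexDemo {
    // Same pattern as above; in Java source every regex backslash is doubled.
    private static final Pattern SECOND_TO_LAST = Pattern.compile(".+\\.(.+)\\..+");

    /** Returns the text between the last two dots, or null if the input has fewer than two dots. */
    static String secondToLast(String input) {
        Matcher m = SECOND_TO_LAST.matcher(input);
        // matches() requires the whole input to fit the pattern.
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(secondToLast("www.table.google.com")); // google
        System.out.println(secondToLast("google.co.nz"));         // co
        System.out.println(secondToLast("google.com"));           // null (only one dot)
    }
}
```

The leading .+ is greedy, so it swallows as much as possible and leaves only the last two dots for the rest of the pattern, which is why group 1 ends up being the second-to-last word.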

But why reinvent the wheel? There are plenty of really good libraries that correctly parse any complex URL. Sure, for simple inputs a small regex is easily built. So if this does not solve the problem for your inputs, please let me know and I will adjust the regex pattern.
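As one possible illustration of letting a parser do the work, the sketch below uses the JDK's own java.net.URI to isolate the host before picking the second-to-last label. It assumes the input carries a scheme such as http://; without one, getHost() returns null. The class and method names are hypothetical.

```java
import java.net.URI;

class HostSecondToLast {
    /** Returns the second-to-last dot-separated label of the URL's host, or null if unavailable. */
    static String secondToLastLabel(String url) {
        // URI.create throws IllegalArgumentException for syntactically invalid input.
        String host = URI.create(url).getHost();
        if (host == null) {
            return null; // e.g. no scheme, so the text was parsed as a path
        }
        String[] labels = host.split("\\.");
        return labels.length < 2 ? null : labels[labels.length - 2];
    }

    public static void main(String[] args) {
        System.out.println(secondToLastLabel("http://www.table.google.com/some/path?q=1")); // google
        System.out.println(secondToLastLabel("www.table.google.com"));                      // null
    }
}
```

This only helps when code is allowed; it strips paths, ports and query strings that would confuse a simple dot-based pattern.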


Note that you can also just use simple splitting like:

String[] elements = input.split("\\.");
String secondToLastElement = elements[elements.length - 2];

But don't forget the index-bound checking.
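A minimal sketch of the split approach with the bounds check included might look like this. The class and method names are invented for the example, and returning null when there is no second-to-last element is an assumption.

```java
class SplitSecondToLast {
    /** Returns the second-to-last dot-separated element, or null if there are fewer than two. */
    static String secondToLast(String input) {
        // The dot must be escaped because split takes a regex.
        String[] elements = input.split("\\.");
        if (elements.length < 2) {
            return null; // bounds check: nothing to the left of the last element
        }
        return elements[elements.length - 2];
    }

    public static void main(String[] args) {
        System.out.println(secondToLast("www.table.google.com")); // google
        System.out.println(secondToLast("localhost"));            // null
    }
}
```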


Or, if you are looking for a very quick solution, walk through the input starting from the last position. Work your way backwards until you find the first dot, then continue until you find the second dot. Then extract the part between them with input.substring(index1, index2);.

There is already a built-in method for exactly that purpose, namely String#lastIndexOf (see the documentation).

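Take a look at this code snippet:

String input = ...
int indexLastDot = input.lastIndexOf('.');
// Start the second search one position before the last dot; otherwise the same dot is found again.
int indexSecondToLastDot = input.lastIndexOf('.', indexLastDot - 1);
// substring takes (beginIndex, endIndex); the + 1 skips the dot itself.
String secondToLastWord = input.substring(indexSecondToLastDot + 1, indexLastDot);

Don't forget bounds checking: if the input contains no dot at all, lastIndexOf returns -1 and substring throws.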

The advantage of this approach is that it is really fast: it scans the string in place, without compiling a regex or allocating the array that split creates.
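Wrapped into a method with the bounds check, the lastIndexOf idea could be sketched as follows. The names are illustrative only, and returning null for input without any dot is an assumption.

```java
class LastIndexOfSecondToLast {
    /** Returns the text between the last two dots (or from the start if there is only one dot). */
    static String secondToLast(String input) {
        int lastDot = input.lastIndexOf('.');
        if (lastDot < 0) {
            return null; // no dot at all
        }
        // Search strictly to the left of the last dot; yields -1 if there is no earlier dot.
        int secondToLastDot = input.lastIndexOf('.', lastDot - 1);
        // With -1 the begin index becomes 0, i.e. the word starts at the beginning of the input.
        return input.substring(secondToLastDot + 1, lastDot);
    }

    public static void main(String[] args) {
        System.out.println(secondToLast("www.table.google.com")); // google
        System.out.println(secondToLast("google.com"));           // google
    }
}
```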


1

My attempt:

(?<scheme>https?:\/\/)?(?<subdomain>\S*?)(?<domainword>[^.\s]+)(?<tld>\.[a-z]+|\.[a-z]{2,3}\.[a-z]{2,3})(?=\/|$)

Demo. Works correctly for:

http://www.foo.stackoverflow.com
http://www.stackoverflow.com
http://www.stackoverflow.com/
http://stackoverflow.com
https://www.stackoverflow.com
www.stackoverflow.com
stackoverflow.com
http://www.stackoverflow.com
http://www.stackoverflow.co.uk
foo.www.stackoverflow.com
foo.www.stackoverflow.co.uk
foo.www.stackoverflow.co.uk/a/b/c
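For use from Java, the pattern above could be wired up roughly like this. The regex is the one from this answer with backslashes doubled for a string literal; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DomainWordDemo {
    // Named groups let the caller pick out the part it needs by name.
    private static final Pattern DOMAIN = Pattern.compile(
            "(?<scheme>https?:\\/\\/)?(?<subdomain>\\S*?)(?<domainword>[^.\\s]+)"
            + "(?<tld>\\.[a-z]+|\\.[a-z]{2,3}\\.[a-z]{2,3})(?=\\/|$)");

    /** Returns the "domainword" group, or null if the pattern does not match. */
    static String domainWord(String url) {
        Matcher m = DOMAIN.matcher(url);
        return m.find() ? m.group("domainword") : null;
    }

    public static void main(String[] args) {
        System.out.println(domainWord("http://www.stackoverflow.co.uk"));    // stackoverflow
        System.out.println(domainWord("foo.www.stackoverflow.co.uk/a/b/c")); // stackoverflow
        System.out.println(domainWord("www.table.google.com"));              // google
    }
}
```

Note that this deliberately treats two-part suffixes such as .co.uk as the TLD, so for those inputs it does not return the literal second-to-last word.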


0
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Dots are escaped with \\. so that they match literally.
private static final Pattern URL_MATCH_GET_SECOND_AND_LAST =
        Pattern.compile("www\\.(.*)\\.google\\.(.*)", Pattern.CASE_INSENSITIVE);

String sURL = "www.table.google.com";

// A single matcher and a single find() call are enough.
Matcher matchURL = URL_MATCH_GET_SECOND_AND_LAST.matcher(sURL);

if (matchURL.find()) {
    String sFirst = matchURL.group(1);   // text between "www." and ".google"
    String sSecond = matchURL.group(2);  // text after ".google."
}
