0

I'm new to Java, and I want to extract all of the URLs from the text below:

WEBSITE1 https://localhost:8080/admin/index.php?page=home
WEBSITE2 https://192.168.0.3:8084/index.php
WEBSITE3 https://192.168.0.5:9090/controller/index.php?page=home
WEBSITE4 https://192.168.0.1:8080/home/index.php?page=forum

the result that I want is:

https://localhost:8080
https://192.168.0.3:8084
https://192.168.0.5
https://192.168.0.1:8080

I also want to store the results in a LinkedList or an array. Can somebody show me how? Thank you.

  • Have you tried anything? Commented Jun 20, 2013 at 13:23
  • stackoverflow.com/questions/163360/… Commented Jun 20, 2013 at 13:26
  • @juniperi VERY bad idea, when you have URI Commented Jun 20, 2013 at 13:29
  • @fge Second answer uses URI Commented Jun 20, 2013 at 13:30

5 Answers

1

This is how you can do this. I did one for you and you do the rest :)

// needs java.net.URL, java.net.MalformedURLException and java.util.ArrayList
ArrayList<String> urls = new ArrayList<String>();
try {
    URL aURL = new URL("https://localhost:8080/admin/index.php?page=home");
    // rebuild "protocol://host:port"; the "://" and ":" separators must be added explicitly
    String base = aURL.getProtocol() + "://" + aURL.getHost() + ":" + aURL.getPort();
    System.out.println("base = " + base);
    urls.add(base);
} catch (MalformedURLException e) {
    e.printStackTrace();
}

0

Use a simple regular expression to find each substring that starts with https?:// and capture everything up to the first / that follows it.

Matcher m = Pattern.compile("(https?://[^/]+)").matcher(//
        "WEBSITE1 https://localhost:8080/admin/index.php?page=home\r\n" + //
        "WEBSITE2 https://192.168.0.3:8084/index.php\r\n" + //
        "WEBSITE3 https://192.168.0.5:9090/controller/index.php?page=home\r\n" + //
        "WEBSITE4 https://192.168.0.1:8080/home/index.php?page=forum");
List<String> urls = new ArrayList<String>();
while (m.find()) {
    urls.add(m.group(1));
}
System.out.println(urls);

Now, if you want only the WEBSITE labels (WEBSITE1, WEBSITE2, and so on) instead, replace the regular expression "(https?://[^/]+)" with "(.*?)\\s+https?". The rest of the code stays the same.
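The label extraction described above can be sketched as a small helper. This is only an illustration: the class name WebsiteNames and the method extractNames are made up for the example, and the pattern "(\\S+)\\s+https?://" is a slightly stricter variant of the one quoted above, chosen on the assumption that every label is a single whitespace-free token directly before the URL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class WebsiteNames {

    // Group 1 captures the whitespace-free token that precedes each URL.
    private static final Pattern LABEL = Pattern.compile("(\\S+)\\s+https?://");

    static List<String> extractNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = LABEL.matcher(text);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String text = "WEBSITE1 https://localhost:8080/admin/index.php?page=home\r\n"
                + "WEBSITE2 https://192.168.0.3:8084/index.php\r\n"
                + "WEBSITE3 https://192.168.0.5:9090/controller/index.php?page=home\r\n"
                + "WEBSITE4 https://192.168.0.1:8080/home/index.php?page=forum";
        // prints [WEBSITE1, WEBSITE2, WEBSITE3, WEBSITE4]
        System.out.println(extractNames(text));
    }
}
```

If the labels should come out in lower case, calling toLowerCase() on each captured group before adding it would be one way to do that.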

1 Comment

Thank you. And what if I want to get the list of website names? For example, the result that I want is: website1 website2 website3 website4. Thank you again :-)
0

Let's say the variable line holds a single line of the input (probably inside a loop):

//get the index of "https" in the string
int indexOfHTTPS= line.indexOf("https://");
//get the index of the first "/" after the "https"
int indexOfFirstSlashAfterHTTPS= line.indexOf("/", indexOfHTTPS + "https://".length());

//take a string between "https" and the first "/"
String url = line.substring(indexOfHTTPS, indexOfFirstSlashAfterHTTPS);

Later on, add this url to an ArrayList<String>:

ArrayList<String> urlList= new ArrayList<String>();
urlList.add(url);
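Putting the two steps together, a loop over all lines might look like the sketch below. The class and method names are invented for the example. It also adds two guards the snippet above does not have, on the assumption that some lines may contain no URL at all or a URL with no path after the host (in which case indexOf returns -1 and substring would throw).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
class IndexOfExtractor {

    private static final String PREFIX = "https://";

    static List<String> extractBaseUrls(String text) {
        List<String> urlList = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            int start = line.indexOf(PREFIX);
            if (start < 0) {
                continue; // no URL on this line
            }
            // first "/" after the "https://" prefix
            int end = line.indexOf('/', start + PREFIX.length());
            if (end < 0) {
                end = line.length(); // URL has no path, take the rest of the line
            }
            urlList.add(line.substring(start, end).trim());
        }
        return urlList;
    }

    public static void main(String[] args) {
        String text = "WEBSITE1 https://localhost:8080/admin/index.php?page=home\n"
                + "WEBSITE2 https://192.168.0.3:8084/index.php";
        // prints [https://localhost:8080, https://192.168.0.3:8084]
        System.out.println(extractBaseUrls(text));
    }
}
```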


0

You can do it with the help of the URL class.

public static void main(String[] args) throws MalformedURLException {
    String string = "https://192.168.0.5:9090/controller/index.php?page=home";
    URL url = new URL(string);
    // take the protocol from the URL instead of hard-coding "https://"
    String result = url.getProtocol() + "://" + url.getHost() + ":" + url.getPort();
    System.out.println(result);
}

Output: https://192.168.0.5:9090

1 Comment

That won't grab the port -- hence the interest of using URI instead. What is more, URI will never attempt to resolve hostnames
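For comparison, here is a rough sketch of the URI-based variant that the comment suggests. The class name UriOrigin and the method origin are invented for the example. It assumes the input is a well-formed absolute URL with a host, and it leaves the port out when the URL does not specify one, since getPort() returns -1 in that case.

```java
import java.net.URI;

// Hypothetical helper class, not part of the original answer.
class UriOrigin {

    // Returns "scheme://host" or "scheme://host:port" for an absolute URL.
    static String origin(String address) {
        // URI.create throws IllegalArgumentException on malformed input
        URI uri = URI.create(address);
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme()).append("://").append(uri.getHost());
        if (uri.getPort() != -1) {
            sb.append(':').append(uri.getPort());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints https://192.168.0.5:9090
        System.out.println(origin("https://192.168.0.5:9090/controller/index.php?page=home"));
    }
}
```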
0

You could either find the index of the protocol substring ("http" or "https") in each String, or use a simple Pattern that matches only the "website[0-9]" prefix and leaves the URL itself to the URI class.

Here's a solution with the Pattern.

String[] lines = {
    "WEBSITE1 https://localhost:8080/admin/index.php?page=home",
    "WEBSITE2 https://192.168.0.3:8084/index.php",
    "WEBSITE3 https://192.168.0.5:9090/controller/index.php?page=home",
    "WEBSITE4 https://192.168.0.1:8080/home/index.php?page=forum"
};
ArrayList<URI> uris = new ArrayList<URI>();
// group 1 captures everything after the "WEBSITEn" prefix and the whitespace
Pattern pattern = Pattern.compile("^website\\d+\\s+?(.+)", Pattern.CASE_INSENSITIVE);
for (String line : lines) {
    Matcher matcher = pattern.matcher(line);
    if (matcher.find()) {
        try {
            uris.add(new URI(matcher.group(1)));
        }
        catch (URISyntaxException use) {
            use.printStackTrace();
        }
    }
}
System.out.println(uris);

Output:

[https://localhost:8080/admin/index.php?page=home, https://192.168.0.3:8084/index.php, https://192.168.0.5:9090/controller/index.php?page=home, https://192.168.0.1:8080/home/index.php?page=forum]
