Regex for specific url format

Question

I am trying to write a regular expression that matches a specific URL format, namely the API URLs for Stack Exchange sites. For example, I want both of these to match:

http://api.stackoverflow.com/1.1/questions/1234/answers  
http://api.physics.stackexchange.com/1.0/questions/5678/answers

Where:

- Everything not in bold must be identical.
- The first bold part (the site name between "api." and ".com", here "stackoverflow" and "physics.stackexchange") can contain only the letters a to z and at most one full stop.
  - Ideally, if there is a full stop, the word "stackexchange" must follow it. However, this isn't crucial.
- The second bold part (the digit after "1.") can only be a 1 or a 0.
- The last bold part (the number after "/questions/", here 1234 and 5678) can contain only the digits 0 to 9 and can be any length.
- There can't be anything at all before or after the URL, not even a trailing slash.

Mike Samuel · Accepted Answer · 2011-07-02 22:23:48Z

5

Pattern.compile("^(?i:http://api\\.(?:[a-z]+(?:\\.stackexchange)?)\\.com)/1\\.[01]/questions/[0-9]+/answers\\z")

The ^ makes sure the match starts at the start of the input, and the \\z makes sure it ends at the end of the input. All the dots are escaped so they are literal. The (?i:...) part makes the scheme and domain case-insensitive, as per the URL spec. The [01] matches only the character 0 or 1. The [0-9]+ matches one or more Arabic digits. The rest is self-explanatory.
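As a rough check of the pattern above, the sketch below runs it against the two URLs from the question and a few inputs that should be rejected. The class name ApiUrlCheck, the helper isApiUrl, and the rejected sample URLs are made up for illustration; only the pattern itself comes from the answer.

```java
import java.util.regex.Pattern;

// Hypothetical test harness; only the regex is taken from the answer above.
class ApiUrlCheck {
    static final Pattern API_URL = Pattern.compile(
        "^(?i:http://api\\.(?:[a-z]+(?:\\.stackexchange)?)\\.com)/1\\.[01]/questions/[0-9]+/answers\\z");

    // find() is enough here because ^ and \z already pin both ends of the input.
    static boolean isApiUrl(String s) {
        return API_URL.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://api.stackoverflow.com/1.1/questions/1234/answers",         // accepted
            "http://api.physics.stackexchange.com/1.0/questions/5678/answers", // accepted
            "http://api.stackoverflow.com/1.1/questions/1234/answers/",        // rejected: trailing slash
            "http://api.stackoverflow.com/1.1/questions/1234/answers\n",       // rejected: trailing newline
            "http://api.stackoverflow.com/1.2/questions/1234/answers",         // rejected: version digit is not 0 or 1
            "http://api.foo.bar.com/1.1/questions/1/answers"                   // rejected: second label is not stackexchange
        };
        for (String s : samples) {
            // Show newlines visibly so the trailing-newline case is readable.
            System.out.println(isApiUrl(s) + "  " + s.replace("\n", "\\n"));
        }
    }
}
```

Note that because of the (?i:...) group, an input such as "HTTP://API.StackOverflow.COM/1.1/questions/1234/answers" is also accepted, which is looser than the "a to z" wording in the question.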

answered Jul 2, 2011 at 22:23


trutheality · Accepted Answer · 2011-07-02 22:29:08Z

1

^http://api[.][a-z]+([.]stackexchange)?[.]com/1[.][01]/questions/[0-9]+/answers$

^ matches the start of the string, $ matches the end of the line, and [.] is an alternative to escaping the dot with a backslash (which would itself need to be escaped in a Java string literal, giving \\.).
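To illustrate the point about escaping, here is a small sketch comparing the bracketed form from this answer with a backslash-escaped equivalent and with a variant that leaves the dots unescaped. The class name DotEscapeDemo, the helper accepts, and the malformed sample URL are assumptions made up for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing three ways of writing the dots.
class DotEscapeDemo {
    // The pattern from this answer, with dots written as [.].
    static final Pattern BRACKETED = Pattern.compile(
        "^http://api[.][a-z]+([.]stackexchange)?[.]com/1[.][01]/questions/[0-9]+/answers$");

    // The same pattern with backslash escapes; each backslash is doubled in the Java literal.
    static final Pattern BACKSLASHED = Pattern.compile(
        "^http://api\\.[a-z]+(\\.stackexchange)?\\.com/1\\.[01]/questions/[0-9]+/answers$");

    // For contrast: unescaped dots match any character, not just a literal full stop.
    static final Pattern UNESCAPED = Pattern.compile(
        "^http://api.[a-z]+(.stackexchange)?.com/1.[01]/questions/[0-9]+/answers$");

    static boolean accepts(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String good = "http://api.stackoverflow.com/1.1/questions/1234/answers";
        String bad = "http://apiXstackoverflow.com/1.1/questions/1234/answers"; // 'X' where a dot belongs

        System.out.println(accepts(BRACKETED, good) + " " + accepts(BACKSLASHED, good)); // both forms accept
        System.out.println(accepts(BRACKETED, bad) + " " + accepts(BACKSLASHED, bad));   // both forms reject
        System.out.println(accepts(UNESCAPED, bad)); // the unescaped variant lets the bad URL through
    }
}
```

The bracketed and backslashed forms behave the same; choosing between them is a readability question, since [.] avoids the doubled backslashes.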

edited Jul 2, 2011 at 22:29

answered Jul 2, 2011 at 22:21


4 Comments

Mike Samuel Over a year ago

$ in a Java regex does not guarantee a match at the very end of the input; see download.oracle.com/javase/6/docs/api/java/util/regex/… . For example, Pattern.compile("foo$") will match "foo\n".
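The behaviour described in this comment can be reproduced with a short sketch. The class name EndAnchorDemo and its two helper methods are hypothetical; the patterns and the "foo\n" input follow the comment.

```java
import java.util.regex.Pattern;

// Hypothetical demo of how $ and \z treat a trailing line terminator in Java.
class EndAnchorDemo {
    // True if the regex matches somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // True only if the regex matches the entire input.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "foo\n";
        System.out.println(found("foo$", input));      // true: $ also matches just before a final line terminator
        System.out.println(found("foo\\z", input));    // false: \z matches only at the very end of the input
        System.out.println(wholeMatch("foo$", input)); // false: matches() must consume the whole input, newline included
    }
}
```

So with find(), a strict end-of-input check needs \z; with matches(), the trailing newline is rejected either way because the whole input has to be consumed.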

trutheality Over a year ago

Shouldn't make a difference in the OP's case, multiline URLs are a freaky thing to see.

Mike Samuel Over a year ago

You're right. Line separators are not allowed unescaped in URLs, but the OP does not make it clear whether the string has a priori been validated as a URL.

Jonathan. Over a year ago

The reason it needs to be strict is that it is also being used as a tag to identify another object. Some objects will have the same tag, and the tags need to be identical or they won't group correctly. In other words, if I need to get all objects with a specific URL and some objects have a line break on the end or a trailing slash for some reason, they won't be included.

ridgerunner · Accepted Answer · 2011-07-03 05:15:42Z

This tested Java program has a commented regex which should do the trick:

import java.util.regex.*;
public class TEST {
    public static void main(String[] args) {
        String s = "http://api.stackoverflow.com/1.1/questions/1234/answers";

        Pattern p = Pattern.compile(
            "http://api\\.              # Scheme and api subdomain.\n" +
            "(?:                        # Group for domain alternatives.\n" +
            "  stackoverflow            # Either one\n" +
            "| physics\\.stackexchange  # or the other\n" +
            ")                          # End group for domain alternatives.\n" +
            "\\.com                     # TLD\n" +
            "/1\\.[01]                  # Either 1.0 or 1.1\n" +
            "/questions/\\d+/answers    # Rest of path.", 
            Pattern.COMMENTS);
        Matcher m = p.matcher(s);
        if (m.matches()) {
            System.out.print("Match found.\n");
        } else {
            System.out.print("No match found.\n");
        }
    }
}
