
I have an FTP URL and need to parse it to get the user name, password, server name and directory. What regular expression could I use to do this?

Example: ftp://userName:password@someServer/directory-name


2 Answers


Use java.net.URI. It will be more robust, and will probably be faster.

The problems with using a regex include:

  • it is either too simple to deal with the edge cases, or too complicated / expensive because it does deal with them, and

  • it is unlikely to handle %-encoding correctly.
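A minimal Java sketch of the %-encoding point, assuming a hypothetical password that contains '@' and '/' and therefore has to appear percent-encoded in the URL. java.net.URI exposes both the raw and the decoded user info, whereas a regex capture group would hand back the still-encoded text. The class name `PercentEncodingDemo` is illustrative only.

```java
import java.net.URI;

public class PercentEncodingDemo {
    public static void main(String[] args) {
        // Hypothetical URL: the password "p@ss/word" contains '@' and '/', so it is
        // written percent-encoded as %40 and %2F inside the user-info component.
        URI uri = URI.create("ftp://userName:p%40ss%2Fword@someServer/directory-name");

        // Raw form: the component exactly as it appears in the string.
        System.out.println(uri.getRawUserInfo()); // userName:p%40ss%2Fword

        // Decoded form: escaped octets are turned back into characters.
        System.out.println(uri.getUserInfo());    // userName:p@ss/word

        // The encoded '@' does not confuse the parser; the host is found correctly.
        System.out.println(uri.getHost());        // someServer
    }
}
```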

For example, the (original) regex offered by @Larry doesn't handle URLs that have no userInfo, among other cases.
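The missing-userInfo case can be checked directly. This sketch (the class name `MissingUserInfoDemo` and the anonymous-FTP URL are hypothetical) runs the Java form of the regex from the other answer and java.net.URI against the same string:

```java
import java.net.URI;
import java.util.regex.Pattern;

public class MissingUserInfoDemo {
    // The Java form of the regex from the other answer (user name and password required).
    static final Pattern FTP_REGEX =
            Pattern.compile("^ftp://([^:]+):([^@]+)@(.*?)/(.*?)(\\?.*)?$");

    public static void main(String[] args) {
        // Hypothetical anonymous-FTP URL with no "user:password@" part.
        String url = "ftp://someServer/pub";

        // The regex insists on user info, so the whole match fails.
        System.out.println(FTP_REGEX.matcher(url).matches()); // false

        // URI parses the same string and reports the absent component as null.
        URI uri = URI.create(url);
        System.out.println(uri.getUserInfo()); // null
        System.out.println(uri.getHost());     // someServer
        System.out.println(uri.getPath());     // /pub
    }
}
```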


As the comments note, a URL is a URI but not (necessarily) vice versa. My reasons for recommending java.net.URI rather than java.net.URL are:

  • it has a better parser, and
  • it has a better API for examining the parts of the parsed URL.
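A sketch of pulling out the four parts the question asks for with the URI API. The helper `userAndPassword` (and the class `FtpUrlParts`) are hypothetical: URI returns the user info as a single string, so splitting it at the first ':' into user name and password is left to the caller.

```java
import java.net.URI;

public class FtpUrlParts {
    /**
     * Splits the decoded user info at the first ':' into {user, password}.
     * Either element is null when that part is absent.
     */
    static String[] userAndPassword(URI uri) {
        String userInfo = uri.getUserInfo(); // decoded; null if there is no "...@"
        if (userInfo == null) {
            return new String[] {null, null};
        }
        int colon = userInfo.indexOf(':');
        if (colon < 0) {
            return new String[] {userInfo, null}; // user name only, no password
        }
        return new String[] {userInfo.substring(0, colon), userInfo.substring(colon + 1)};
    }

    public static void main(String[] args) {
        URI uri = URI.create("ftp://userName:password@someServer/directory-name");
        String[] up = userAndPassword(uri);
        System.out.println("user      = " + up[0]);         // userName
        System.out.println("password  = " + up[1]);         // password
        System.out.println("server    = " + uri.getHost()); // someServer
        System.out.println("port      = " + uri.getPort()); // -1 (none given)
        System.out.println("directory = " + uri.getPath()); // /directory-name
    }
}
```

Because the split happens after decoding, a user name containing an encoded ':' (%3A) would be split in the wrong place; if that matters, split getRawUserInfo() instead and decode each half yourself. Note also that getPath() keeps the leading '/'.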

3 Comments

Is URI preferable to URL, or are they about the same? If preferable, how/why?
URLs (Uniform Resource Locators) are a subset of URIs (Uniform Resource Identifiers). Examples of URLs are URIs that use schemes such as "http://", "ftp://" and "mailto:". In this case, the FTP link is both a URL and a URI.
@Ed - A URL is a URI, but the reverse is not always true (a URI could be a URN). In Java, the URI class is usually preferable to the URL class because URL equality relies on DNS resolution (ref: Effective Java), but the choice may depend on which APIs you are using. (Admittedly, converting between the two is relatively easy.) Make sense?
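The conversion mentioned in the last comment can be sketched as follows, assuming the JDK's built-in ftp protocol handler is available (URI.toURL() throws MalformedURLException if no handler exists for the scheme). The equality check is done on URI objects; URL.equals is deliberately avoided because it may resolve host names. `UriUrlConversion` and `roundTrip` are illustrative names.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class UriUrlConversion {
    /** Converts a URI to a URL and back again. */
    static URI roundTrip(URI uri) throws MalformedURLException, URISyntaxException {
        URL url = uri.toURL(); // needs a protocol handler for the scheme
        return url.toURI();    // re-parses the URL's string form
    }

    public static void main(String[] args) throws Exception {
        URI uri = URI.create("ftp://userName:password@someServer/directory-name");
        URI back = roundTrip(uri);

        // URI.equals compares the components textually; no DNS lookup happens here.
        System.out.println(back.equals(uri)); // true
    }
}
```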

Whenever I think of regexes, I think "Perl": I write a quick-and-dirty pattern (qr{xxx}x) and test it against sample input.

In your case, assuming that the user name, password, server and directory name all need to be extracted (and are all mandatory), I'd use the following. If some parts are optional, add question marks (?) after the corresponding parts of the pattern:

qr{
    ^           # Start of text
    ftp:        # Protocol
    //          # Double slash
    ([^:]+)     # $1 = User Name
    :           # Colon
    ([^@]+)     # $2 = Password
    @           # AT sign
    (.*?)       # $3 = Server name
    /           # Single slash
    (.*?)       # $4 = Directory name
    (\?.*)?     # $5 = optional query string, starting with '?'
    $           # End of text
}x;

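Now that we have the pattern, double the backslash (in the question-mark part), remove the whitespace and comments, and place it in a Java String. (Alternatively, keep the whitespace and comments and compile with the Pattern.COMMENTS flag, Java's equivalent of Perl's /x modifier.)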

"^ftp://([^:]+):([^@]+)@(.*?)/(.*?)(\\?.*)?$";

Compile that with java.util.regex.Pattern and use a Matcher; if matches() succeeds, groups 1 to 4 hold the user name, password, server and directory.
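Putting the Java string to work with Pattern and Matcher might look like this (a sketch; the class name `FtpRegexDemo` is illustrative). Note that matches() must succeed before group() is called, and group 5 is null when there is no query part:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FtpRegexDemo {
    // The pattern from this answer: user name, password, server and directory are all required.
    static final Pattern FTP_URL =
            Pattern.compile("^ftp://([^:]+):([^@]+)@(.*?)/(.*?)(\\?.*)?$");

    public static void main(String[] args) {
        Matcher m = FTP_URL.matcher("ftp://userName:password@someServer/directory-name");
        if (m.matches()) {
            System.out.println("user      = " + m.group(1)); // userName
            System.out.println("password  = " + m.group(2)); // password
            System.out.println("server    = " + m.group(3)); // someServer
            System.out.println("directory = " + m.group(4)); // directory-name
            System.out.println("query     = " + m.group(5)); // null (no "?..." part)
        } else {
            System.out.println("no match");
        }
    }
}
```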

1 Comment

This regex has problems if the URL is not exactly as expected; e.g. it fails if the optional userInfo is not present, or if there's a user name but no ":password", or if there's a fragment, or ...
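Following the suggestion to make parts optional, one possible more permissive pattern is sketched below. It is a hypothetical variant, not the answer's original regex: user info, password, port, path, query and fragment are all optional, but percent-escapes are still not decoded and IPv6 host literals are not handled, so the java.net.URI approach from the other answer remains the more robust choice. The class name `PermissiveFtpRegex` is illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PermissiveFtpRegex {
    // Hypothetical, more permissive pattern (a sketch, not a full RFC 3986 parser):
    //   group 1 = user (optional)     group 2 = password (optional, needs a user)
    //   group 3 = host                group 4 = port (optional)
    //   group 5 = path (optional)     group 6 = query, group 7 = fragment (optional)
    // Percent-escapes are NOT decoded, and IPv6 literals like [::1] are not handled.
    static final Pattern FTP_URL = Pattern.compile(
            "^ftp://"
            + "(?:([^:@/]*)(?::([^@/]*))?@)?" // optional user[:password]@
            + "([^/:?#@]+)"                   // host
            + "(?::(\\d+))?"                  // optional :port
            + "(/[^?#]*)?"                    // optional /path
            + "(\\?[^#]*)?"                   // optional ?query
            + "(#.*)?$");                     // optional #fragment

    public static void main(String[] args) {
        String[] samples = {
            "ftp://userName:password@someServer/directory-name",
            "ftp://someServer/pub",                   // no user info
            "ftp://userName@someServer:2121/dir#top", // user only, port, fragment
        };
        for (String s : samples) {
            Matcher m = FTP_URL.matcher(s);
            if (m.matches()) {
                System.out.printf("user=%s password=%s host=%s port=%s path=%s%n",
                        m.group(1), m.group(2), m.group(3), m.group(4), m.group(5));
            } else {
                System.out.println("no match: " + s);
            }
        }
    }
}
```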
