I have an FTP URL and need to parse it to get the username, password, server name and directory. What regular expression can I use to do this?
2 Answers
Use java.net.URI. It will be more robust, and will probably be faster.
The problems with using a regex include:
- it is either too simple to deal with edge cases, or too complicated and expensive because it does deal with them, and
- it is unlikely to handle %-encoding correctly.
For example, the (original) regex offered by @Larry doesn't handle URLs that have no userInfo component, among other cases.
As the comments stated, a URL is a URI but not (necessarily) vice versa. The reasons I recommend java.net.URI rather than java.net.URL are:
- it has a better parser, and
- it has a better API for examining the parts of the parsed URL.
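As a rough sketch of what that looks like (the class name, helper method and example URL are made up for illustration, and it assumes the user name itself contains no encoded colon):

```java
import java.net.URI;

class FtpUriDemo {
    /** Returns {user, password, host, path}; parts missing from the URL are null. */
    static String[] parse(String url) {
        URI uri = URI.create(url);            // IllegalArgumentException if malformed
        String user = null;
        String password = null;
        String userInfo = uri.getUserInfo();  // percent-decoded, null when absent
        if (userInfo != null) {
            // Assumption: the first ':' separates user name from password.
            int colon = userInfo.indexOf(':');
            user = colon < 0 ? userInfo : userInfo.substring(0, colon);
            password = colon < 0 ? null : userInfo.substring(colon + 1);
        }
        return new String[] { user, password, uri.getHost(), uri.getPath() };
    }

    public static void main(String[] args) {
        // Hypothetical URL; %40 is an encoded '@' inside the password.
        String[] parts = parse("ftp://alice:s%40cret@ftp.example.com/pub/files");
        System.out.println(String.join(" | ", parts));
        // prints: alice | s@cret | ftp.example.com | /pub/files
    }
}
```

Note that getUserInfo() and getPath() return decoded values; getRawUserInfo() and getRawPath() give the text as written if you need it.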
Whenever I think of regexes, I think "Perl" and write a quick and dirty pattern (qr{xxx}x) and test it against test input.
In your case, assuming that user name, password, server, and directory name all need to be parsed out (and are mandatory), I'd use the following. Add question marks for "optional" parts of your pattern if you wish to modify this:
    qr{
        ^           # Start of text
        ftp:        # Protocol
        //          # Double slash
        ([^:]+)     # $1 = User Name
        :           # Colon
        ([^@]+)     # $2 = Password
        @           # AT sign
        (.*?)       # $3 = Server name
        /           # Single slash
        (.*?)       # $4 = Directory name
        (\?.*)?     # Question mark ends URI
        $           # End of text
    }x;
Now that we have the pattern, simply double the backslash (in the "Question mark" portion), remove the whitespace and comments (or keep them and compile with the Pattern.COMMENTS flag), and place it into a Java String:
"^ftp://([^:]+):([^@]+)@(.*?)/(.*?)(\\?.*)?$";
Use that with Pattern/Matcher and you should be able to extract things nicely.
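Something along these lines should work. This is only a sketch: the class name, helper method and example URLs are invented for illustration, and the pattern is the one above, unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FtpRegexDemo {
    // Same pattern as above: user, password, server and directory are all mandatory.
    static final Pattern FTP_URL =
            Pattern.compile("^ftp://([^:]+):([^@]+)@(.*?)/(.*?)(\\?.*)?$");

    /** Returns {user, password, server, directory}, or null if the URL does not match. */
    static String[] parse(String url) {
        Matcher m = FTP_URL.matcher(url);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        // Hypothetical URL with every part present.
        String[] parts = parse("ftp://alice:secret@ftp.example.com/pub/files");
        System.out.println(String.join(" | ", parts));
        // prints: alice | secret | ftp.example.com | pub/files

        // A URL without user name and password does not match at all.
        System.out.println(parse("ftp://ftp.example.com/pub/files") == null);
        // prints: true
    }
}
```

The captured groups are the raw text, so a %-encoded character such as %40 in the password comes back still encoded and has to be decoded separately.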