I have a list of URLs that I need to verify are valid. I've written a Java program that uses Apache's HttpClient to check each link. I had to implement my own redirect strategy because the redirect URLs contain invalid characters (such as { and }) that the default strategy didn't handle. It works fine in the majority of cases, except for two:
Escaped Characters in the path or query params, which should not be encoded further. Example:
    String url = "http://www.example.com/chapter1/%3Fref%3Dsomething%26term%3D?ref=xyz";
If I use a URI object, it chokes on the "{" character.
    URI myUri = new URI(url); // ==> This will fail.
If I run:
    URI myUri = new URI(UriUtils.encodeHttpUrl(url));
it encodes the %3F to %253F. However, when I follow the link using Chrome or Fiddler, I do not see %3F getting escaped again. How do I protect against over-encoding the path or query params?
The value of the last query param in the URL is itself a valid URL. E.g.:
    String url = "www.example.com/Chapter1/?param1=xyz&param2=http://www.google.com/?abc=1";
My current encoding strategy splits up the query params and then calls URLEncoder.encode on each of them. This, however, causes the URL in the last param to be encoded as well (which is not the case when I follow it in Fiddler or Chrome).
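To make the second case concrete, here is a minimal sketch using only the JDK. It contrasts URLEncoder.encode, which is meant for form data and escapes ':', '/', '?' and '=', with the multi-argument java.net.URI constructor, which quotes only characters that are illegal in the given component. The class and method names are hypothetical, and I'm assuming the URL has already been split into scheme, host, path and query.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical demo class; not part of HttpClient or Spring.
final class NestedUrlParamDemo {

    // What the current strategy does to a param value: form-style encoding.
    static String encodeWithUrlEncoder(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Alternative: let java.net.URI quote only characters illegal in a query.
    static String buildWithUriConstructor(String scheme, String host,
                                          String path, String query) {
        try {
            return new URI(scheme, host, path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The nested URL is fully escaped, unlike what the browser sends.
        System.out.println(encodeWithUrlEncoder("http://www.google.com/?abc=1"));
        // The nested URL is left readable; only illegal characters would be quoted.
        System.out.println(buildWithUriConstructor("http", "www.example.com",
                "/Chapter1/", "param1=xyz&param2=http://www.google.com/?abc=1"));
    }
}
```

One caveat: these constructors are documented to always quote the '%' character, so this route keeps the nested URL intact but does not protect an existing %3F from becoming %253F.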
I've tried a number of things (using UriUtils, special-casing a URL as the last param, and other hacks), but nothing seems ideal. What's the best way to solve this?
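One direction that seems to cover both cases is a lenient pass that escapes only the characters that are illegal in a URI and copies existing %XX sequences through untouched. The following is a rough sketch under stated assumptions, not a definitive implementation. The class name, the method name and the "{id}" test URL are hypothetical. The set of characters treated as legal is my own choice, and it deliberately escapes '[' and ']', so it would not suit IPv6 literal hosts.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of HttpClient or Spring.
final class LenientUrlEscaper {

    // Characters passed through unchanged (assumption: '[' and ']' are
    // escaped because java.net.URI rejects them in a path).
    private static final String LEGAL =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
            + "-._~:/?#@!$&'()*+,;=";

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // True if s has a well-formed %XX escape starting at index i.
    private static boolean isEscapeAt(String s, int i) {
        return i + 2 < s.length() && isHex(s.charAt(i + 1)) && isHex(s.charAt(i + 2));
    }

    static String escapeIllegalOnly(String url) {
        StringBuilder out = new StringBuilder(url.length() + 16);
        int i = 0;
        while (i < url.length()) {
            char c = url.charAt(i);
            if (c == '%' && isEscapeAt(url, i)) {
                out.append(url, i, i + 3);   // keep existing escape as-is
                i += 3;
            } else if (LEGAL.indexOf(c) >= 0) {
                out.append(c);               // legal character, copy through
                i++;
            } else {
                // Illegal character (or a stray '%'): percent-encode its UTF-8 bytes.
                int cp = url.codePointAt(i);
                byte[] bytes = new String(Character.toChars(cp))
                        .getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
                i += Character.charCount(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String escaped = escapeIllegalOnly("http://www.example.com/item/{id}?ref=%3Fabc");
        System.out.println(escaped);
        System.out.println(URI.create(escaped).getRawQuery()); // parses without error
    }
}
```

With this approach the %3F in the first example stays %3F, and the nested URL in the second example is left alone because ':', '/', '?' and '=' are all treated as legal.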