2

I am looking for an HTML parser that can find anchor tags and replace their relative links with absolute ones, for example changing
<a href="/ima/index.php">example</a>
to
<a href="http://www.example.com/ima/index.php">example</a>

UPDATED:

My code using jsoup, which is not working:

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class test {
    public static void main(String args[]) throws IOException {
        Document doc = Jsoup.connect("http://www.google.com").get();
        String html = doc.outerHtml();

        // System.out.println(html);

        Elements links = doc.select("a");

        for (Element link : links) {
            String href = link.attr("href");
            if (!href.startsWith("http://")) {
                html.replaceAll(href, "http://www.google.com" + href);
            }
        }
        System.out.println(html);
    }
}
3
  • Couldn't you just use a <BASE HREF='http://www.example.com/'> to achieve this result? Or are you looking to override the contents of a site? Commented Jan 30, 2011 at 19:14
  • Yes, that would do it. Sorry for the silly question. Commented Jan 30, 2011 at 19:21
  • Not silly, just over-engineered. :) Commented Jan 30, 2011 at 19:33

4 Answers

5

This code changes the relative links in a document to absolute links. It uses the jsoup library.

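private void absoluteLinks(Document document, String baseUri) {
    Elements links = document.select("a[href]");
    for (Element link : links) {
        String href = link.attr("href");
        String lower = href.toLowerCase();
        // Leave links that are already absolute (http:// or https://) untouched.
        if (!lower.startsWith("http://") && !lower.startsWith("https://")) {
            // Assumes baseUri has no trailing slash and href starts with "/".
            link.attr("href", baseUri + href);
        }
    }
}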
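
Plain concatenation only behaves when the href starts with a slash and the base has no trailing slash. As a rough sketch of a more general approach, the standard library's java.net.URI can resolve a relative reference against a base. The class name ResolveHref and the helper toAbsolute are made up for illustration, and the sketch assumes the base URI ends with a slash and that both strings are syntactically valid URIs (URI.create throws IllegalArgumentException otherwise).

```java
import java.net.URI;

public class ResolveHref {
    // Hypothetical helper: resolves href against baseUri.
    // An href that is already absolute is returned unchanged.
    static String toAbsolute(String baseUri, String href) {
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Assumption: the base ends with "/" so path-relative hrefs resolve cleanly.
        String base = "http://www.example.com/";
        System.out.println(toAbsolute(base, "/ima/index.php"));
        System.out.println(toAbsolute(base, "ima/index.php"));
        System.out.println(toAbsolute(base, "https://other.example.org/x"));
    }
}
```

The resolved value could then be written back with link.attr("href", ...) inside the loop above.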

2
package javaapplication4;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 *
 * @author derek
 */
public class Main
{
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args)
    {
        try
        {
            Document document = Jsoup.connect("http://www.google.com").get();
            Elements elements = document.select("a");

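            for (Element element : elements)
            {
                // absUrl() resolves the href against the document's base URI
                // and returns an empty string if it cannot build an absolute URL.
                String absolute = element.absUrl("href");
                if (!absolute.isEmpty())
                {
                    element.attr("href", absolute);
                }
            }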
            System.out.println(document);
        }
        catch (Exception e)
        {
            e.printStackTrace(System.err);
        }
    }
}


1

You could do this with String.replaceAll() and a regexp that matches on

<a href="/

to find all links that start with a slash.

html = html.replaceAll("<a href=\"/", "<a href=\"http://www.google.com/");
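
The literal pattern above only matches when href is the first attribute and is written with double quotes. Here is a slightly more tolerant sketch using java.util.regex. The class name PrefixRelativeLinks and the method prefixLinks are invented for this example, and origin is assumed to have no trailing slash. Note that it rewrites href on any tag, not only on <a>, and that regexes remain fragile on real-world HTML.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefixRelativeLinks {
    // Matches href="/ or href='/ but skips protocol-relative URLs such as //host/path.
    private static final Pattern ROOT_RELATIVE_HREF =
            Pattern.compile("(href\\s*=\\s*[\"'])/(?!/)", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: inserts origin in front of every root-relative href.
    static String prefixLinks(String html, String origin) {
        Matcher m = ROOT_RELATIVE_HREF.matcher(html);
        // quoteReplacement keeps any '$' or '\' in origin from being read as a group reference.
        return m.replaceAll("$1" + Matcher.quoteReplacement(origin) + "/");
    }

    public static void main(String[] args) {
        String html = "<a href=\"/ima/index.php\">example</a>";
        System.out.println(prefixLinks(html, "http://www.example.com"));
    }
}
```

Because String objects are immutable, the result has to be assigned or returned; calling replaceAll without using its return value leaves the original string unchanged.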


0

Is this a programming question? If you're looking for a ready-made Java file to do this, you're in the wrong place. If you're looking to write something like this yourself, you could search the text for each href=" attribute, read its value up to the closing quote, and check whether it is a relative path (that is, whether it starts with /). If it is, prepend the site's address to the value.
