I am trying to read an HTML file and add links to some of its text.

For example, I want to add a link to the "Campaign0" text:

<td><p style="overflow: hidden; text-indent: 0px; "><span style="font-family: SansSerif;">101</span></p></td> 
<td><p style="overflow: hidden; text-indent: 0px; "><span style="font-family: SansSerif;">Campaign0</span></p></td>
<td><p style="overflow: hidden; text-indent: 0px; "><span style="font-family: SansSerif;">unknown</span></p></td>

Link to be added:

<a href="Second.html">

I need a Java program that modifies the HTML to add a hyperlink over "Campaign0".

How do I do this with Jsoup?

I tried this with Jsoup:

        File input = new File("D://First.html");
        Document doc = Jsoup.parse(input, "UTF-8", "");
        Element span = doc.select("span").first(); // this only matches the first span tag :(
        span.wrap("<a href=\"Second.html\"></a>");

Is this correct? It's not working :(

In short, is there something like this:

 if find <span>Campaign0</span> 
 then replace by <span><a href="">Campaign0</a></span> 

using Jsoup or any other technology from Java code?

  • text = text.replaceAll("<span style='font-family: SansSerif; color: #000000; font-size: 10px; line-height: normal;'>Campaign0</span>","<a href='second.html'><span style='font-family: SansSerif; color: #000000; font-size: 10px; line-height: normal;'>Campaign0</span></a>"); Did you try this? Commented Jan 9, 2015 at 12:11
  • Check this: stackoverflow.com/questions/13541460/… Commented Jan 9, 2015 at 12:20
  • span.text().replaceAll("Campaign0","<a href='second.html'>Campaign0</a>");span.text().replaceAll("<span style='font-family: SansSerif; color: #000000; font-size: 10px; line-height: normal;'>Campaign0</span>","<a href='second.html'><span style='font-family: SansSerif; color: #000000; font-size: 10px; line-height: normal;'>Campaign0</span></a>"); Not working :( No changes appear. Commented Jan 9, 2015 at 12:36
  • Have you checked out jsoup? Commented Jan 9, 2015 at 14:17
  • @Ascalonian Well, as you can see, I am using Jsoup :P Commented Jan 9, 2015 at 14:26
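
The replaceAll attempt in the comments changes nothing because Java strings are immutable: span.text() returns a copy of the text, and replaceAll returns yet another string that is never written back into the document. A minimal sketch of writing the link back into the element follows, assuming jsoup is on the classpath. The class and method names (SpanLinkDemo, linkInsideSpan) are made up for illustration, and pretty-printing is switched off only to keep the output compact.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SpanLinkDemo {

    // Hypothetical helper: replaces the content of the first span whose own
    // text contains `label` with a link, and returns the resulting body HTML.
    static String linkInsideSpan(String html, String label, String href) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings().prettyPrint(false); // compact output, no added indentation

        Element span = doc.select("span:containsOwn(" + label + ")").first();
        if (span != null) { // first() returns null when nothing matches
            // Build the anchor as an element so jsoup escapes the text and attribute.
            Element link = doc.createElement("a").attr("href", href).text(span.text());
            // Modify the element itself; calling replaceAll on span.text() would
            // only produce a new String and leave the document untouched.
            span.empty().appendChild(link);
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<p><span>101</span><span>Campaign0</span></p>";
        System.out.println(linkInsideSpan(html, "Campaign0", "Second.html"));
    }
}
```

This produces the `<span><a href="...">Campaign0</a></span>` shape asked for in the question, with the link inside the span rather than around it.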

1 Answer

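Your code is almost correct. The catch is that doc.select("span").first() returns the first span in the whole document, whatever its text. To find the span elements containing "Campaign0", "Campaign1", etc., you can use the jsoup selector "span:containsOwn(Campaign0)". Note that :containsOwn does a case-insensitive substring match on the element's own text, so "Campaign1" would also match a span holding "Campaign10". See the selector documentation at jsoup.org for more.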

After finding the elements and wrapping them in the link, doc.html() returns the modified HTML. Here's a working sample:

input.html:

<table>
    <tr>
        <td><p><span>101</span></p></td>
        <td><p><span>Campaign0</span></p></td>
        <td><p><span>unknown</span></p></td>
    </tr>
    <tr>
        <td><p><span>101</span></p></td>
        <td><p><span>Campaign1</span></p></td>
        <td><p><span>unknown</span></p></td>
    </tr>
</table>

Code:

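    // Needs org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element,
    // java.io.* and java.nio.charset.StandardCharsets.
    File input = new File("input.html");
    Document doc = Jsoup.parse(input, "UTF-8", "");

    // Wrap the span whose own text contains "Campaign0" in a link.
    // first() returns null when nothing matches, so check it in real code.
    Element span = doc.select("span:containsOwn(Campaign0)").first();
    span.wrap("<a href=\"First.html\"></a>");

    // Same for "Campaign1", with a different target.
    span = doc.select("span:containsOwn(Campaign1)").first();
    span.wrap("<a href=\"Second.html\"></a>");

    // Serialize the modified document and write it out as UTF-8.
    String html = doc.html();
    try (BufferedWriter htmlWriter = new BufferedWriter(
            new OutputStreamWriter(new FileOutputStream("output.html"), StandardCharsets.UTF_8))) {
        htmlWriter.write(html);
    }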

output:

<html>
 <head></head>
 <body>
  <table> 
   <tbody>
    <tr> 
     <td><p><span>101</span></p></td> 
     <td><p><a href="First.html"><span>Campaign0</span></a></p></td> 
     <td><p><span>unknown</span></p></td> 
    </tr> 
    <tr> 
     <td><p><span>101</span></p></td> 
     <td><p><a href="Second.html"><span>Campaign1</span></a></p></td> 
     <td><p><span>unknown</span></p></td> 
    </tr> 
   </tbody>
  </table>
 </body>
</html>
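
When the spans to link are easier to identify by position than by text, a structural selector can stand in for :containsOwn. The sketch below assumes the campaign name always sits in the second cell of each row, and it derives the link target from the cell text ("Campaign0" becomes "Campaign0.html"). Both that naming rule and the class name SecondColumnLinker are assumptions for illustration, not something the question specifies.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SecondColumnLinker {

    // Hypothetical helper: links every span found in the second cell of a row.
    static String linkSecondColumn(String html) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings().prettyPrint(false); // compact output, no added indentation

        // td:nth-child(2) matches a cell that is the second child of its row.
        for (Element span : doc.select("tr > td:nth-child(2) span")) {
            span.wrap("<a></a>");
            // After wrap(), the span's parent is the new <a>. Setting href via
            // attr() lets jsoup handle escaping of the value.
            // Assumed naming rule: link target is "<cell text>.html".
            span.parent().attr("href", span.ownText().trim() + ".html");
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<table><tr>"
                + "<td><span>101</span></td>"
                + "<td><span>Campaign0</span></td>"
                + "<td><span>unknown</span></td>"
                + "</tr></table>";
        System.out.println(linkSecondColumn(html));
    }
}
```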
8 Comments

Hey, thanks for the reply, but I think your code will work only with the first <span> encountered. In my case it is not the first <span> :(
Which span exactly do you want to add the link to? The first one with content "Campaign0", or some other rule?
Actually, there are many spans like "Campaign0", "Campaign1", "Campaign2", "Campaign3", etc. all over the HTML file. Each one will link to a different HTML file.
So the problem is which span elements to select for modification. Can you expand your input example to contain the surrounding <table> element and another span that needs to be modified? Could the selection rule be something like "span element inside second td element of each tr element inside the table"?
Finding <span>Campaign0</span> would be easy using selector like doc.select("span:containsOwn(Campaign0)"). You said the content of the span can be different, like "Campaign1" - do you know all the possible values the span can contain? If you have a list of "Campaign0", "Campaign1", ..., you could iterate through the list and select each span using span:containsOwn(Campaign0) etc.
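
The idea of iterating over a known list of campaign names could be sketched as follows, assuming jsoup is on the classpath. The class name CampaignLinker, the method linkCampaigns, and the name-to-file mapping are hypothetical. Instead of running :containsOwn once per name, this version walks all spans and compares each span's own text exactly, which avoids "Campaign1" also matching "Campaign10".

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CampaignLinker {

    // Hypothetical helper: wraps each span whose own text equals a key of
    // `targets` in an <a> pointing at the mapped file.
    static String linkCampaigns(String html, Map<String, String> targets) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings().prettyPrint(false); // compact output, no added indentation

        for (Element span : doc.select("span")) {
            // Exact lookup on the span's own text; spans not in the map are skipped.
            String href = targets.get(span.ownText().trim());
            if (href != null) {
                span.wrap("<a></a>");
                // After wrap(), the span's parent is the new <a>.
                span.parent().attr("href", href);
            }
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        // Assumed mapping; the real list of campaigns and files comes from elsewhere.
        Map<String, String> targets = new LinkedHashMap<>();
        targets.put("Campaign0", "First.html");
        targets.put("Campaign1", "Second.html");

        String html = "<p><span>Campaign0</span></p><p><span>Campaign1</span></p>";
        System.out.println(linkCampaigns(html, targets));
    }
}
```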