I am reading an HTML file line by line in Java. Suppose one of the lines is:

<p> Hi everyone. This is a <em>dead end.</em> Do not go!</p>

I want to change the text in the line to

<p> Hi everyone. This is not a <em>dead end.</em>You may go!</p>

The inputs will be given as:

  • Change From: This is a dead end. Do not go!
  • Change To: This is not a dead end. You may go!

How can I do this without disturbing the HTML tags, using Jsoup or any other method in Java? Please help.

  • No. Using Java, I am reading each line of an HTML file and trying to replace only the text, without disturbing the HTML tags. Commented Aug 1, 2014 at 5:36
  • Why are you commenting? Just edit your post. Commented Aug 1, 2014 at 6:06
  • @Mike'Pomax'Kamermans, no, one person asked a question in a comment and I answered it, but he deleted his comment. I have edited the post too. Commented Aug 1, 2014 at 6:11
  • This is not related to Jsoup, as Jsoup is not meant for it. What you want is just text replacement. Define simple rules, as properties or XML, that state what should be replaced by what, and apply them with Java's String replaceAll method. And yes, it will never disturb the HTML tags. Commented Aug 1, 2014 at 6:27
  • Can you give me an example? How will replaceAll replace text that has a tag in the middle of it? Commented Aug 1, 2014 at 7:28

1 Answer

As an alternative to MCL's solution, here is a fully Jsoup-based one:

First, here's how Jsoup sees your HTML:

org.jsoup.nodes.TextNode:    Hi everyone. This is a 
org.jsoup.nodes.Element:    <em>dead end.</em>
org.jsoup.nodes.TextNode:    Do not go!

All three nodes are children of the <p>...</p> element.

And here's the (very verbose) code:

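import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

final String html = "<p> Hi everyone. This is a <em>dead end.</em> Do not go!</p>";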

Document doc = Jsoup.parseBodyFragment(html); // Parse html into a document
Element pTag = doc.select("p").first(); // Select the p-element (there's just one)


// Text before 'em'-tag
TextNode preEM = (TextNode) pTag.childNode(0);
preEM.text(preEM.text().replace("This is a", "This is not a"));

// Text after 'em'-tag
TextNode postEM = (TextNode) pTag.childNode(2);
postEM.text("You may go!");


System.out.println(pTag); // Print result

Output:

<p> Hi everyone. This is not a <em>dead end.</em>You may go!</p>

This keeps all HTML formatting and also works on full documents.
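
The code above hard-codes the child indices and the replacement strings. For lines whose tags are not known in advance, one possible sketch is to apply replacement rules only to the text between tags and copy the tags through untouched. Note that this swaps Jsoup for a plain regex split, so it uses only the standard library. The class name `TextOnlyReplacer`, the method `replaceOutsideTags` and the rule map are hypothetical names chosen for illustration, and the tag pattern assumes that no `>` appears inside attribute values, comments or scripts.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TextOnlyReplacer {

    // Matches one tag such as <p>, </em> or <a href="x">.
    // Assumption: attribute values never contain '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /** Applies every rule to the text between tags; tags are copied unchanged. */
    static String replaceOutsideTags(String line, Map<String, String> rules) {
        StringBuilder out = new StringBuilder();
        Matcher m = TAG.matcher(line);
        int last = 0;
        while (m.find()) {
            // Text segment in front of the current tag
            out.append(applyRules(line.substring(last, m.start()), rules));
            // The tag itself, untouched
            out.append(m.group());
            last = m.end();
        }
        // Text after the last tag (or the whole line if it has no tags)
        out.append(applyRules(line.substring(last), rules));
        return out.toString();
    }

    private static String applyRules(String text, Map<String, String> rules) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            text = text.replace(rule.getKey(), rule.getValue());
        }
        return text;
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("This is a", "This is not a");
        rules.put("Do not go!", "You may go!");

        String line = "<p> Hi everyone. This is a <em>dead end.</em> Do not go!</p>";
        System.out.println(replaceOutsideTags(line, rules));
        // prints: <p> Hi everyone. This is not a <em>dead end.</em> You may go!</p>
    }
}
```

Unlike the output shown above, this variant keeps the space after `</em>`, because it only replaces the matched fragment instead of overwriting the whole text node. Each rule has to fit inside a single text segment; a rule whose text spans a tag will not match.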


3 Comments

The text and tags keep changing; we don't know what text or tag will be in a line. The line I gave is only an example. I want a generic method to replace the text, not one specific to this example.
Then you have to use a regex; Jsoup can't help you much here.
Thanks ollo for the explanation! :) @praveenrsmart: you should give ollo an upvote :D
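
To make the regex suggestion a little more concrete, here is a minimal sketch of one way to locate a "Change From" phrase in a line even when tags sit between its words. It builds a pattern in which consecutive words may be separated by whitespace and/or tags. The class `PhraseLocator` and the method `acrossTags` are hypothetical names, and the sketch assumes that tags never split a word and that no `>` occurs inside attribute values. It only finds the span; how the replacement text should be distributed around the tags is still left to the caller.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhraseLocator {

    // Gap allowed between two words of the phrase: whitespace and/or tags.
    private static final String GAP = "(?:\\s|<[^>]+>)+";

    /** Builds a pattern that matches the phrase even if tags sit between its words. */
    static Pattern acrossTags(String phrase) {
        StringBuilder regex = new StringBuilder();
        for (String word : phrase.trim().split("\\s+")) {
            if (regex.length() > 0) {
                regex.append(GAP);
            }
            // Quote each word so characters like '.' or '!' are matched literally
            regex.append(Pattern.quote(word));
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        String line = "<p> Hi everyone. This is a <em>dead end.</em> Do not go!</p>";
        Matcher m = acrossTags("This is a dead end. Do not go!").matcher(line);
        if (m.find()) {
            System.out.println(m.group());
            // prints: This is a <em>dead end.</em> Do not go!
        }
    }
}
```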
