Replace text in HTML within multiple tags using Jsoup in Java

Question

I am reading an HTML file line by line in Java. Suppose one of the lines is:

<p> Hi everyone. This is a <em>dead end.</em> Do not go!</p>

I want to change the text in that line to:

<p> Hi everyone. This is not a <em>dead end.</em>You may go!</p>

The inputs will be given as:

Change From: This is a dead end. Do not go!
Change To: This is not a dead end. You may go!

How can I do this without disturbing the HTML tags, using Jsoup or any other approach in Java? Please help.

No, I am using Java: I read each line of an HTML file and try to replace only the text, without disturbing the HTML tags.
– praveenrsmart, Commented Aug 1, 2014 at 5:36
@Mike'Pomax'Kamermans, no, someone asked a question in a comment and I answered it, but they deleted it. I have edited the post as well.
– praveenrsmart, Commented Aug 1, 2014 at 6:11
This is not related to Jsoup, as Jsoup is not meant for this. What you want is plain text replacement. Define simple rules, as properties or XML, that describe what should be replaced by what, and apply them with Java's String replaceAll method. And yes, it will never disturb the HTML tags.
– prashant thakre, Commented Aug 1, 2014 at 6:27
Can you give me an example? How will replaceAll replace text that has a tag in the middle of it?
– praveenrsmart, Commented Aug 1, 2014 at 7:28
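
To make the concern in the last comment concrete, here is a minimal sketch that uses only the JDK. The class and method names (`PlainReplaceDemo`, `plainReplace`) are made up for illustration. It shows that a literal replacement on the raw line finds nothing to replace, because the `<em>` tags interrupt the "Change From" text.

```java
public class PlainReplaceDemo {
    // Literal replacement on the raw HTML line; no parsing involved.
    static String plainReplace(String htmlLine, String from, String to) {
        return htmlLine.replace(from, to);
    }

    public static void main(String[] args) {
        String line = "<p> Hi everyone. This is a <em>dead end.</em> Do not go!</p>";
        String from = "This is a dead end. Do not go!";
        String to = "This is not a dead end. You may go!";

        String result = plainReplace(line, from, to);

        // The raw line contains "This is a <em>dead end.</em> Do not go!",
        // which is a different character sequence from 'from', so nothing matches.
        System.out.println(result.equals(line)); // prints "true": the line is unchanged
    }
}
```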

ollo · Accepted Answer · 2014-08-04 20:52:51Z

As an alternative to MCL's solution, here is a fully Jsoup-based one:

First, here's how Jsoup sees your HTML:

org.jsoup.nodes.TextNode:    Hi everyone. This is a 
org.jsoup.nodes.Element:    <em>dead end.</em>
org.jsoup.nodes.TextNode:    Do not go!

All three nodes are children of the <p>...</p> element.

And here's the (very verbose) code:

// Imports needed by the snippet below
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

final String html = "<p> Hi everyone. This is a <em>dead end.</em> Do not go!</p>";

Document doc = Jsoup.parseBodyFragment(html); // Parse html into a document
Element pTag = doc.select("p").first(); // Select the p-element (there's just one)


// Text before 'em'-tag (first child node of <p>)
TextNode preEM = (TextNode) pTag.childNode(0);
preEM.text(preEM.text().replace("This is a", "This is not a"));

// Text after 'em'-tag (third child node of <p>)
TextNode postEM = (TextNode) pTag.childNode(2);
postEM.text("You may go!");


System.out.println(pTag); // Print result

Output:

<p> Hi everyone. This is not a <em>dead end.</em>You may go!</p>

This keeps all HTML formatting and also works in full documents.

3 Comments

praveenrsmart Over a year ago

The text and tags keep changing; we don't know what text or tag will be in a line. The line I gave was only an example. I want a generic method to replace the text, not one specific to this example.

ollo Over a year ago

Then you have to use regex; Jsoup can't help you very much here.
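
One way to read the regex suggestion, as a sketch only: build a pattern from the words of the "Change From" text that tolerates whitespace and tags between them, so the phrase can at least be located in the raw line. The names `TagTolerantFinder`, `buildPattern` and `find` are hypothetical. The `<[^>]*>` tag pattern is a simplification (it breaks on attribute values containing `>` and does not handle tags inside a word), and deciding where the surviving tags belong in the replacement text remains a separate problem.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagTolerantFinder {
    // What may sit between two words: whitespace and/or simple tags.
    private static final String GAP = "(?:\\s|<[^>]*>)+";

    // Turns "This is a dead end." into: \QThis\E GAP \Qis\E GAP \Qa\E ...
    static Pattern buildPattern(String from) {
        String[] words = from.trim().split("\\s+");
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                regex.append(GAP);
            }
            regex.append(Pattern.quote(words[i])); // match each word literally
        }
        return Pattern.compile(regex.toString());
    }

    // Returns the matched span of the raw HTML, or null if the phrase is absent.
    static String find(String htmlLine, String from) {
        Matcher m = buildPattern(from).matcher(htmlLine);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "<p> Hi everyone. This is a <em>dead end.</em> Do not go!</p>";
        String hit = find(line, "This is a dead end. Do not go!");
        System.out.println(hit); // prints: This is a <em>dead end.</em> Do not go!
    }
}
```

The match tells you which slice of the line to rewrite, tags included. For anything beyond simple inline markup, walking the parsed text nodes is likely to be more robust than pattern matching on raw HTML.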

Phuong Over a year ago

Thanks, ollo, for the explanation! :) @praveenrsmart: You should give ollo 1 point :D