How to replace a continuous occurrence of a substring with a single substring?

Question

I have an HTML string in Android, generated from a Spannable string:

<p dir="ltr"><b><b><b><b><b>qwert</b></b></b></b></b><b><b><b><b><b><b>y</b></b></b></b></b></b></p>

As you can see, the same tags occur many times in a row.

I have tried trial and error with methods like replaceAll(), but they replace all occurrences.

What I want is this: when I pass a substring to find, let's say "<b>", it should replace, let's say, the first five consecutive bold tags in the above string with a single "<b>" tag.

Any suggestions?

Required Result :- qwerty

Link does not work. I have no issue with Android-to-HTML parsing. I just want to process the above string and remove the duplicates.
– Rahul Gupta, Commented Mar 21, 2014 at 6:09
What is the output you'd like to get from your sample input? What is the regex you're currently using?
– Jerry, Commented Mar 21, 2014 at 6:13
I am not familiar with the Matcher class. Please see my edit. I have updated my question.
– Rahul Gupta, Commented Mar 21, 2014 at 6:16

Jerry · Accepted Answer · 2014-03-21 06:18:48Z

5

If I understand your problem correctly, you can try this regex then:

(<[^>]+>)\\1+

And replace with:

$1

In code...

String test = "<p dir=\"ltr\"><b><b><b><b><b>qwert</b></b></b></b></b><b><b><b><b><b><b>y</b></b></b></b></b></b></p>";
String out = test.replaceAll("(<[^>]+>)\\1+", "$1");

Output:

<p dir="ltr"><b>qwert</b><b>y</b></p>

(<[^>]+>) matches and catches in group 1, the first tag that it finds.

\\1 in the regex refers to the first captured tag. The + indicates unlimited repetition (well, the limit is a big number I don't think you need to worry about).

The replacement $1 then also refers to the first captured tag.

ideone demo
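
The question asks for a way to pass in the substring to collapse. A minimal sketch of that idea follows, reusing the same backreference technique; the class name `CollapseRepeats` and the method `collapse` are hypothetical names chosen for illustration, and `Pattern.quote` is used on the assumption that the substring should be matched literally rather than as a regex.

```java
import java.util.regex.Pattern;

class CollapseRepeats {
    // Collapses every run of two or more consecutive copies of `sub`
    // into a single copy. Pattern.quote makes `sub` match literally,
    // so characters such as < and / need no manual escaping.
    static String collapse(String input, String sub) {
        String regex = "(" + Pattern.quote(sub) + ")\\1+";
        return input.replaceAll(regex, "$1");
    }

    public static void main(String[] args) {
        String html = "<p dir=\"ltr\"><b><b><b><b><b>qwert</b></b></b></b></b></p>";
        // Opening and closing tags are different substrings,
        // so each one is collapsed in its own pass.
        String result = collapse(collapse(html, "<b>"), "</b>");
        System.out.println(result); // <p dir="ltr"><b>qwert</b></p>
    }
}
```

Unlike the general `(<[^>]+>)\\1+` pattern, this variant only touches the substring it is given, which may be preferable when other repeated tags should be left alone.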

answered Mar 21, 2014 at 6:18

Jerry


4 Comments

Rahul Gupta Over a year ago

I am new to this pattern thing. Your code above works fine. Can you explain the process and what all those square brackets mean in the above pattern?

Jerry Over a year ago

Okay, < and > mean these symbols themselves. [^>]+ is a character class. It means any character except >, repeated at least once. If I had [^a]+, that would mean any character except a, repeated at least once. Does that help? Is there more you want to ask about?
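
To see what `<[^>]+>` matches on its own, here is a small sketch that lists every tag found in a string. The class name `TagFinder` and the method `findTags` are hypothetical, introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagFinder {
    // Returns each substring matched by <[^>]+> :
    // a '<', then one or more characters that are not '>', then a '>'.
    static List<String> findTags(String input) {
        List<String> tags = new ArrayList<>();
        Matcher m = Pattern.compile("<[^>]+>").matcher(input);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(findTags("<p dir=\"ltr\"><b>qwert</b></p>"));
        // [<p dir="ltr">, <b>, </b>, </p>]
    }
}
```

Because `[^>]` cannot match `>`, each match stops at the first closing bracket, so adjacent tags are found one at a time instead of being swallowed as a single match.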

Rahul Gupta Over a year ago

Yes, thanks. If my string has this :- . Can I pattern match alternate "" and replace them?

Jerry Over a year ago

@RahulGupta That could be a problem... which (if it works), will make your example input become: qwerty and I'm not sure that's something you want.

aelor · Accepted Answer · 2014-03-21 06:22:43Z

2

You want something like this:

find: (<b>)\1+|(<\/b>)\2+

replace: \1\2

demo here : http://regex101.com/r/aC6iP4
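
The find/replace pair is written in regex101 notation. A rough Java translation might look like the sketch below, assuming the first group is meant to capture `<b>` and the second `</b>`. The class name `CollapseBoldTags` and the method `collapseBold` are hypothetical. In a Java replacement string, groups are written `$1` and `$2`, and a group that did not take part in the match contributes nothing.

```java
class CollapseBoldTags {
    // One alternation handles both cases in a single pass:
    // a run of <b> (group 1) or a run of </b> (group 2).
    // Only one group participates per match, so "$1$2"
    // expands to whichever tag was actually matched.
    static String collapseBold(String input) {
        return input.replaceAll("(<b>)\\1+|(</b>)\\2+", "$1$2");
    }

    public static void main(String[] args) {
        String html = "<b><b>qwert</b></b><b><b><b>y</b></b></b>";
        System.out.println(collapseBold(html)); // <b>qwert</b><b>y</b>
    }
}
```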

edited Mar 21, 2014 at 6:22

answered Mar 21, 2014 at 6:12

aelor

