2

I have a string:

0000000000<table blalba>blaalb<tr>gfdg<td>kgdfkg</td></tr>fkkkkk</table>5555

I want to replace everything from <table to </table> with "" (that is, delete it), so that only 00000000005555 is displayed.

When it is on one line, it works:

chaineHtml = chaineHtml.replaceFirst("[^<title>](.*)[</title>$", "");

But the same with table fails.

0

6 Answers

3

This regex should work:

html = html.replaceAll("(?is)<table.+?/table>", "");

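Here (?s) enables DOTALL mode, so . also matches line terminators and the match can span multiple lines, and (?i) makes the match case-insensitive. The lazy .+? stops at the first /table> rather than the last one.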
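As a rough sketch of what the two flags change (the class name DotallDemo, its method names, and the sample string are made up for illustration), the same pattern can be run with and without them. The inline (?is) prefix should behave like passing Pattern.DOTALL | Pattern.CASE_INSENSITIVE to Pattern.compile:

```java
import java.util.regex.Pattern;

class DotallDemo {
    // Inline flags: (?s) = DOTALL, (?i) = CASE_INSENSITIVE.
    static String stripWithInlineFlags(String html) {
        return html.replaceAll("(?is)<table.+?/table>", "");
    }

    // Same pattern, with the flags passed to Pattern.compile instead.
    static String stripWithCompileFlags(String html) {
        Pattern p = Pattern.compile("<table.+?/table>",
                Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
        return p.matcher(html).replaceAll("");
    }

    // No flags: '.' stops at line terminators and matching is case-sensitive.
    static String stripWithoutFlags(String html) {
        return html.replaceAll("<table.+?/table>", "");
    }

    public static void main(String[] args) {
        String html = "0000000000<TABLE blalba>blaalb\n<tr>gfdg</tr>\n</TABLE>5555";
        System.out.println(stripWithInlineFlags(html));   // 00000000005555
        System.out.println(stripWithCompileFlags(html));  // 00000000005555
        System.out.println(stripWithoutFlags(html).equals(html)); // true (nothing removed)
    }
}
```

Without the flags, the multi-line, upper-case table in this sample is left untouched.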

But I suggest you should not manipulate HTML using regex as it can be error prone.


0

Try this:

s = s.replaceAll("<table.+/table>", "");
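One thing worth checking with this pattern: .+ is greedy, so if the string contains more than one table it matches from the first <table to the last /table> and also removes the text in between. A small sketch, assuming a made-up input with two tables (GreedyDemo and its method names are hypothetical):

```java
class GreedyDemo {
    // Greedy: .+ grabs as much as possible before the final /table>.
    static String stripGreedy(String s) {
        return s.replaceAll("<table.+/table>", "");
    }

    // Lazy: .+? stops at the first /table> it can reach.
    static String stripLazy(String s) {
        return s.replaceAll("<table.+?/table>", "");
    }

    public static void main(String[] args) {
        String s = "a<table>1</table>b<table>2</table>c";
        System.out.println(stripGreedy(s)); // ac  (the "b" between the tables is lost)
        System.out.println(stripLazy(s));   // abc
    }
}
```

Neither variant matches across line breaks unless DOTALL mode is enabled, for example with a leading (?s).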

3 Comments

Can you please explain this code (in your answer)? You might get more upvotes that way!
Thanks a lot, it's very good! It works perfectly. Thank you also for replying so fast!
@The Guy with The Hat: it means "replace the text that starts with "<table" and ends with "/table>" with """ (in other words, remove it).
0
 [^<table>]

I don't think that means what you think it means.

It is not "a string not equal to <table>". Rather, it means "a character not equal to < or t or a or b or l or e or >". "[^...]" is called a negative character class.
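To make the difference concrete, here is a small sketch (the class name CharClassDemo, its methods, and the sample strings are only for illustration). It shows that [^<table>] consumes exactly one character, and only a character outside the set < t a b l e >:

```java
class CharClassDemo {
    // True only if the whole input is ONE character outside the set {<, t, a, b, l, e, >}.
    static boolean isSingleExcludedChar(String s) {
        return s.matches("[^<table>]");
    }

    // Replaces every character outside the set with '#'.
    static String maskOthers(String s) {
        return s.replaceAll("[^<table>]", "#");
    }

    public static void main(String[] args) {
        System.out.println(isSingleExcludedChar("x"));     // true
        System.out.println(isSingleExcludedChar("t"));     // false: 't' is in the set
        System.out.println(isSingleExcludedChar("title")); // false: more than one character
        System.out.println(maskOthers("<table>xyz"));      // <table>###
    }
}
```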

Change your regex to

 (.*?)<table>.*?</table>(.*?)

replace it with

$1$2

and it will give you the result you wish.


Please consider bookmarking the Stack Overflow Regular Expression FAQ for future reference. The bottom section contains a list of online regex testers where you can try things out yourself. You may also want to check out the sections named "Character Classes" and, as mentioned by @anubhava, "General Information > Do not use regex to parse HTML".

2 Comments

The problem is that the sample regex doesn't go along with the sample input.
@BheshGurung: Meant "table". Fixed
0

Don't use regex if you are not familiar with its concepts!

There is a simple plain Java solution for your problem:

String begin = "<table";
String end = "</table>";
String s = "0000000001<table blalba>blaalb<tr>gfdg<td>kgdfkg</td></tr>fkkkkk</table>4555";
int tableIndex = s.indexOf(begin);

// Remove each <table ...>...</table> span until no opening tag is left.
while (tableIndex > -1) {
    int tableEndIndex = s.indexOf(end, tableIndex);
    if (tableEndIndex == -1) {
        break; // no closing tag: leave the rest of the string untouched
    }
    s = s.substring(0, tableIndex) + s.substring(tableEndIndex + end.length());
    tableIndex = s.indexOf(begin, tableIndex);
}
// s is now "00000000014555"
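The same indexOf idea could also be wrapped in a reusable method. The following is only a sketch under some assumptions: the names TableStripper and stripTables are made up, tags are matched case-sensitively, and nested tables are not handled. It copies the text outside each table into a StringBuilder instead of rebuilding the string on every pass:

```java
class TableStripper {
    // Hypothetical helper: returns s without any <table ...>...</table> spans.
    static String stripTables(String s) {
        final String begin = "<table";
        final String end = "</table>";
        StringBuilder out = new StringBuilder();
        int pos = 0; // start of the text not yet copied
        while (true) {
            int start = s.indexOf(begin, pos);
            if (start == -1) {
                break; // no more tables
            }
            int stop = s.indexOf(end, start);
            if (stop == -1) {
                break; // unterminated table: keep the remainder as-is
            }
            out.append(s, pos, start);   // copy text before the table
            pos = stop + end.length();   // skip past </table>
        }
        out.append(s, pos, s.length());  // copy whatever is left
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripTables(
            "0000000001<table blalba>blaalb<tr>gfdg<td>kgdfkg</td></tr>fkkkkk</table>4555"));
        // 00000000014555
    }
}
```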


0
String resultString = subjectString.replaceAll("<table.*?table>", "");

Explanation:

Match the characters “<table” literally «<table»
Match any single character that is not a line break character «.*?»
   Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the characters “table>” literally «table>»

1 Comment

One line answers are not good style. Please consider adding explanation, particularly so other readers will understand your answer.
-1

Here is a brilliant solution I found somewhere: use the regex

[\s\S]

to match any character, including newlines. It works because every character is either whitespace (\s) or non-whitespace (\S). So in your case that would give:

s = s.replaceAll("<table[\\s\\S]+/table>", "");

The double backslashes are needed because a backslash must itself be escaped inside a Java string literal.

Another possibility is

(.|\n)

which matches either any character except a line terminator, or a newline. That gives:

s = s.replaceAll("<table(.|\n)+/table>", "");

For some reason, on my computer, in certain combinations, when I use (.|\n)+ the regex runs into a weird loop and ends in a StackOverflowError:

Exception in thread "main" java.lang.StackOverflowError
    at java.lang.Character.codePointAt(Character.java:4668)
    at java.util.regex.Pattern$CharProperty.match(Pattern.java:3693)

This doesn't happen when I use [\s\S]+ instead. I have no idea why, though.
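A likely explanation for the overflow, offered as an assumption rather than something verified on that machine, is that java.util.regex matches a repeated group such as (.|\n)+ recursively, using roughly one stack frame per repetition. A character class such as [\s\S]+ can be matched in a simple loop. The two forms also differ in what they accept: without DOTALL, . skips every line terminator, so (.|\n) does not match a carriage return. The sketch below (AnyCharDemo and its method names are hypothetical) shows that difference on input with Windows line endings:

```java
class AnyCharDemo {
    // Character class: matches any character at all, including \r and \n.
    static String stripWithCharClass(String s) {
        return s.replaceAll("<table[\\s\\S]+/table>", "");
    }

    // Alternation: '.' excludes line terminators, and only \n is added back.
    static String stripWithAlternation(String s) {
        return s.replaceAll("<table(.|\n)+/table>", "");
    }

    public static void main(String[] args) {
        String unix = "0<table>\nx\n</table>5";
        String windows = "0<table>\r\nx\r\n</table>5";
        System.out.println(stripWithCharClass(unix));      // 05
        System.out.println(stripWithAlternation(unix));    // 05
        System.out.println(stripWithCharClass(windows));   // 05
        // The \r characters block the match, so nothing is removed:
        System.out.println(stripWithAlternation(windows).equals(windows)); // true
    }
}
```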
