I have an HTML document saved as a .txt file that contains multiple tables and other text. I am trying to delete any HTML tag (anything within "<>") that appears inside a table (between <table> and </table>). For example:
===================
other text
<other HTML>
<table>
<b><u><i>bold underlined italic text</b></u></i>
</table>
other text
<other HTML>
==============
The final output would be as follows. Note that only the HTML tags between <table> and </table> are removed; the <table> and </table> tags themselves are kept.
==============
other text
<other HTML>
<table>
bold underlined italic text
</table>
other text
<other HTML>
=============
Any help is greatly appreciated!
<html><and> is a tag, even if it's not valid HTML. So a mathematical expression like x<y and z>2 could cause problems. If you can state a set of assumptions we can rely on, then someone can likely provide a satisfactory regex. But it's probably better not to use regexes at all, as zzzzBov suggests.
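To make the trade-off above concrete, here is a minimal regex sketch in Python. It is only an illustration under stated assumptions, not a robust HTML solution: tables are not nested, every table has a literal opening <table ...> and closing </table>, and anything between "<" and ">" inside a table is a tag to be dropped. The function name strip_tags_in_tables and the sample string are made up for this example.

```python
import re

# Assumption: tables are never nested, so a non-greedy match from the
# opening <table ...> to the first </table> covers exactly one table.
TABLE_RE = re.compile(r"(<table\b[^>]*>)(.*?)(</table\s*>)",
                      re.IGNORECASE | re.DOTALL)

# Assumption: inside a table, anything between "<" and ">" is a tag.
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags_in_tables(text: str) -> str:
    """Remove tags inside each table, keeping the table tags themselves."""
    def clean(match: re.Match) -> str:
        open_tag, body, close_tag = match.groups()
        return open_tag + TAG_RE.sub("", body) + close_tag

    return TABLE_RE.sub(clean, text)


# Hypothetical sample mirroring the example in the question.
sample = (
    "other text\n"
    "<other HTML>\n"
    "<table>\n"
    "<b><u><i>bold underlined italic text</b></u></i>\n"
    "</table>\n"
    "other text\n"
    "<other HTML>\n"
)

if __name__ == "__main__":
    print(strip_tags_in_tables(sample))
```

Text outside the tables is left untouched, including <other HTML>. The caveat from the comment still applies: with this sketch, a table cell containing x<y and z>2 would come out as x2, because "<y and z>" is treated as a tag. If that matters for your data, a real HTML parser is likely the safer route.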