Select and replace multiple lines in Notepad++ using regex

Question

I have a very large HTML file with the results of a security scan and I need to pull the useless information out of the document. An example of what I need to pull out looks something like this:

<tr>
<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>
<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=10395" target="_blank"> 10395</a>
</td>
<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">Microsoft Windows SMB Shares Enumeration</span></td>
</tr>

After the edit, the text above should simply be removed. A standard find won't work, though, because the content varies from row to row. Here is another example of what needs to be removed from the document:

<tr>
<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>
<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=11219" target="_blank"> 11219</a>
</td>
<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">Nessus SYN scanner</span></td>
</tr>

I need to treat the ID number, 10395, as a variable, but the length stays the same. Also, "Microsoft Windows SMB Shares Enumeration" needs to be treated as a variable too, since it changes throughout the document.

I have tried throwing something like this into replace, but I think I am totally missing the mark.

<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=\1\1\1\1\1" target="_blank"> \1\1\1\1\1</a>

Maybe I should be using a different tool altogether?

What are you trying to transform to what? What should the doc look like after the change? (And is this a line-by-line match and replace?)
– Tezra, Commented Jun 16, 2017 at 17:07
@Tezra I am just trying to remove those snippets, so just replacing them with a space or a \n. It is 6 total lines at a time that would need to be replaced if I approach it the way I am currently thinking.
– creigel, Commented Jun 16, 2017 at 17:09
So you want to remove the display text portion? Can you please add an example of what it should look like afterwards to the question?
– Tezra, Commented Jun 16, 2017 at 17:12

revo · Accepted Answer · 2017-06-16 17:22:43Z

1

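I assume that by repeating \1 you meant a placeholder for a single character, but that's not what it does: \1 is a backreference to whatever the first capture group matched. What you are trying to achieve is something like this (typed into the Find field with Regular expression mode selected):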

<td width="10%" valign="top" class="classcell"> <a href="http://www\.nessus\.org/plugins/index\.php\?view=single&amp;id=(\d+)" target="_blank"> \1</a>
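The capture group and backreference can be checked outside Notepad++. The sketch below uses Python's re module purely as an assumption for illustration; Notepad++ uses a different regex engine, although (\d+) and \1 should behave the same way for this pattern. The sample line is the one from the question.

```python
import re

# Sample line from the question: the id appears in the URL and again as link text.
line = ('<td width="10%" valign="top" class="classcell"> '
        '<a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=10395" '
        'target="_blank"> 10395</a>')

# (\d+) captures the whole id once; \1 then requires the same digits again.
pattern = re.compile(r'id=(\d+)" target="_blank"> \1</a>')

match = pattern.search(line)
captured_id = match.group(1) if match else None
print(captured_id)

# If the link text differs from the id in the URL, the backreference fails.
mismatched = line.replace("> 10395</a>", "> 99999</a>")
print(pattern.search(mismatched) is None)
```

So a single group followed by one \1 covers the whole number, however many digits it has; there is no need to repeat \1 per digit.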

To match the whole six-line block:

<tr>\s*<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>\s*<td width="10%" valign="top" class="classcell"> <a href="http://www\.nessus\.org/plugins/index\.php\?view=single&amp;id=(\d+)" target="_blank"> \1</a>\s*</td>\s*<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">.*?</span></td>\s*</tr>

Then you can replace it with an empty string.
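For anyone who prefers to script the cleanup, here is a minimal sketch of the same replacement in Python. It assumes Python's re module handles this pattern the same way Notepad++ does, and the "kept row" in the sample input is hypothetical, added only to show that other rows survive.

```python
import re

# The six-line pattern from the answer above, split across string literals for readability.
ROW_PATTERN = re.compile(
    r'<tr>\s*<td width="20%" valign="top" class="classcell0">'
    r'<span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>'
    r'\s*<td width="10%" valign="top" class="classcell"> '
    r'<a href="http://www\.nessus\.org/plugins/index\.php\?view=single&amp;id=(\d+)" target="_blank"> \1</a>'
    r'\s*</td>\s*<td width="70%" valign="top" class="classcell">'
    r'<span class="classtext" style="color: #263645; font-weight: normal;">.*?</span></td>\s*</tr>'
)

# First row is copied from the question; the second row is a hypothetical row to keep.
html = """<table>
<tr>
<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>
<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=10395" target="_blank"> 10395</a>
</td>
<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">Microsoft Windows SMB Shares Enumeration</span></td>
</tr>
<tr><td>kept row (hypothetical)</td></tr>
</table>"""

# subn returns the cleaned text and the number of rows that were removed.
cleaned, removed = ROW_PATTERN.subn("", html)
print(removed)
print(cleaned)
```

The \s* between tags is what lets the pattern span the line breaks, while .*? stays on one line because the dot does not match a newline unless that option is turned on.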

answered Jun 16, 2017 at 17:22


1 Comment

creigel Over a year ago

Thank you so much! Worked like a charm!

stopkillinggames.com · Accepted Answer · 2017-06-16 17:33:09Z

1

Regexes in order from least to most sophisticated; all of them get the job done for the single line containing the link:

<a.*>.*\d.*</a>

<a.*>.*\d{5}.*</a>

<a.*id=\d{5}.*>.*\d{5}.*</a>

Disclaimer: be careful. I can't parse HTML with regex.
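One concrete reason for the warning is that .* is greedy. The sketch below, in Python as an assumption for illustration, runs the three patterns against the link from the question and then against a hypothetical line that contains two links.

```python
import re

patterns = [
    r'<a.*>.*\d.*</a>',
    r'<a.*>.*\d{5}.*</a>',
    r'<a.*id=\d{5}.*>.*\d{5}.*</a>',
]

# The link from the second example in the question.
anchor = ('<a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=11219" '
          'target="_blank"> 11219</a>')

# Each pattern matches the complete link when it is the only one on the line.
matched = [re.search(p, anchor).group(0) for p in patterns]
all_match = all(m == anchor for m in matched)
print(all_match)

# Hypothetical input: two links on one line with text in between.
two_links = '<a href="/x">12345</a> keep this <a href="/y">67890</a>'

# Greedy .* runs to the last </a>, so the text between the links is swallowed too.
greedy = re.search(patterns[0], two_links).group(0)
print(greedy == two_links)
```

If the file can have more than one link per line, a lazy quantifier (.*?) or a more specific pattern is the safer choice.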

edited Jun 16, 2017 at 17:33

answered Jun 16, 2017 at 17:17


1 Comment

creigel Over a year ago

This worked fantastic for the single line. Thank you for the response.
