notepad++ - delete attributes in HTML start tags with regex

Question

Solution:

Find: <([a-z]+) .*?=".*?( *\/?>)

Replace with: <\1$2

I usually copy tables from forum sites to blog sites.

I want to remove every attribute from all start tags.
The tables look like this:

1|<table unwanted_attribute_1>
2|<tbody unwanted_attribute_2>
3|<tr unwanted_attribute_3><td unwanted_attribute_4><br unwanted_attribute_5 /></td></tr>
4|<tr unwanted_attribute_3><td unwanted_attribute_4><span unwanted_attribute_6></span></td></tr>
5|</tbody>
6|</table>
The unwanted attributes include "cellspacing", "class", "style", "href" and "target".

I found two answers, but neither of them helped.
[A1]: It uses a fixed condition to find and replace specific terms, but in my case the start tags are everywhere and vary from article to article.
[A2]: I tried this answer, but it does not work, as described below.

I search for <([a-z]+) .*=".*"> and replace with <\1>.
Lines 1 and 2 work, but lines 3 and 4 get messed up.

How should I write the regex?

EDIT:

<table cellspacing="0" class="t_table" style="background-color: #f8f8f8; border-collapse: collapse; border: 1px solid rgb(227, 237, 245); color: #444444; empty-cells: show; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 16px; line-height: 24px; table-layout: auto; width: 673px; word-wrap: break-word;">
<tbody style="word-wrap: break-word;">
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆<a class="relatedlink" href="◆◆◆" style="border-bottom: 1px solid blue; color: #639805; word-wrap: break-word;" target="_blank">◆◆</a>◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆<br style="word-wrap: break-word;" />◆◆◆◆</td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">= ◆◆◆◆ =<br style="word-wrap: break-word;" /></td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">= ◆◆◆◆ =<br style="word-wrap: break-word;" /></td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">= ◆◆◆◆ =<br style="word-wrap: break-word;" /></td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆</td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆</td></tr>
</tbody></table>

chris85 · Accepted Answer · 2016-09-16 15:21:47Z

Your .* is greedy so it matches everything until the last "> on your line. Here's what your first regex does:

https://regex101.com/r/qK5uY3/1

Try:

<([a-z]+) .*?=".*? *\/?>

I'd recommend looking at plugins for Notepad++. Using a regex to parse HTML can cause many problems.

https://regex101.com/r/qK5uY3/2

The ` *\/?` before the closing > matches optional spaces and the optional slash of a self-closing element. I would prefer \h for the whitespace, but I don't know whether Notepad++ supports it (I'm a Mac user).
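The difference between the greedy and the lazy pattern can be reproduced outside Notepad++. The following is a minimal sketch in Python, assuming that Python's `re` module behaves like Notepad++'s regex engine for these two patterns; the sample row is a shortened stand-in for the question's table rows, and the names `GREEDY`, `LAZY` and `line` are made up for illustration.

```python
import re

# Shortened stand-in for one table row from the question (placeholder values).
line = '<tr style="x"><td style="y">text<br style="z" /></td></tr>'

# The question's pattern: both .* are greedy.
GREEDY = r'<([a-z]+) .*=".*">'
# The answer's pattern: lazy quantifiers plus an optional " /" before ">".
LAZY = r'<([a-z]+) .*?=".*? *\/?>'

greedy_result = re.sub(GREEDY, r'<\1>', line)
lazy_result = re.sub(LAZY, r'<\1>', line)

# Greedy: one match runs from "<tr" to the last '">' on the line,
# so the <td> start tag is swallowed and <br ... /> is left untouched.
print(greedy_result)  # <tr>text<br style="z" /></td></tr>

# Lazy: each start tag is matched on its own.
print(lazy_result)    # <tr><td>text<br></td></tr>
```

Note that the lazy version turns `<br ... />` into `<br>`, because the ` />` is part of the match but is not put back by the replacement.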

Update:

To keep the closing part of a self-closing element, capture the whole closing part in a second group:

<([a-z]+) .*?=".*?( *\/?>)

Then put the second captured group back in the replacement:

<\1$2
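The grouped pattern can also be checked with a short script. This is a sketch in Python rather than Notepad++, so the replacement is written `<\1\2` (Python's `re` does not use the `$2` form); `strip_attributes_regex` and `row` are hypothetical names, and the row is a shortened version of the markup in the question.

```python
import re

# Group 1 is the tag name; group 2 is the closing part (">" or " />").
PATTERN = re.compile(r'<([a-z]+) .*?=".*?( *\/?>)')


def strip_attributes_regex(html):
    """Drop attributes from start tags, keeping any self-closing ' />'."""
    return PATTERN.sub(r'<\1\2', html)


row = ('<tr style="word-wrap: break-word;"><td style="padding: 4px;">'
       '<a class="relatedlink" href="#" target="_blank">link</a>'
       '<br style="word-wrap: break-word;" /></td></tr>')

print(strip_attributes_regex(row))
# <tr><td><a>link</a><br /></td></tr>
```

The pattern only matches tags that contain at least one `="`, so tags without attributes pass through unchanged. It would stop too early if an attribute value contained a `>` character, and it does not touch tag names with digits or uppercase letters (for example `<h1 ...>`), since `[a-z]+` cannot match them.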

Demo: https://regex101.com/r/qK5uY3/3
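The caveat about parsing HTML with a regex can be made concrete with a parser-based alternative. This is not a Notepad++ plugin and not part of the answer above; it is a minimal sketch using Python's standard `html.parser` module, with `AttributeStripper` and `strip_attributes_parsed` as hypothetical names. It re-emits the markup without attributes and copes with a `>` inside a quoted attribute value, which the regex cannot.

```python
from html.parser import HTMLParser


class AttributeStripper(HTMLParser):
    """Re-emit HTML with every attribute dropped from start tags."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(f"<{tag}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br ... />.
        self.parts.append(f"<{tag} />")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")


def strip_attributes_parsed(html):
    parser = AttributeStripper()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


print(strip_attributes_parsed(
    '<td style="a > b"><br style="x" />text &amp; more</td>'))
# <td><br />text &amp; more</td>
```

The parser lowercases tag names, so the output is not guaranteed to be byte-for-byte identical to the input apart from the removed attributes.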

edited Sep 16, 2016 at 15:21

answered Sep 16, 2016 at 15:02

chris85

2 Comments

Louis55 Over a year ago

Thanks for the solution. .*? *?> is the key point to distinguish > in a line. \/ is really optional. But how can I retain <br />?

chris85 Over a year ago

Oh, you need to retain the self-closing? I think if you capture that it'd work. regex101.com/r/qK5uY3/3
