Solution:

Find: <([a-z]+) .*?=".*?( *\/?>)

Replace with: <\1$2


I often copy tables from forum sites to blog sites.

I want to remove every attribute from every start tag.
The tables look like this:

1|<table unwanted_attribute_1>
2|<tbody unwanted_attribute_2>
3|<tr unwanted_attribute_3><td unwanted_attribute_4><br unwanted_attribute_5 /></td></tr>
4|<tr unwanted_attribute_3><td unwanted_attribute_4><span unwanted_attribute_6></span></td></tr>
5|</tbody>
6|</table>
The unwanted attributes include "cellspacing", "class", "style", "href" and "target".

I found two answers, but neither solved my problem.
[A1]: It uses a fixed condition to find and replace specific terms, but in my case the start tags appear everywhere and vary from article to article.
[A2]: I tried this answer, but it does not work, as shown below.

I search for <([a-z]+) .*=".*"> and replace it with <\1>.
Lines 1 and 2 come out right, but lines 3 and 4 get mangled.

How should I use regex?

EDIT:

<table cellspacing="0" class="t_table" style="background-color: #f8f8f8; border-collapse: collapse; border: 1px solid rgb(227, 237, 245); color: #444444; empty-cells: show; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 16px; line-height: 24px; table-layout: auto; width: 673px; word-wrap: break-word;">
<tbody style="word-wrap: break-word;">
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆<a class="relatedlink" href="◆◆◆" style="border-bottom: 1px solid blue; color: #639805; word-wrap: break-word;" target="_blank">◆◆</a>◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆<br style="word-wrap: break-word;" />◆◆◆◆</td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">= ◆◆◆◆ =<br style="word-wrap: break-word;" /></td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">= ◆◆◆◆ =<br style="word-wrap: break-word;" /></td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">= ◆◆◆◆ =<br style="word-wrap: break-word;" /></td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆</td></tr>
<tr style="word-wrap: break-word;"><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆◆◆</td><td style="border: 1px solid rgb(227, 237, 245); overflow: hidden; padding: 4px; word-wrap: break-word;">◆◆◆◆</td></tr>
</tbody></table>

1 Answer

Your .* is greedy, so each one matches as much as it can and the whole pattern runs up to the last "> on the line. Here is what your first regex does:

https://regex101.com/r/qK5uY3/1
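For readers who cannot open the link, the greedy behaviour can be reproduced with a small Python sketch. It assumes Python's re module behaves like Notepad++'s engine for this simple pattern, and the sample line is a shortened stand-in for line 3 of the question with made-up attribute values.

```python
import re

# The pattern from the question, with greedy .* quantifiers.
GREEDY = re.compile(r'<([a-z]+) .*=".*">')

# Shortened stand-in for line 3 of the question (attribute values are invented).
line = '<tr style="a"><td style="b"><br style="c" /></td></tr>'

match = GREEDY.search(line)
swallowed = match.group(0)            # everything a single match consumed
result = GREEDY.sub(r'<\1>', line)    # what the find/replace leaves behind

print(swallowed)
print(result)
```

A single match covers both the tr and td start tags, so the replacement keeps only the tr and the td start tag is lost. The br tag is untouched because it ends in " /> and never offers a "> to match.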

Try:

<([a-z]+) .*?=".*? *\/?>

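I'd also recommend looking at plugins for Notepad++. Using a regex to parse HTML can cause many issues; this pattern, for instance, assumes that no attribute value contains > and that every attribute value is double-quoted.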

https://regex101.com/r/qK5uY3/2

The  *\/? before the closing > matches optional spaces and the optional slash of a self-closing element. I would prefer \h over a literal space, but I don't know whether Notepad++ supports it (I'm a Mac user).
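The lazy pattern can be tried outside Notepad++ with a short Python sketch. The sample line is again a made-up stand-in for the question's data, and Python's re module is assumed to treat this pattern the same way Notepad++ does.

```python
import re

# Lazy quantifiers (.*?) stop at the first place the rest of the pattern can match.
LAZY = re.compile(r'<([a-z]+) .*?=".*? *\/?>')

# Shortened stand-in for line 3 of the question (attribute values are invented).
line = '<tr style="a"><td style="b"><br style="c" /></td></tr>'

# Each start tag is now matched on its own and reduced to its bare name.
cleaned = LAZY.sub(r'<\1>', line)
print(cleaned)
```

Every start tag is handled separately, but the self-closing slash is consumed by the match, so a br tag written with " />" comes back as a plain br start tag.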

Update:

To keep the closing part of a self-closing element, wrap the whole closing part in a capture group:

<([a-z]+) .*?=".*?( *\/?>)

Then put the second captured group back in the replacement:

<\1$2
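The same find and replace can be sketched in Python. Note that Python's re.sub writes group references as \1 and \2 rather than $2; the sample lines are invented stand-ins for the question's table rows.

```python
import re

# Group 1 is the tag name, group 2 is the closing part (">" or " />").
CAPTURING = re.compile(r'<([a-z]+) .*?=".*?( *\/?>)')

def strip_attributes(html: str) -> str:
    """Drop attributes from start tags, keeping any self-closing slash."""
    return CAPTURING.sub(r'<\1\2', html)

# Invented sample rows modelled on the question's data.
row = '<tr style="a"><td style="b"><br style="c" /></td></tr>'
link = '<td style="b"><a class="relatedlink" href="#" target="_blank">x</a></td>'

print(strip_attributes(row))
print(strip_attributes(link))
```

Tags that have no attributes, and all end tags, are left alone because the pattern requires a space and an ="..." pair after the tag name.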

Demo: https://regex101.com/r/qK5uY3/3
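In line with the caution above about parsing HTML with a regex, here is a minimal sketch of a parser-based alternative using Python's standard html.parser module. It is not a Notepad++ plugin; the class name AttributeStripper and the function strip_attributes_with_parser are hypothetical names chosen for this illustration, and the sketch silently drops comments and doctype declarations.

```python
from html.parser import HTMLParser

class AttributeStripper(HTMLParser):
    """Re-emit HTML with every attribute removed from start tags (illustrative sketch)."""

    def __init__(self):
        # Keep entity and character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(f"<{tag}>")

    def handle_startendtag(self, tag, attrs):
        # Called for self-closing tags such as <br ... />.
        self.parts.append(f"<{tag} />")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

def strip_attributes_with_parser(html: str) -> str:
    parser = AttributeStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)

sample = '<tr style="a"><td style="b">x &amp; y<br style="c" />z</td></tr>'
print(strip_attributes_with_parser(sample))
```

Because the parser tracks tag boundaries itself, the result does not depend on where "> happens to fall on a line, at the cost of leaving the editor's find and replace dialog.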


2 Comments

Thanks for the solution. The lazy .*? before the closing > is the key to stopping at the right > in a line, and the \/ really is optional. But how can I retain <br />?
Oh, you need to retain the self-closing slash? I think capturing it would work: https://regex101.com/r/qK5uY3/3
