Find/Replace regex to remove html tags

Question

Using find and replace, what regex would remove the tags surrounding something like this:

<option value="863">Viticulture and Enology</option>

Note: the option value changes to different numbers, but using a regular expression to remove numbers is acceptable

I am still trying to learn but I can't get it to work.

I'm not using it to parse HTML. I have data from one of our company websites that we need in Excel, but our designer deleted the original data file and we need it back. I have a list of the options and need to remove the HTML tags, using Notepad++ find and replace.

Toto · Accepted Answer · 2018-11-29 16:44:42Z

20

This works for me in Notepad++ 5.8.6 (UNICODE):

search : <option value="\d+">(.*?)</option>

replace : $1

Be sure to select "Regular expression" and ". matches newline".
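For anyone who would rather script the same replacement than run it in the editor, here is a minimal sketch in Python (my choice of language, not something the answer uses). It assumes the option list is already loaded into a string; `strip_option_tags` is a made-up helper name, and `re.DOTALL` stands in for the ". matches newline" checkbox.

```python
import re

# Same pattern as the search field above; the group captures the option text.
# re.DOTALL lets "." cross line breaks, like ". matches newline" in the dialog.
OPTION_RE = re.compile(r'<option value="\d+">(.*?)</option>', re.DOTALL)


def strip_option_tags(text: str) -> str:
    """Replace every <option value="N">label</option> with just the label."""
    # r"\1" is Python's spelling of the $1 used in the replace field.
    return OPTION_RE.sub(r"\1", text)


if __name__ == "__main__":
    sample = '<option value="863">Viticulture and Enology</option>'
    print(strip_option_tags(sample))
```

The lazy `.*?` matters here: with several options in the same text, each match stops at the nearest `</option>` instead of swallowing everything up to the last one.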

edited Nov 29, 2018 at 16:44

answered Apr 27, 2011 at 17:33

Toto

Sunil Kumar · Accepted Answer · 2016-09-10 05:32:39Z

13

I did this using the following regular expression:

Find this: <.*?>|</.*?>

and

replace with: \r\n (this inserts a new line)

With this regular expression (<.*?>|</.*?>) we can easily get the values between the HTML tags, as in the example below.

I have this input:

<option value="123">1</option><option value="1234">2</option><option value="1235">3</option><option value="1236">4</option><option value="1237">5</option>

I need the values between the options: 1, 2, 3, 4, 5

After the replacement, each of those values ends up on its own line.
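The same idea can be sketched in Python, assuming the markup sits in one string; `values_between_tags` is a hypothetical helper name. Note that `<.*?>` on its own already matches closing tags such as `</option>`, so the sketch drops the second alternative of the pattern.

```python
import re

# Lazy match, so each tag is matched separately rather than the whole line.
TAG_RE = re.compile(r"<.*?>")


def values_between_tags(html: str) -> list:
    """Return the text fragments left after every tag becomes a line break."""
    # Step 1: replace each tag with \r\n, as in the find/replace above.
    replaced = TAG_RE.sub("\r\n", html)
    # Step 2: adjacent tags leave empty lines behind, so drop those.
    return [line for line in replaced.splitlines() if line.strip()]


if __name__ == "__main__":
    sample = '<option value="123">1</option><option value="1234">2</option>'
    print(values_between_tags(sample))
```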

answered Sep 10, 2016 at 5:32

Sunil Kumar

dubblebee · Accepted Answer · 2014-07-01 16:31:28Z

8

This works perfectly for me:

Select "Regular Expression" in "Find" Mode.
Enter [<].*?> in "Find What" field and leave the "Replace With" field empty.
Note that you need to have version 5.9 of Notepad++ for the ? operator to work.

as found here: digoCOdigo - strip html tags in notepad++
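To see why the ? matters, here is a small Python comparison (Python is my assumption; the answer itself is about Notepad++). It runs the pattern from the "Find What" field on the line from the question, once with the lazy `.*?` and once with a greedy `.*`.

```python
import re

line = '<option value="863">Viticulture and Enology</option>'

# Lazy: each match stops at the first ">" it reaches, so only the tags go.
lazy = re.sub(r"[<].*?>", "", line)

# Greedy: one match runs from the first "<" to the last ">" on the line.
greedy = re.sub(r"[<].*>", "", line)

print(repr(lazy))    # 'Viticulture and Enology'
print(repr(greedy))  # ''
```

Without the ? the whole line is treated as one giant tag and disappears.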

answered Jul 1, 2014 at 16:31

dubblebee

Justin Morgan · Accepted Answer · 2011-04-27 17:18:41Z

2

Something like this would work (as long as you know the format of the HTML won't change):

<option value="(\d+)">(.+)</option>

answered Apr 27, 2011 at 17:18

Justin Morgan

3 Comments

stewart715 Over a year ago

Hm, this erased the entire line, but looks close.

stewart715 Over a year ago

I will do two find-and-replaces: one for <option value="(\d+)"> and then one for </option>. Works beautifully, thank you.

Justin Morgan Over a year ago

If you're using Notepad++ find/replace, it's not going to work because the regex uses backreferences to capture the fields you want to keep. For find/replace, just replace everything before the numbers with nothing, then replace "> with a delimiter (like | but not commas, since there may be commas in the name), then finally replace the </option> with nothing. Import the result into Excel.
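If a script is an option, the three manual replacements described in this comment collapse into one pass. A rough Python sketch, assuming one option per line; `to_delimited` is a made-up name and the regex is the one from the answer above:

```python
import re

# Group 1 is the numeric value, group 2 is the option name.
OPTION_RE = re.compile(r'<option value="(\d+)">(.+)</option>')


def to_delimited(line: str, delimiter: str = "|") -> str:
    """Turn <option value="N">name</option> into N<delimiter>name."""
    # A function replacement avoids any escaping issues in the delimiter.
    return OPTION_RE.sub(lambda m: m.group(1) + delimiter + m.group(2), line)


if __name__ == "__main__":
    print(to_delimited('<option value="863">Viticulture and Enology</option>'))
```

The result keeps both fields, so Excel can split them on the | delimiter during import, and commas inside a name stay intact.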

user unknown · Accepted Answer · 2011-04-27 17:35:22Z

1

String s = "<option value=\"863\">Viticulture and Enology</option>";
s.replaceAll ("(<option value=\"[0-9]+\">)([^<]+)</option>", "$2")
res1: java.lang.String = Viticulture and Enology

(Tested in Scala, hence the res1:)

With sed, you would use a slightly different syntax:

echo '<option value="863">Viticulture and Enology</option>'|sed -re 's|(<option value="[0-9]+">)([^<]+)</option>|\2|'

For Notepad++, I don't know the details, but "[0-9]+" should mean 'at least one digit', and "[^<]+" anything but an opening less-than sign, one or more times. Escaping and backreferences may differ. Regexes are problematic: if the markup spans multiple lines or is hidden inside a comment, a regex will not recognize that.

However, a lot of HTML is generated in a regex-friendly way, always fitting on one line and never commented out. Or you use the regex in throwaway code and can check your input beforehand.
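Both caveats are easy to reproduce. The Python sketch below (Python is an assumption on my part, and the two sample strings are invented) runs the same pattern on an option that is commented out and on a tag that is split across two lines.

```python
import re

# Same pattern as in the Java and sed examples, keeping only the label group.
PATTERN = re.compile(r'<option value="[0-9]+">([^<]+)</option>')

# Invented inputs: a commented-out option and a tag broken across lines.
commented = '<!-- <option value="1">Old entry</option> -->'
split_tag = '<option\n value="2">Split tag</option>'

# The regex has no notion of comments, so the hidden option still matches.
print(PATTERN.findall(commented))  # ['Old entry']

# The line break inside the tag is not the single space the pattern expects.
print(PATTERN.findall(split_tag))  # []
```

So checking the input first, as suggested above, means looking for exactly these two cases.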

edited Apr 27, 2011 at 17:35

answered Apr 27, 2011 at 17:24

user unknown

1 Comment

stewart715 Over a year ago

this is really helpful, just gonna loop through them all now :D TY!
