Remove style from HTML Tags using Regex C#

Question

I want to remove the style attribute from HTML tags using C#, so that only plain HTML tags are returned.

For example, if the string is <p style="margin: 15px 0px; padding: 0px; border: 0px; outline: 0px;">Hello</p>, it should return <p>Hello</p>.

The same should apply to all other HTML tags as well.

Please help me with this.

Probably because this question has been asked a million times.
– Inspector Squirrel, Commented Aug 14, 2014 at 12:11

Community · Accepted Answer · 2017-05-23 12:17:39Z

10

First, as others suggest, an approach using a proper HTML parser is much better. Either use HtmlAgilityPack or CsQuery.

If you really want a regex solution, here it is:

Replace this pattern: (<.+?)\s+style\s*=\s*(["']).*?\2(.*?>)
With: $1$3

Demo: http://regex101.com/r/qJ1vM1/1
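Used from C#, the pattern goes straight into Regex.Replace. A minimal sketch, assuming a C# 9+ top-level program; `RemoveStyle` and `regularPattern` are hypothetical names. In a verbatim string (`@"..."`) backslashes are literal and the double quote inside `(["'])` must be written as `""`; in a regular string every backslash is doubled and the quote becomes `\"`:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper around the answer's pattern, written as a verbatim string:
// backslashes are literal and the double quote in (["']) is written as "".
static string RemoveStyle(string html) =>
    Regex.Replace(html, @"(<.+?)\s+style\s*=\s*([""']).*?\2(.*?>)", "$1$3");

// The same pattern as a regular string: backslashes doubled, quote written as \".
string regularPattern = "(<.+?)\\s+style\\s*=\\s*([\"']).*?\\2(.*?>)";

Console.WriteLine(RemoveStyle("<p style=\"margin: 15px 0px; padding: 0px;\">Hello</p>"));
// prints <p>Hello</p>

Console.WriteLine(RemoveStyle("<ul style=\"list-style-type:circle;\"> Endoderm"));
// prints <ul> Endoderm

Console.WriteLine(Regex.Replace("<b style='color:red'>Hi</b>", regularPattern, "$1$3"));
// prints <b>Hi</b>
```

The `$1$3` replacement keeps everything before the attribute (group 1) and the rest of the tag (group 3), so other attributes on the same tag survive. Because `.` does not match newlines here, a tag whose attributes span several lines will not be matched.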

To remove multiple attributes, this should work, since .NET supports variable-length lookbehind:

Replace (?<=<[^<>]+)\s+(?:style|class)\s*=\s*(["']).*?\1
With an empty string
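A minimal sketch of the lookbehind version in C#, assuming a C# 9+ top-level program; `RemoveStyleAndClass` is a hypothetical name. The lookbehind `(?<=<[^<>]+)` only checks that the match sits inside an open tag, so the tag itself stays in place and only the attribute is replaced:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper. (?<=<[^<>]+) requires a '<' followed only by non-bracket
// characters before the match, i.e. the match must be inside an open tag.
static string RemoveStyleAndClass(string html) =>
    Regex.Replace(html, @"(?<=<[^<>]+)\s+(?:style|class)\s*=\s*([""']).*?\1", string.Empty);

string input = "<p class=\"intro\" style=\"color:red\" id=\"first\">Hello</p>";
Console.WriteLine(RemoveStyleAndClass(input));
// prints <p id="first">Hello</p>

// Text outside a tag is left alone, because a '>' sits between it and the last '<'.
Console.WriteLine(RemoveStyleAndClass("<p>my style=\"x\" text</p>"));
// prints <p>my style="x" text</p>
```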

edited May 23, 2017 at 12:17

CommunityBot

answered Aug 14, 2014 at 11:22

Lucas Trzesniewski

6 Comments

CSAT Over a year ago

The regex works fine, but in code it shows an "unrecognized escape sequence" error because of the " in the string. What should I do? I am using it as @"(<.+?)\s+style\s*=\s*(["']).*?\2(.*?>)", "")

L.B Over a year ago

@CSAT Read about C# strings: msdn.microsoft.com/en-us/library/aa691090(v=vs.71).aspx

CSAT Over a year ago

Note that it does not work for other tags, like '<ul style="list-style-type:circle;"> Endoderm'. What should I do about this?

CSAT Over a year ago

I also want to remove class. Should I write another regex just like the one for style?

Lucas Trzesniewski Over a year ago

@CSAT It's working for me, so please show how you used it so I can tell you what's wrong. If you also want to remove class, see my edit.

Noctis · Accepted Answer · 2014-08-14 11:41:51Z

0

As others said, you can use the HTML Agility Pack, which has a nice tool, the HTML Agility Pack test, that shows you what you're doing.

Other than that, there is regex, which is usually not recommended for HTML, or simply looping over all the characters in your own code: when you hit <, read until the first whitespace, and then remove all the characters up to >. That should take care of most basic cases, but you'll have to test it.

Here's a little snippet that will do it:

    using System;
    using System.Text;

    class Program
    {
        static void Main()
        {
            // your input
            string input = @"<p style=""margin: 15px 0px; padding: 0px; border: 0px; outline: 0px;"">Hello</p>";
            // temp variables
            var sb = new StringBuilder();
            bool inside = false; // currently between '<' and '>'
            bool delete = false; // currently skipping attribute text
            // analyze string
            for (int i = 0; i < input.Length; i++)
            {
                char c = input[i];
                // Special case: opening bracket starts a tag
                if (c == '<') {
                    inside = true;
                    delete = false;
                }
                // Special case: closing bracket ends the tag
                else if (c == '>') {
                    inside = false;
                    delete = false;
                }
                // Other characters inside a tag
                else if (inside) {
                    // Once you hit whitespace, ignore the rest until the closing bracket
                    if (char.IsWhiteSpace(c))
                        delete = true;
                }
                // Add the character unless we are skipping attributes
                if (!delete)
                    sb.Append(c);
            }
            var result = sb.ToString(); // -> holds: "<p>Hello</p>"
            Console.WriteLine(result);
        }
    }

edited Aug 14, 2014 at 11:41

answered Aug 14, 2014 at 11:26

Noctis

3 Comments

Furkan Gözükara Over a year ago

this fails if like this

Furkan Gözükara Over a year ago

This also fails if there are < or > characters inside the content, e.g. <math>K_B \cap\left \{ |k| < \beta\right \} </math>

Noctis Over a year ago

@MonsterMMORPG Yep, it will.

ZooZ · Accepted Answer · 2016-05-15 08:28:38Z

0

I usually use the code below to strip <style> and <script> blocks, class attributes, images and comments from an Outlook message before saving it to the database:

    using System.Text.RegularExpressions;

    // Remove <style>/<script> blocks, images, Office <o:...> tags and HTML comments
    desc = Regex.Replace(desc, "(<style.+?</style>)|(<script.+?</script>)", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    desc = Regex.Replace(desc, "(<img.+?>)", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    desc = Regex.Replace(desc, "(<o:.+?</o:.+?>)", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    desc = Regex.Replace(desc, "<!--.+?-->", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    // Remove class attributes; the verbatim string (@) keeps \s from being read as a C# escape
    desc = Regex.Replace(desc, "class=.+?>", ">", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    desc = Regex.Replace(desc, @"class=.+?\s", " ", RegexOptions.IgnoreCase | RegexOptions.Singleline);
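Note that `class=.+?>` removes everything from `class=` up to the next `>`, so any attributes after `class` are lost as well. A minimal sketch comparing it with a quote-aware pattern, assuming a C# 9+ top-level program; the second pattern is an assumption, not part of the answer, and uses a backreference so either quote style works:

```csharp
using System;
using System.Text.RegularExpressions;

string html = "<p class=\"note\" id=\"p1\">Text</p>";

// Pattern from the answer: everything from class= up to the next '>' is replaced,
// so the id attribute disappears too.
string upToNextBracket = Regex.Replace(html, "class=.+?>", ">");

// Quote-aware variant: ([""']) captures the opening quote and \1 requires the same
// quote to close the value, so only the class attribute is removed.
string quoteAware = Regex.Replace(html, @"\s+class\s*=\s*([""']).*?\1", "");

Console.WriteLine(upToNextBracket); // <p >Text</p>
Console.WriteLine(quoteAware);      // <p id="p1">Text</p>
```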

answered May 15, 2016 at 8:28

ZooZ

2 Comments

pistol-pete Over a year ago

Your regex pattern of class=.+?> removes everything between class= and the next > which is more than what you want. class=.+?\" is probably what you were after.

Gray Programmerz Over a year ago

He should use class=".+?" or class='.+?' instead of class=.+?>

Eyad · Accepted Answer · 2020-09-14 23:22:03Z

0

All the answers are fine, but it can also be done simply by renaming the attribute: "Your HTML String".Replace("style", "data-tags"); You can also replace "class" the same way.
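string.Replace does a plain, case-sensitive substring replacement. A data-tags attribute has no styling effect of its own, so the inline style stops applying, but every other occurrence of the word style is rewritten too, including text content. A minimal sketch (C# 9+ top-level program) illustrating both effects:

```csharp
using System;

string html = "<p style=\"color:red\">My style guide</p>";

// Plain substring replacement: every "style" is renamed, not just the attribute.
string renamed = html.Replace("style", "data-tags");
Console.WriteLine(renamed); // <p data-tags="color:red">My data-tags guide</p>

// The match is case-sensitive, so an upper-case attribute name is left alone.
string upper = "<p STYLE=\"color:red\">Hi</p>".Replace("style", "data-tags");
Console.WriteLine(upper); // <p STYLE="color:red">Hi</p>
```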

answered Sep 14, 2020 at 23:22

Eyad

Ashish Srivastava · Accepted Answer · 2017-05-31 07:28:07Z

-1

    // Remove <style>/<script> blocks, images, Office <o:...> tags and HTML comments
    source = Regex.Replace(source, "(<style.+?</style>)|(<script.+?</script>)", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    source = Regex.Replace(source, "(<img.+?>)", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    source = Regex.Replace(source, "(<o:.+?</o:.+?>)", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    source = Regex.Replace(source, "<!--.+?-->", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    // Remove class attributes (everything from class= up to the next '>')
    source = Regex.Replace(source, "class=.+?>", ">", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    // Turn newlines into <br/>, then replace every tag except a, img, b, i, u, ul, ol, li and br with a space
    source = Regex.Replace(source.Replace(System.Environment.NewLine, "<br/>"), @"<(?!/?(?:a|img|b|i|u|ul|ol|li|br)\b)[^>]*>", " ", RegexOptions.IgnoreCase);
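The last step is meant to keep only a whitelist of tags. A character class such as [^(a|img|b)] cannot express that, because it matches one single character that is not (, a, |, i, m, g, b or ); for example, </b> would be removed because it starts with /. A minimal sketch of a lookahead-based whitelist, assuming a C# 9+ top-level program, the same tag list, and br added as an assumption so the <br/> tags produced from newlines survive:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical whitelist pattern: the negative lookahead skips any opening or
// closing tag whose name is in the list; every other tag is matched.
const string NotWhitelistedTag = @"<(?!/?(?:a|img|b|i|u|ul|ol|li|br)\b)[^>]*>";

string html = "<div><b>Bold</b> and <span>plain</span><br/></div>";

// Replace each non-whitelisted tag with a space, as the answer does.
string result = Regex.Replace(html, NotWhitelistedTag, " ", RegexOptions.IgnoreCase);

Console.WriteLine(result); // result == " <b>Bold</b> and  plain <br/> "
```

The \b after the tag name keeps, for example, <abbr> from being treated as <a>, and <body> from being treated as <b>.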

answered May 31, 2017 at 7:28

Ashish Srivastava

1 Comment

RBT Over a year ago

May I request you to please add some context around your source-code. Code-only answers are difficult to understand. It will help the asker and future readers both if you can add more information in your post.
