Using regular expressions, how do I strip style tags, CSS, scripts, and HTML tags from an HTML document so that only the plain text remains?

I am working in ASP.NET with C#.

  • Accept your recent questions. Commented Mar 8, 2011 at 9:56
  • @vasmay, when you got a reasonable answer for your question then click on tick mark to accept the answer. Commented Mar 8, 2011 at 10:15
  • @vasmay, do you want to remove these from a .html file(s)? Commented Mar 8, 2011 at 16:31

1 Answer

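I don't think a regex is really the right tool for this. That said, the following pattern strips HTML tags when used in a regex replace. Note that it removes only the tags themselves, so any text inside <script> and <style> elements is left behind: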

<[^>]*>

To use it with Regex.Replace, do the following:

// Requires: using System; using System.Text.RegularExpressions;
string myHtmlString = "<html><body>my test text</body></html>";

string myPlainTextString = Regex.Replace(myHtmlString, "<[^>]*>", String.Empty);
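Since the question also asks about scripts and CSS, here is a minimal regex-only sketch that removes <script> and <style> blocks together with their contents before stripping the remaining tags. The class and method names (HtmlStripper, ToPlainText) are made up for illustration, and the patterns assume reasonably well-formed markup; they can be fooled by things like a ">" inside an attribute value or a "</script>" inside a string literal.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class HtmlStripper
{
    public static string ToPlainText(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // 1. Remove <script> and <style> elements including their contents.
        //    Singleline lets '.' match newlines; \1 matches the same tag name.
        string text = Regex.Replace(
            html,
            @"<(script|style)\b[^>]*>.*?</\1\s*>",
            " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // 2. Remove HTML comments.
        text = Regex.Replace(text, @"<!--.*?-->", " ", RegexOptions.Singleline);

        // 3. Remove the remaining tags (the pattern from the answer).
        //    A space is used instead of String.Empty so that text from
        //    adjacent block elements does not run together.
        text = Regex.Replace(text, @"<[^>]*>", " ");

        // 4. Decode entities such as &amp; into plain characters.
        text = WebUtility.HtmlDecode(text);

        // 5. Collapse runs of whitespace and trim the ends.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

Replacing tags with a space is a judgment call: it keeps "<p>a</p><p>b</p>" from becoming "ab", but it also splits words that contain inline tags, so "<b>H</b>ello" becomes "H ello". Use String.Empty in step 3 if that matters more for your input.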

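I recommend using something like the Html Agility Pack instead - http://htmlagilitypack.codeplex.com/

as it makes this even easier with a helper method called "ConvertToPlainText" (I believe this is a helper you add on top of the library rather than a built-in method, so check what your version provides):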

string myHtmlString = "<html><body>my test text</body></html>";

string myPlainTextString = ConvertToPlainText(myHtmlString);
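
If no such helper is available in your copy of the library, something along these lines may work. This is only a sketch, assuming the HtmlAgilityPack package is referenced; the class and method names (HtmlTextExtractor, ToPlainText) are made up for illustration. It parses the document, removes the script and style nodes, and reads the text that remains.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // third-party package, must be referenced separately

// Hypothetical helper for illustration; not part of the Html Agility Pack.
public static class HtmlTextExtractor
{
    public static string ToPlainText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html ?? string.Empty);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var unwanted = doc.DocumentNode.SelectNodes("//script|//style");
        if (unwanted != null)
        {
            // Copy to a list first so removal never disturbs the enumeration.
            foreach (var node in unwanted.ToList())
            {
                node.Remove();
            }
        }

        // InnerText concatenates the text nodes; DeEntitize decodes &amp; and friends.
        string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);

        // Collapse runs of whitespace and trim the ends.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

Because InnerText simply joins the text nodes, text from adjacent elements with no whitespace between them runs together, so you may want to walk the nodes yourself if you need line breaks between paragraphs.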