C# HTMLAgilityPack HTML to Text - Parse Errors

Question

I need to extract text from an HTML file using C#. I am trying to use HTMLAgilityPack but I am seeing some parse errors (tags not closed). I am using these two options:

        htmlDoc.OptionFixNestedTags = true;
        htmlDoc.OptionAutoCloseOnEnd = true;

Is there any "Fix all" type option. I don't care about the errors, I just want the content or close.

Ichibann · Accepted Answer · 2010-09-27 09:42:21Z

4

Maybe this is workaround but once I had to extract text from HTML I used regex:

result = Regex.Replace(result, @"<(.|\n)*?>", String.Empty);
result = Regex.Replace(result, @"^\n*", String.Empty, RegexOptions.Singleline | RegexOptions.IgnoreCase);
result = Regex.Replace(result, @"\n*$", String.Empty, RegexOptions.Singleline | RegexOptions.IgnoreCase);
result = result.Replace("\n", " ");

answered Sep 27, 2010 at 9:42

Ichibann

4,4719 gold badges35 silver badges39 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

C# HTMLAgilityPack HTML to Text - Parse Errors

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related