Parsing HTML page in ASP.NET

Question

I'm trying to parse the HTML of an external page and read its contents (e.g., get the "title" element from google.com). XmlDataSource does not appear to work because the page is not well-formed XML. Does anybody know how to do this?

Thank you.

carla · Accepted Answer · 2017-11-27 06:53:19Z

5

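You should use the Html Agility Pack, a .NET library that parses real-world HTML (including markup that is not well-formed XML) into a document tree you can query with XPath.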
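
As a minimal sketch of that suggestion, the snippet below reads the title out of an HTML string. It assumes the HtmlAgilityPack NuGet package is referenced; the class name TitleReader and the method GetTitle are hypothetical names chosen for illustration, not part of the library.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class TitleReader
{
    // Hypothetical helper: returns the text of the first <title> element,
    // or null when the document has none.
    public static string GetTitle(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant of markup that is not well-formed XML

        // XPath query against the parsed tree; null when nothing matches.
        HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
        return title == null ? null : title.InnerText.Trim();
    }
}
```

To work from a URL instead of a string, the html argument could be the result of WebClient.DownloadString, or the document could be loaded with the library's HtmlWeb class (new HtmlWeb().Load(url)).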

edited Nov 27, 2017 at 6:53

carla

answered Mar 15, 2011 at 4:01

KV Prajapati


nycdan · Accepted Answer · 2011-03-15 04:22:57Z

If it's something simple, can you just do some basic string parsing? It's not the most efficient approach, but it works well enough.

First get your html (in case this is part of what you needed):

// Requires "using System.Net;". strURL holds the address of the page to fetch.
WebClient client = new WebClient();
string webhtml = client.DownloadString(strURL);

If you have a repeating pattern, you can then use .Split to divide it up.

Now just use .IndexOf (or .LastIndexOf) and .Substring to parse as needed. If you need to do this a lot, or iteratively, you can create a function that takes the html and the start and end delimiters, plus a few other parameters as needed. You'll need to offset the start position by adding the length of the start delimiter to its index, but otherwise it's fairly straightforward.
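
The function described above might look something like the following. This is only a sketch: the names HtmlText and Between are hypothetical, and the case-insensitive comparison is an assumption added so that <TITLE> and <title> both match.

```csharp
using System;

public static class HtmlText
{
    // Hypothetical helper: returns the text between the first occurrence of
    // startDelimiter and the next occurrence of endDelimiter, or null when
    // either delimiter cannot be found.
    public static string Between(string html, string startDelimiter, string endDelimiter)
    {
        int start = html.IndexOf(startDelimiter, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return null;

        // Offset past the start delimiter so it is not included in the result.
        start += startDelimiter.Length;

        // Search for the end delimiter only after the start delimiter.
        int end = html.IndexOf(endDelimiter, start, StringComparison.OrdinalIgnoreCase);
        if (end < 0) return null;

        return html.Substring(start, end - start);
    }
}
```

With the webhtml string from earlier, the title could then be read with HtmlText.Between(webhtml, "<title>", "</title>"). Note that this exact-match approach misses an opening tag that carries attributes, which is one reason a real HTML parser is safer for anything beyond simple cases.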

keithwill · Accepted Answer · 2011-03-15 04:08:48Z

0

Use SgmlReader (http://sourceforge.net/projects/dekiwiki/files/SgmlReader/) if you want to treat HTML like XML for parsing. While this may be overkill for getting the title, it will be faster than other similar methods when parsing large HTML pages.

answered Mar 15, 2011 at 4:08

keithwill
