How can I parse HTML tags using C++?

For example:

<html><body>example text </body></html>
  • If this is an exercise, you're better off starting with XHTML input. It's similar enough for learning purposes, but much better structured. For example, there are no unmatched <br> tags in XHTML. Commented Aug 31, 2010 at 7:25

1 Answer

The easiest option would be to use an HTML parsing library. libxml2 is a solid open-source one, although it's technically a C library. You'd need to load your HTML and then walk through the DOM, pulling out all the text nodes. I'm not sure I'd recommend this as your first C++ task, though.

1 Comment

libxml2 has a tutorial at xmlsoft.org/tutorial/ar01s05.html. You could also just count the < and > characters and extract everything that's not inside a tag. If this is a homework problem, that's probably the solution they're looking for. I'm not going to write it for you.
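
For reference, the bracket-counting idea from the comment above could be sketched roughly like this. It is only a minimal sketch, not a real HTML parser: the function name extract_text is made up for illustration, and it assumes that < and > never appear inside attribute values, comments, or scripts, and that entities such as &amp; can be left undecoded.

```cpp
#include <string>

// Hypothetical helper (not from the thread): returns the characters that lie
// outside of <...> tags. Naive by design: it does not handle comments,
// <script> bodies, entities, or '<' and '>' inside attribute values.
std::string extract_text(const std::string& html) {
    std::string text;
    int depth = 0;  // number of '<' seen that have not yet been closed by '>'
    for (char c : html) {
        if (c == '<') {
            ++depth;                 // entering a tag
        } else if (c == '>') {
            if (depth > 0) --depth;  // leaving a tag; a stray '>' is dropped
        } else if (depth == 0) {
            text += c;               // outside any tag: keep the character
        }
    }
    return text;
}
```

With the input from the question, extract_text("<html><body>example text </body></html>") returns "example text " (including the trailing space). Anything less tidy than that input is a good reason to go back to a proper library such as libxml2.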
