
Are there any C++ libraries available for reading HTML on Linux?

Read or parse? HTML is just plain text; if you only want to read it, you can treat it like any other file. Also, do you mean an HTML response from a web server or a local file? (Commented Oct 26, 2010 at 16:01)
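Following that comment, if you only need to read the file, plain standard C++ is enough. Here is a minimal sketch (C++11 or later, standard library only); `read_text_file` is a hypothetical helper name, not part of any library:

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Read an entire file (e.g. an .html page) into a string.
// No HTML-specific handling: the markup is kept verbatim as plain text.
std::string read_text_file(const std::string& path) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    std::ostringstream buffer;
    buffer << in.rdbuf();  // copy the whole stream in one go
    return buffer.str();
}
```

The markup comes back unchanged, tags and all. Anything beyond that (finding elements, attributes or text) is parsing, which is what libraries such as htmlcxx are for.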

2 Answers


libcurl is your friend, plus tidy (HTML Tidy) if you have broken HTML to fix.

Edit: here is the full sequence:

1. Start with the HTML in a file.
2. Run it through tidy, which cleans up the malformed HTML (tidy can output XHTML, i.e. well-formed XML, which the next step needs).
3. Apply an XSLT transformation using libxml2/libxslt (http://xmlsoft.org/). You'll need to provide an XSL file that translates your HTML to LaTeX.
4. Process the resulting LaTeX document with latex (by forking out to the latex command). Alternatively, you could download the LyX source code and see how they do it (http://www.lyx.org/).

Unfortunately the sequence is too complex to write as a single example; all I can give you is the sequence.
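The last step, forking out to the latex command, might look like this with the POSIX calls fork, execvp and waitpid. This is a sketch assuming a Linux/POSIX system with latex on the PATH; `run_command` and the file name `doc.tex` are hypothetical:

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run an external program (e.g. {"latex", "doc.tex"}) in a child process
// and wait for it. Returns the program's exit status (127 if it could not
// be exec'd), or -1 if fork/waitpid failed or the child was killed by a signal.
int run_command(const std::vector<std::string>& args) {
    assert(!args.empty());  // args[0] is the program name, looked up in PATH

    // execvp wants a NULL-terminated array of C strings; it does not modify them.
    std::vector<char*> argv;
    for (const std::string& a : args) {
        argv.push_back(const_cast<char*>(a.c_str()));
    }
    argv.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0) {
        return -1;  // fork failed
    }
    if (pid == 0) {
        // Child: replace this process image with the requested program.
        execvp(argv[0], argv.data());
        _exit(127);  // only reached if execvp failed
    }

    // Parent: wait for the child and decode its status.
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);
    }
    return -1;  // terminated by a signal
}
```

For example, `run_command({"latex", "doc.tex"})` would normally leave a doc.dvi next to the input, which dvips can turn into PostScript; the same helper could invoke html2ps instead. Passing an argument vector to execvp, rather than a string to system(), avoids sending file names through a shell.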


8 Comments

Can you please share a sample program showing how it is used in C++?
First, what are you trying to do? E.g. read from a URL? Read from a file? Do you need to treat it as a DOM, or are you looking for something specific? Without that information, it's a stab in the dark...
I am just reading it from a file; after that I want to convert it into PostScript. Any idea which C++ libraries can do this?
If you already have the HTML in a file, you could try "html2ps"; I think you can get it pre-installed on *nixes, or there is a Perl script available via Google. If you want a nice library to do this, I'm not sure one exists (Apache FOP is one option, but you'd have to somehow delegate the formatting operations to Java from your C++).
I can't use that; I need to write a C++ program to convert it into PostScript.

Have a look at the following:

Also, a similar question has already been asked.

3 Comments

Hey, I need it for Linux, not Windows, and there shouldn't be any scripting.
Both htmlcxx and wcHTML should be available for Linux.
I need the HTML parser as separate source files that I can add directly to my C++ project and use.
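To illustrate the difference between reading and parsing, here is a deliberately naive, standard-library-only scanner that lists opening tag names. It is a toy that assumes reasonably well-formed input with no tags inside comments or scripts; it is not a replacement for htmlcxx or wcHTML, and `list_open_tags` is a hypothetical name:

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Deliberately naive: collects the names of opening tags in document order,
// lower-cased. It skips closing tags and <!...> declarations, but does not
// understand comment or script contents, quoting, or malformed markup,
// so it is NOT a real HTML parser.
std::vector<std::string> list_open_tags(const std::string& html) {
    std::vector<std::string> tags;
    std::string::size_type pos = 0;
    while ((pos = html.find('<', pos)) != std::string::npos) {
        ++pos;  // step past '<'
        // Skip closing tags (</p>) and declarations such as <!DOCTYPE html>.
        if (pos < html.size() && (html[pos] == '/' || html[pos] == '!')) {
            continue;
        }
        // Read the tag name: a run of letters/digits right after '<'.
        std::string name;
        while (pos < html.size() &&
               std::isalnum(static_cast<unsigned char>(html[pos]))) {
            name += static_cast<char>(
                std::tolower(static_cast<unsigned char>(html[pos])));
            ++pos;
        }
        if (!name.empty()) {
            tags.push_back(name);
        }
    }
    return tags;
}
```

Real-world HTML breaks these assumptions quickly (a bare `<` in text, `<script>` bodies, broken markup), which is why cleaning with tidy or using a proper parser is the usual route.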
