
How can I generate a .cpp/.h file from the following HTML file programmatically (using any scripting or programming language, or even an editor such as vi or Emacs)?

<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
<head>
<title>Class</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body link="blue" vlink="purple" bgcolor="#FFFABB" text="black">

<h2><font face="Helvetica">Code Fragment: Class</font></h2>
</center><br><dl><dd><pre>

  <font color=#A000A0>template</font> &lt;<font color=#A000A0>typename</font> G&gt;
  <font color=#A000A0>class</font> Components : <font color=#A000A0>public</font> DFS&lt;G&gt; {            <font color=#0000FF>// count components</font>
  <font color=#A000A0>private</font>:
    <font color=#A000A0>int</font> nComponents;                 <font color=#0000FF>// num of components</font>
  <font color=#A000A0>public</font>:
    <font color=#000000>Components</font>(<font color=#A000A0>const</font> G& g): DFS&lt;G&gt;(g) {}        <font color=#0000FF>// constructor</font>
    <font color=#A000A0>int</font> <font color=#A000A0>operator</font>()();                 <font color=#0000FF>// count components</font>
  };
</dl>

</body>
</html>

If you could also point out how to go in the other direction (from C++ source to this kind of HTML), that would be great. Thanks a lot.

4
  • You want a tool to copy the highlighted text in an HTML page? Commented Sep 21, 2011 at 22:55
  • 1
    What should the generated .cpp and/or .h file do? Commented Sep 21, 2011 at 22:57
  • @Keith: not sure why you asked that. I just want to be able to switch between this kind of HTML representation of my C++ code and the plain source, in both directions. I am asking for a programmatic way, or any tools, to do that quickly in batch mode. Commented Sep 21, 2011 at 23:12
  • @Qiang: Oh, I see what you mean. I didn't see past the HTML tags to notice that the HTML is a representation of C++ code, so I didn't think the idea of translating HTML to C++ made much sense. Never mind. Commented Sep 22, 2011 at 2:44

6 Answers

8

Does this work for you?

[18:56:44 jaidev@~]$ lynx --dump foo.html
Code Fragment: Class


  template <typename G>
  class Components : public DFS<G> {            // count components
  private:
    int nComponents;                 // num of components
  public:
    Components(const G& g): DFS<G>(g) {}        // constructor
    int operator()();                 // count components
  };
[18:56:49 jaidev@~]$

Edit:

For the reverse direction: if you use Vim as your editor, you can enter :TOhtml to generate a syntax-highlighted HTML version of your code in a new buffer. The HTML is generated from your current Vim colorscheme. To change the colorscheme, use the :colorscheme <name> command.
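If Vim is not available, the same idea can be roughed out in a few lines of Python. This is only a minimal sketch and not what :TOhtml does internally: the `highlight` function, the keyword list, and the two colors (copied from the HTML in the question) are all assumptions for illustration, and string literals and block comments are not handled.

```python
import html
import re

# Assumed keyword subset and colors (taken from the question's HTML sample).
KEYWORDS = ("template", "typename", "class", "public", "private",
            "int", "const", "operator")
KEYWORD_COLOR = "#A000A0"
COMMENT_COLOR = "#0000FF"

# A line comment wins over a keyword because it is tried first at each position.
TOKEN = re.compile(
    r"(?P<comment>//[^\n]*)|(?P<keyword>\b(?:%s)\b)" % "|".join(KEYWORDS)
)


def highlight(source: str) -> str:
    """Wrap keywords and // comments in <font> tags, escaping everything else."""
    out = []
    pos = 0
    for m in TOKEN.finditer(source):
        # Escape the plain text between the previous token and this one.
        out.append(html.escape(source[pos:m.start()], quote=False))
        color = COMMENT_COLOR if m.lastgroup == "comment" else KEYWORD_COLOR
        out.append('<font color="%s">%s</font>'
                   % (color, html.escape(m.group(), quote=False)))
        pos = m.end()
    out.append(html.escape(source[pos:], quote=False))
    return "<pre>\n" + "".join(out) + "\n</pre>"


if __name__ == "__main__":
    print(highlight("int operator()();  // count components"))
```

Escaping happens per fragment, so `<`, `>` and `&` in the C++ source end up as entities while the inserted `<font>` tags stay intact.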


5 Comments

@Qiang Li: any PHP, Python or JS syntax highlight plugin will do
Edited answer for the reverse direction.
@yi_H, can you please be more specific, for example, what is Python's?
@yi_H: do you mind telling me which formatter you used to get the nice coloring?
sigh if you can't figure that out you should seek for another job.
2

PHP script:

$doc = new DOMDocument();
$doc->loadHTMLFile("file.html");
$xpath = new DOMXpath($doc);
$str = '';
foreach ($xpath->query("//dl//text()") as $node) {
    $str .= $node->nodeValue . ' ';
}

file_put_contents('file.cpp', $str);

contents of file.cpp:

   template  < typename  G>
   class  Components :  public  DFS<G> {             // count components 
   private :
     int  nComponents;                  // num of components 
   public :
     Components ( const  G& g): DFS<G>(g) {}         // constructor 
     int   operator ()();                  // count components 
  };


1

You could use regular expressions to...

  • ...keep only what's in the <body> of the HTML page,
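  • ...strip all the HTML tags (remove everything that matches <[^>]*>; a greedy <.*> would delete everything from the first < to the last > on a line),
  • ...unescape special characters such as &lt;, &gt;, &amp; etc. Do this last, so that an unescaped <G> in the code is not mistaken for a tag.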

What's left should be the code you're looking for.
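The three steps above can be sketched in Python using only the standard library. The function name `html_to_source` and the sample string are made up for illustration; the approach assumes the page contains nothing but the code fragment in its body and no literal `>` inside attribute values.

```python
import html
import re


def html_to_source(page: str) -> str:
    """Recover plain source text from a highlighted HTML page (regex sketch)."""
    # 1. Keep only what is inside <body>...</body>, if there is one.
    match = re.search(r"<body[^>]*>(.*)</body>", page,
                      flags=re.DOTALL | re.IGNORECASE)
    body = match.group(1) if match else page
    # 2. Strip the tags. [^>]* stops at the first '>', unlike a greedy .*
    text = re.sub(r"<[^>]*>", "", body)
    # 3. Unescape entities last, so "&lt;G&gt;" is not stripped as a tag.
    return html.unescape(text)


sample = (
    "<html><body><pre>"
    "<font color=#A000A0>template</font> &lt;<font color=#A000A0>typename</font> G&gt;\n"
    "<font color=#000000>Components</font>(<font color=#A000A0>const</font> G& g): DFS&lt;G&gt;(g) {}"
    "</pre></body></html>"
)

if __name__ == "__main__":
    print(html_to_source(sample))
```

With the full page from the question, the heading text "Code Fragment: Class" is also part of the body, so it would need to be trimmed from the result.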


1

Another option for going from HTML to the source code is the html2text utility, which is available in many Linux distributions.

matteo@teomint:~/Desktop$ html2text out.html 
***** Code Fragment: Class *****


        template <typename G>
        class Components : public DFS<G> {            // count components
        private:
          int nComponents;                 // num of components
        public:
          Components(const G& g): DFS<G>(g) {}        // constructor
          int operator()();                 // count components
        };


0
  • Fix the HTML. You're missing some closing tags.
  • Get PHP out
    • Obtain the pre code block with DOMDocument
    • strip_tags() from the result
  • Profit.
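For readers without PHP, a comparable parser-based extraction can be sketched with Python's standard `html.parser` module. This is an illustrative stand-in, not the DOMDocument code the answer has in mind: the class and function names are hypothetical, and because the question's HTML never closes its `<pre>`, the sketch also treats `</dl>`, `</body>` and `</html>` as the end of the block, which is an assumption about this particular page.

```python
from html.parser import HTMLParser


class PreTextExtractor(HTMLParser):
    """Collect the text found inside <pre>, ignoring nested tags such as <font>."""

    # Closing tags that end collection; the extra ones cover a missing </pre>.
    END_TAGS = ("pre", "dl", "body", "html")

    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities arrive already decoded
        self.inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.inside = True

    def handle_endtag(self, tag):
        if tag in self.END_TAGS:
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)


def extract_pre(page: str) -> str:
    parser = PreTextExtractor()
    parser.feed(page)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)


sample = (
    "<dl><dd><pre>\n"
    "<font color=#A000A0>int</font> <font color=#A000A0>operator</font>()();"
    " <font color=#0000FF>// count components</font>\n"
    "};\n"
    "</dl>\n</body>"
)

if __name__ == "__main__":
    print(extract_pre(sample).strip())
```

Because the parser decodes entities and drops tags itself, no separate strip or unescape pass is needed.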


0

If you're trying to strip all HTML tags to get back the original, non-highlighted source code, then you have two options that I can think of:

  1. Parse the DOM tree and just grab all relevant text.
  2. Use some regular expressions to remove the tags themselves. For example, maybe "s///" would be a good start?
