
I am trying to do this:

  1. Read the HTML document "myDocument.html" with Node.
  2. Insert the contents of another HTML document, "foo.html", immediately after the opening body tag of myDocument.html.
  3. Insert the contents of yet another HTML document, "bar.html", immediately before the closing body tag of myDocument.html.
  4. Save the modified version of "myDocument.html".

To do the above, I would need to search the document with Node to find the opening and closing body tags. How can this be done?


3 Answers


Most simply, you can use the native file system module that ships with Node.js (var fs = require("fs")). It lets you read the HTML into a string, perform string replacements, and save the result by writing the file back out.

The advantage is that this solution is completely native and requires no external libraries. It also leaves the rest of the original markup untouched, because nothing is parsed and re-serialized.

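var fs = require('fs');

// Read each file as a UTF-8 string; without an encoding, readFile yields a Buffer.
fs.readFile('myDocument.html', 'utf8', function (err, myDocData) {
    if (err) throw err;
    fs.readFile('foo.html', 'utf8', function (err, fooData) { // reads foo file
        if (err) throw err;
        // replace() returns a new string, so the result must be assigned back.
        // A function replacement keeps any "$" sequences in the inserted file literal.
        myDocData = myDocData.replace(/<body>/, function (tag) { return tag + fooData; });
        fs.readFile('bar.html', 'utf8', function (err, barData) { // reads bar file
            if (err) throw err;
            myDocData = myDocData.replace(/<\/body>/, function (tag) { return barData + tag; });
            // Writes the result to a new file.
            fs.writeFile('myDocumentNew.html', myDocData, function (err) { if (err) throw err; });
        });
    });
});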
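
For a one-off build script, the same steps can be flattened with the synchronous fs calls. This is only a sketch: the helper name injectIntoBody and its parameters are made up for illustration, and it assumes a bare <body> tag with no attributes and fragments that do not themselves contain body tags.

```javascript
const fs = require('fs');

// Hypothetical helper (not from the answer above): reads the three files,
// inserts the fragments around the body content, writes the result to outPath
// and returns the new markup.
function injectIntoBody(docPath, fooPath, barPath, outPath) {
  const doc = fs.readFileSync(docPath, 'utf8');
  const foo = fs.readFileSync(fooPath, 'utf8');
  const bar = fs.readFileSync(barPath, 'utf8');

  const result = doc
    // Function replacements keep "$" sequences in the fragments literal.
    .replace(/<body>/, (tag) => tag + foo)
    .replace(/<\/body>/, (tag) => bar + tag);

  fs.writeFileSync(outPath, result);
  return result;
}

// Example call, assuming the three files exist in the working directory:
// injectIntoBody('myDocument.html', 'foo.html', 'bar.html', 'myDocumentNew.html');
```

Synchronous calls block the event loop, so this form suits scripts rather than servers.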

A simple but not fully robust way, where str holds the contents of myDocument.html and read() returns a file's contents as a string:

str = str.replace(/(<body.*?>)/i, "$1"+read('foo.html'));

str = str.replace(/(<\/body>)/i, read('bar.html')+'$1');

It will not work if the content of myDocument.html contains more than one "<body ..." or "</body>" (for example, inside JavaScript), and foo.html and bar.html must not contain replacement patterns such as "$1" or "$2", because replace() would interpret them.

If you can edit myDocument.html, you can leave a placeholder in it (as an HTML comment), like
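
The "$1" problem can be sidestepped by passing a function as the replacement, since its return value is inserted as-is. A minimal sketch, with a made-up helper name (wrapBodyContent) and made-up sample strings; it still only touches the first match of each tag, so the multiple-body caveat remains:

```javascript
// Hypothetical helper: inserts `before` right after the opening body tag
// (with or without attributes) and `after` right before the closing one.
function wrapBodyContent(html, before, after) {
  return html
    .replace(/<body\b[^>]*>/i, (openTag) => openTag + before)
    .replace(/<\/body\s*>/i, (closeTag) => after + closeTag);
}

const page = '<html><body class="home"><p>Hi</p></body></html>';
const out = wrapBodyContent(page, '<p>Price: $1</p>', '<script>var s = "$&";</script>');

console.log(out);
// <html><body class="home"><p>Price: $1</p><p>Hi</p><script>var s = "$&";</script></body></html>
```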

<!--foo.html-->

Then it's easy: just replace the placeholder with the file's contents.
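
A rough sketch of the placeholder idea, assuming each placeholder is an HTML comment whose text is the fragment's file name. The helper name fillPlaceholders and the sample strings are invented for illustration; in practice the fragment values would come from reading the files.

```javascript
// Hypothetical helper: swaps <!--name--> comments for the matching fragment.
// Comments with no matching entry in `fragments` are left untouched.
function fillPlaceholders(template, fragments) {
  return template.replace(/<!--\s*([\w.-]+)\s*-->/g, (comment, name) =>
    Object.prototype.hasOwnProperty.call(fragments, name) ? fragments[name] : comment
  );
}

const template = '<body><!--foo.html--><p>Main</p><!--bar.html--></body>';
const filled = fillPlaceholders(template, {
  'foo.html': '<nav>Top</nav>',
  'bar.html': '<footer>End</footer>',
});

console.log(filled);
// <body><nav>Top</nav><p>Main</p><footer>End</footer></body>
```

Because the replacement is a function, fragments containing "$1" or similar are inserted literally.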


Use the cheerio library, which has a simplified jQuery-ish API.

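var cheerio = require('cheerio');
// cheerio.load() parses the markup and returns a jQuery-like query function.
var dom = cheerio.load(myDocumentHTMLString);
dom('body').prepend(fooHTMLString); // inserts right after the opening body tag
dom('body').append(barHTMLString);  // inserts right before the closing body tag
var finalHTML = dom.html();         // serializes the whole document back to a string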

And just to be clear, since the legions of pro-regex individuals are already appearing in droves: yes, you need a real parser. No, you cannot use a regular expression. Read Stack Overflow co-founder Jeff Atwood's post on parsing HTML the Cthulhu way.

8 Comments

Is the extra library really necessary though?
Darn, you have the Jeff Atwood on your side there. However, seeing as only the body tag needs to be identified, I don't think a whole parsing library would be necessary as such. Nevertheless, you may want to mention the filesystem functions, as the OP specifically mentions reading and saving the file, not just modifying the DOM.
Yes, but Stack Overflow requires "thoroughly researched" questions and attempted code snippets. Reading files in Node is clearly documented and the web is full of examples. OP has >2K rep. I think the crux of his question has to do with the HTML modification, not stuff you learn in your first 90 seconds of a Node.js tutorial.
Jeff doesn't appear to be on anyone's side in that post: "It's considered good form to demand that regular expressions be considered verboten, totally off limits for processing HTML, but I think that's just as wrongheaded as demanding **every trivial HTML processing task be handled by a full-blown parsing engine**. It's more important to understand the tools, and their strengths and weaknesses, than it is to knuckle under to knee-jerk dogmatism."