I know that the number of tags in a document can be counted with something like the following:

var tableCount = $('body table tr').length;

I presume this only counts the opening tags. What I want to confirm is that the document contains the same number of closing tags. So if the code above reports 72 <tr> tags, I want something to tell me that there are also 72 closing </tr> tags.

Is this possible?

Thanks

  • To what end, if you don't mind my asking? Commented Jun 9, 2015 at 10:25
  • You could read the innerHTML of the table and then use regex to count both start and end tr tags. But if HTML structure is not proper (i.e. missing end tags), nobody knows what can happen. Commented Jun 9, 2015 at 10:26
  • Closing tags? If you write valid markup, your .length is already correct, since tr is matched as a paired tag. Commented Jun 9, 2015 at 10:28
  • Almost every browser auto-inserts missing closing tags when creating the DOM. As far as I know, JavaScript only has access to the DOM and not to the original source code, so you can't check for missing closing tags using JavaScript. But you could take a look at https://validator.w3.org/ and paste your HTML there to check it for validity, if that's what you're trying to do. Commented Jun 9, 2015 at 10:31
  • I suppose you want to do this to fix missing closing tags in some document using jQuery. Don't do that; just fix it manually. Commented Jun 9, 2015 at 10:31

2 Answers

2

Ideally, you would use a function like this:

function checkTable(tableElement) {

  // Get inner HTML
  var html = tableElement.innerHTML;

  // Count <tr> (match() returns null when nothing matches, hence the fallback)
  var count1 = (html.match(/<tr/g) || []).length;

  // Count </tr>
  var count2 = (html.match(/<\/tr/g) || []).length;

  // Equals?
  return count1 === count2;

}

However, the browser's HTML parser repairs mismatched tags while it builds the DOM (for example, by auto-closing them). A running page therefore cannot validate its own markup through the DOM. Here is a proof of concept: JS Bin.

Explanation: The second table has a typo (opening tag instead of a closing tag), but the function returns true in both cases. If one inspects the generated HTML (the one that is accessible through DOM), one can see that the browser auto-corrected the mismatched tags (there is an additional empty table row).


Luckily, there is another way. To obtain the raw HTML (i.e. not modified by the browser), you can make an AJAX request to the current page URL. Yes, you read that correctly: the page loads itself again. There is no risk of infinite recursion or a stack overflow, though, since the fetched page is only handled as a string.

The JS code for this is:

var selfUrl = document.location.href;

function checkHTML(html) {

  // Count <tr>
  var count1 = (html.match(/<tr/g) || []).length;
  console.log(count1);

  // Count </tr>
  var count2 = (html.match(/<\/tr/g) || []).length; // </tr (do not remove this comment!)
  console.log(count2);

  // Equals?
  return count1 === count2;

}

$.get(selfUrl, function(html) {
  console.log(checkHTML(html));
});

But beware of one pitfall. If you include this code inline in the HTML page itself (usually discouraged), you must not remove that one comment. The reason is this: the first regex contains the literal text <tr, while the second escapes the forward slash and therefore does not contain the literal text </tr. Since you fetch the whole HTML source (including the JS code), the counts would be off by one. To even this out, I added an extra </tr inside a comment.
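If relying on a balancing comment feels fragile, one alternative is to remove script bodies and HTML comments from the fetched source before counting. The following is only a sketch under assumptions: countRowTags is a hypothetical helper name (not part of the code above), it expects the raw source as a string, and it is still regex-based, so it will not cope with every edge case a real HTML parser handles.

```javascript
// Hypothetical helper: counts <tr> and </tr> in raw HTML source text.
function countRowTags(html) {
  // Drop script bodies and HTML comments so that tag-like text inside
  // them (for example the regex literals themselves) is not counted.
  var cleaned = html
    .replace(/<script\b[\s\S]*?<\/script\s*>/gi, '')
    .replace(/<!--[\s\S]*?-->/g, '');

  // \b keeps <tr from also matching longer tag names such as <track>.
  var opening = (cleaned.match(/<tr\b/gi) || []).length;
  var closing = (cleaned.match(/<\/tr\s*>/gi) || []).length;

  return { opening: opening, closing: closing, balanced: opening === closing };
}

// Sample source with a typo: the last <tr> should have been </tr>.
var sample =
  '<table><tr><td>a</td></tr><tr><td>b</td><tr></table>' +
  '<script>var re = /<tr/g;<\/script>';
console.log(countRowTags(sample));
```

For the sample string, the helper sees three opening tags and one closing tag, and the <tr inside the script body is ignored. In the AJAX version it could be called on the html argument passed to the $.get callback.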

2

Your question reminds me of the idea behind a SAX parser, since HTML is structurally similar to XML. A SAX parser reports start and end tags, as well as element attributes and content.

Some time ago, I used the simple SAX-style parser library described at http://ejohn.org/blog/pure-javascript-html-parser/ and available at http://ejohn.org/files/htmlparser.js

Using this library you can do the following:

$(document).ready(function(){
    var htmlString = $('#myTable').html(),
        countStart = 0,
        countEnd = 0;

    HTMLParser(htmlString, {
        start: function(tag, attrs, unary) {
            countStart += 1; // optionally count only when tag === 'tr'
            console.log("start: " + tag);
        },
        end: function(tag) {
            countEnd += 1; // optionally count only when tag === 'tr'
            console.log("end: " + tag);
        },
        chars: function(text) {},
        comment: function(text) {}
    });
});

There are also more modern Node-based parsers, such as sax-js (example: https://github.com/isaacs/sax-js/blob/master/examples/example.js), which can be used for the same task.
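To try the start/end counting idea without loading any library, a very small event-style scanner can stand in for the parser. This is a sketch only: scanTags and countTag are hypothetical names, the scanner is not a real HTML parser (it only recognizes tags of the form <name ...> and </name>), and it assumes the input is the HTML source as a string.

```javascript
// Hypothetical minimal scanner, NOT a full HTML parser: it calls
// handlers.start(name) for <name ...> and handlers.end(name) for </name>.
function scanTags(html, handlers) {
  var tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g;
  var match;
  while ((match = tagPattern.exec(html)) !== null) {
    var name = match[2].toLowerCase();
    if (match[1] === '/') {
      handlers.end(name);
    } else {
      handlers.start(name);
    }
  }
}

// Counts start and end events for a single tag name.
function countTag(html, tagName) {
  var counts = { start: 0, end: 0 };
  scanTags(html, {
    start: function (tag) { if (tag === tagName) counts.start += 1; },
    end: function (tag) { if (tag === tagName) counts.end += 1; }
  });
  return counts;
}

// The second row is missing its closing tag.
console.log(countTag('<table><tr><td>a</td></tr><tr><td>b</td></table>', 'tr'));
```

Comparing counts.start with counts.end then answers the original question for well-delimited input. The result is only meaningful on the raw source; a string obtained from .html() has already been repaired by the browser.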

4 Comments

The parser is OK, but this won't work on a live page, since the HTML you see via .html() is not the same as the HTML found in the document source. See my answer for more details. (I have also provided a proof of concept for this scenario)
Thanks @alesc. My answer does not consider all the issues that may occur, and the library I used may not cover every possible tag issue when parsing. Maybe if we combine our ideas, it will be the perfect answer :)
Your code is OK and the library is also OK. The problem is that the browser auto-closes mismatched tags. Therefore, when you call .html() you do not get the real HTML that can be found in the source code. That's why I worked around this by making an AJAX request to the page URL.
No problem, I upvoted your answer as it is a good reminder. In my case, I simply assumed that we had well-formed HTML code as a string.
