Check if a string is valid HTML using JavaScript

Discover effective ways to validate HTML strings in JavaScript. Ensure correctness and efficiency with this comprehensive guide.

There isn’t a single definitive way to determine if a string is valid HTML, as HTML itself is flexible and can be malformed. However, we can use various methods to check for the presence of HTML-like structures in a string.

One method is to use the DOMParser API and its parseFromString method.

The DOMParser interface parses XML or HTML source code from a string into a DOM Document, a structured object that can then be inspected and manipulated using JavaScript.

Key considerations

HTML validity is subjective and depends on the context.
Simple string matching may not catch all valid HTML structures.
Parsing the entire HTML structure requires more complex methods.
Browser-based parsing can load external resources.
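
As a rough illustration of why simple string matching falls short, here is a minimal sketch of a pattern-based check. The looksLikeHtml name and the pattern itself are assumptions made for this example. It only reports whether something tag-shaped appears in the string, so it says nothing about whether the markup is valid.

```javascript
// Heuristic only: reports whether the string contains something tag-shaped.
// looksLikeHtml is a hypothetical helper, not part of any browser API.
function looksLikeHtml(input) {
  // A start tag such as <p> or <img src="x">, or an end tag such as </p>.
  const tagPattern = /<\/?[a-z][a-z0-9-]*(\s[^<>]*)?\/?>/i;

  return tagPattern.test(input);
}

console.log(looksLikeHtml('<p>Hello</p>'));    // true
console.log(looksLikeHtml('plain text'));      // false
console.log(looksLikeHtml('if (a <b> c) {}')); // true, a false positive
console.log(looksLikeHtml('<p>unclosed'));     // true, although <p> is never closed
```

The last two calls show the weakness: a code snippet is mistaken for markup, and unbalanced markup passes unnoticed.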

Web browsers often tolerate and even fix certain types of malformed HTML. This means that:

Some invalid HTML may still render correctly across browsers.
Different browsers might interpret the same invalid HTML differently.

This tolerance creates a gray area between what’s technically valid and what works in practice.

Essential requirements

The parseFromString method requires two arguments: string and mimeType.

The argument string must contain either an HTML, XML, XHTML, or SVG document. The argument mimeType determines whether the XML parser or the HTML parser is used to parse the string.

Valid MIME type values are:

text/html
text/xml
application/xml
application/xhtml+xml
image/svg+xml
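
Because parseFromString throws a TypeError for any other value, it can help to check the MIME type before parsing. The following is a minimal sketch under that assumption. The SUPPORTED_MIME_TYPES constant and the parserFor helper are names invented for this example.

```javascript
// The five MIME types listed above.
const SUPPORTED_MIME_TYPES = new Set([
  'text/html',
  'text/xml',
  'application/xml',
  'application/xhtml+xml',
  'image/svg+xml',
]);

// Hypothetical helper: tells which parser would handle the given MIME type.
function parserFor(mimeType) {
  if (!SUPPORTED_MIME_TYPES.has(mimeType)) {
    throw new TypeError(`Unsupported MIME type: ${mimeType}`);
  }

  // Only text/html selects the HTML parser; every other value selects the XML parser.
  return mimeType === 'text/html' ? 'html' : 'xml';
}

console.log(parserFor('text/html'));             // html
console.log(parserFor('application/xhtml+xml')); // xml
```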

How does the DOMParser interface parse HTML strings differently based on the mimeType argument?

The mimeType argument determines whether the XML parser or the HTML parser is used to parse the string. The XML parser is strict and returns a parser error when the string is not well-formed, while the HTML parser is lenient and will try to interpret the string as HTML even if it contains errors.
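
To make the difference concrete, here is a small sketch that parses the same string with both parsers and reports whether a <parsererror> element shows up. The compareParsers name and its ParserCtor parameter are assumptions made for this example. The parameter only exists so that a DOMParser constructor from another environment, such as jsdom, can be passed in.

```javascript
// Parses the same markup with the HTML parser and the XML parser and reports
// whether each result contains a <parsererror> element.
// ParserCtor is a hypothetical injection point; it defaults to the browser's DOMParser.
function compareParsers(markup, ParserCtor = globalThis.DOMParser) {
  const parser = new ParserCtor();
  const report = {};

  for (const mimeType of ['text/html', 'application/xml']) {
    const doc = parser.parseFromString(markup, mimeType);

    report[mimeType] = doc.querySelector('parsererror') === null ? 'parsed' : 'parsererror';
  }

  return report;
}

// Only run the demo where DOMParser exists (for example, in a browser).
if (typeof DOMParser !== 'undefined') {
  // The <p> element is never closed.
  console.log(compareParsers('<div><p>text</div>'));
}
```

In a browser, the HTML parser should recover from the unclosed <p> and report "parsed", while the XML parser should report "parsererror" for the same input.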

Practical example

See the check if a string is valid HTML using JavaScript example. Enter some HTML into the textarea and activate the submit button to determine if the provided string is valid HTML.

Notice that different mime types give different results in the validation.

Code

Here are two versions of the code: TypeScript and JavaScript. We also need to catch the errors.

When using the XML parser with a string that doesn’t represent well-formed XML, the XMLDocument returned by parseFromString will contain a <parsererror> node describing the nature of the parsing error.

Additionally, the parsing error may be reported to the browser’s JavaScript console. Treat the result as a well-formedness check only: you should not use it for any kind of security validation, sanitization, or XSS checks.

The function isStringValidHtml returns an object with the following properties:

isParseErrorAvailable – a boolean that determines if a <parsererror> element is available. true indicates that, for the given MIME type, the string could not be parsed without errors.
isStringValidHtml – a boolean that determines if a given string is valid HTML.
parsedDocument – the <parsererror> markup when a parsing error occurred; otherwise the text content of the parsed document.

Check if a string is valid HTML, TypeScript version.

interface HtmlValidationResult {
  isParseErrorAvailable: boolean;
  isStringValidHtml: boolean;
  parsedDocument: string;
}

export function isStringValidHtml(html: string, mimeType: DOMParserSupportedType = 'application/xml'): HtmlValidationResult {
  const domParser: DOMParser = new DOMParser();
  const doc: Document = domParser.parseFromString(html, mimeType);
  // Firefox makes <parsererror> the document element while Chrome nests it deeper,
  // so search the whole document instead of only the descendants of documentElement.
  const parseError: Element | null = doc.querySelector('parsererror');
  const result: HtmlValidationResult = {
    isParseErrorAvailable: parseError !== null,
    isStringValidHtml: false,
    parsedDocument: ''
  };

  if (parseError !== null) {
    // Parsing failed: expose the error markup produced by the browser.
    result.parsedDocument = parseError.outerHTML;
  } else {
    // Parsing succeeded: expose the text content of the parsed document.
    result.isStringValidHtml = true;
    result.parsedDocument = doc.documentElement.textContent ?? '';
  }

  return result;
}

Check if a string is valid HTML, JavaScript version.

function isStringValidHtml(html, mimeType) {
  const domParser = new DOMParser();
  // Fall back to the strict XML parser when no MIME type is given.
  const doc = domParser.parseFromString(html, typeof mimeType === 'string' ? mimeType : 'application/xml');
  // Firefox makes <parsererror> the document element while Chrome nests it deeper,
  // so search the whole document instead of only the descendants of documentElement.
  const parseError = doc.querySelector('parsererror');
  const result = {
    isParseErrorAvailable: parseError !== null,
    isStringValidHtml: false,
    parsedDocument: ''
  };

  if (parseError !== null) {
    // Parsing failed: expose the error markup produced by the browser.
    result.parsedDocument = parseError.outerHTML;
  } else {
    // Parsing succeeded: expose the text content of the parsed document.
    result.isStringValidHtml = true;
    result.parsedDocument = typeof doc.documentElement.textContent === 'string' ? doc.documentElement.textContent : '';
  }

  return result;
}

Example of validation error

Example of HTML string validation using JavaScript DOMParser: "error on line 1 at column 22: Extra content at the end of the document"
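
The message format shown above is the one used by Chromium-based and WebKit browsers; Firefox words its errors differently. Assuming that format, the position and description can be pulled out of the text with a small helper. The parseErrorLocation name is hypothetical, and the sketch simply returns null when the message does not match.

```javascript
// Hypothetical helper: extracts line, column, and description from a message
// shaped like "error on line 1 at column 22: Extra content at the end of the document".
function parseErrorLocation(text) {
  const match = /error on line (\d+) at column (\d+): (.*)/.exec(text);

  if (match === null) {
    // Unknown format (for example, a Firefox message): nothing to extract.
    return null;
  }

  return {
    line: Number(match[1]),
    column: Number(match[2]),
    message: match[3].trim(),
  };
}

console.log(parseErrorLocation('error on line 1 at column 22: Extra content at the end of the document'));
// { line: 1, column: 22, message: 'Extra content at the end of the document' }
```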

What is a MIME type and why is it used in the `isStringValidHtml` function?

In the context of the isStringValidHtml function, the MIME type is used to tell the DOMParser object what type of document to expect. When parsing a string, the parser needs to know the format of the string in order to parse it correctly. By specifying the MIME type, we give the parser this information. For HTML strings, the MIME type would typically be text/html or application/xhtml+xml. The parseFromString method itself has no default, so when the caller of isStringValidHtml does not specify a MIME type, the function falls back to application/xml.

How does the `DOMParser` API handle invalid HTML syntax?

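How the DOMParser API handles invalid syntax depends on the parser selected by the MIME type. With text/html, the HTML parser applies the error recovery rules of the HTML specification: it always returns an HTMLDocument, silently closing, moving, or adding elements where needed, and it never produces a <parsererror> node. With the XML MIME types, the parser does not attempt any recovery. If the string is not well-formed, the returned document contains a <parsererror> node, which describes the nature of the parsing error.

In neither case does the DOMParser API change the original string. It only builds a document from it and, for XML, reports the error it encounters during parsing.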
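
For the text/html case, the effect of the HTML parser's error recovery can be seen by serialising the parsed document again. This is a minimal sketch; the serializeAfterRecovery name and the ParserCtor parameter are assumptions made for this example, with the parameter defaulting to the browser's DOMParser.

```javascript
// Parses markup with the HTML parser and returns what ended up inside <body>.
// ParserCtor is a hypothetical injection point (for example, a jsdom DOMParser).
function serializeAfterRecovery(markup, ParserCtor = globalThis.DOMParser) {
  const doc = new ParserCtor().parseFromString(markup, 'text/html');

  return doc.body.innerHTML;
}

// Only run the demo where DOMParser exists (for example, in a browser).
if (typeof DOMParser !== 'undefined') {
  // The second <p> implicitly closes the first one.
  console.log(serializeAfterRecovery('<p>one<p>two'));
}
```

In a browser this should print <p>one</p><p>two</p>: the input string is left untouched, but the document built from it contains the end tags that the input lacked.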

How can you ensure that HTML string validation in JavaScript correctly detects missing or misplaced tags?

There is no single HTML validator built into the browser, but you can still mechanically verify that every start tag has a matching end tag and semantically check that the browser did not silently rewrite your markup.

Below is a drop-in, framework-agnostic helper that does both in a few short functions.

1. Mechanical check – is every tag balanced?

We treat the string as if it were XML just long enough to count open/close pairs. Void elements (such as <br> or <img>) are ignored and everything else is pushed on a stack.

isVoidElement.js – works in browser and Node.js

const VOID_CACHE = new Map();
/*  Detects whether the current engine treats <tag> as a void element.
Falls back to the HTML5 spec list when the DOM is not available.
*/
function isVoidElement(tagName) {
  if (VOID_CACHE.has(tagName)) return VOID_CACHE.get(tagName);
  /* ---------- server-side fallback ---------- */
  if (typeof document === "undefined") {
    // HTML 5 void elements
    const voidSet = new Set([
      "area",
      "base",
      "br",
      "col",
      "embed",
      "hr",
      "img",
      "input",
      "link",
      "meta",
      "param",
      "source",
      "track",
      "wbr",
    ]);
    const result = voidSet.has(tagName.toLowerCase());

    VOID_CACHE.set(tagName, result);

    return result;
  }

  /* ---------- Browser environment ---------- */
  const ns = "http://www.w3.org/1999/xhtml";
  try {
    const elem = document.createElementNS
      ? document.createElementNS(ns, tagName)
      : document.createElement(tagName);
    const markup = window.XMLSerializer
      ? new XMLSerializer().serializeToString(elem)
      : elem.outerHTML;

    const isVoid = markup.includes("></") === false;

    VOID_CACHE.set(tagName, isVoid);

    return isVoid;
  } catch {
    // Invalid element name (e.g. capitalised SVG in XHTML)
    VOID_CACHE.set(tagName, false);

    return false;
  }
}

findUnbalanced.js

/**
 * Returns null if every tag is balanced.
 * Otherwise returns { tag, expected, found } describing the mismatch.
 */
export function findUnbalanced(html) {
  // Relies on isVoidElement() from isVoidElement.js being in scope.
  // Group 1 captures a start tag name, group 2 captures an end tag name.
  const re = /<\s*([a-zA-Z][a-zA-Z0-9-]*)(?:\s[^>]*)?\s*>|<\/\s*([a-zA-Z][a-zA-Z0-9-]*)\s*>/g;
  const stack = [];
  let m;
  while ((m = re.exec(html)) !== null) {
    const startTag = m[1] ? m[1].toLowerCase() : null;
    const endTag = m[2] ? m[2].toLowerCase() : null;

    if (startTag) {
      // Void elements never have an end tag, so they are not tracked.
      if (!isVoidElement(startTag)) {
        stack.push(startTag);
      }
    } else if (endTag) {
      // The innermost open element is the one this end tag should close.
      const expected = stack.length ? stack.pop() : null;
      if (expected !== endTag) {
        return { tag: endTag, expected, found: endTag };
      }
    }

  }
  // Anything left on the stack was opened but never closed.
  return stack.length
    ? { tag: stack[stack.length - 1], expected: null, found: stack[stack.length - 1] }
    : null;
}

Usage example:

Quick test

const bad = '<div><p>text</div></p>';
console.log(findUnbalanced(bad));
// → { tag: 'div', expected: 'p', found: 'div' }

2. Semantic check – did the browser mutate the DOM?

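Even balanced HTML can be rewritten (tables get tbody, stray meta tags move to <head>, etc.). Parse the string, serialise the resulting DOM, and compare it with the input. If the two differ, the browser fixed something. The comparison assumes an HTML fragment rather than a full document and is deliberately strict: cosmetic normalisation, such as added attribute quotes or lowercased tag names, also counts as a difference.

isDOMIntact.js

export function isDOMIntact(html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  // Collapse whitespace so that formatting alone is not reported as a rewrite.
  const normalise = (markup) => markup.replace(/\s+/g, ' ').trim();

  // Fragments end up inside <body>; anything the parser moved, added, or
  // closed on its own makes the serialisation differ from the input.
  return normalise(doc.body.innerHTML) === normalise(html);
}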

3. One-line validator

Chain the two checks together and you get a tiny utility you can import in either the browser or Node.js (with jsdom).

validateHTML.js

export function validateHTML(html) {
  const unbal = findUnbalanced(html);

  if (unbal) {
    return { valid: false, reason: 'Unbalanced tag', details: unbal };
  }

  if (isDOMIntact(html) === false) {
    return { valid: false, reason: 'Browser auto-corrected the markup' };
  }

  return { valid: true };
}

4. Server-side (Node.js) usage

Node entry point

import { JSDOM } from 'jsdom';
global.DOMParser = new JSDOM().window.DOMParser;

// now import and use validateHTML() exactly as in the browser

Summary

Checking if a string is valid HTML requires balancing simplicity, accuracy, and performance. The method you use will be determined by your individual needs. For example, you could use regular expressions for a fast check, but this may result in false positives. More robust solutions, such as DOM parsing or node type checking, provide more accuracy but may have downsides, such as resource loading or temporary DOM manipulation. Whichever method you choose, consider the security implications of working with potentially harmful input.
