
Check if a string is valid HTML using JavaScript
Discover effective ways to validate HTML strings in JavaScript. Ensure correctness and efficiency with this comprehensive guide.
There isn’t a single definitive way to determine if a string is valid HTML, as HTML itself is flexible and can be malformed. However, we can use various methods to check for the presence of HTML-like structures in a string.
One method is to use the DOMParser API and its parseFromString method.
The DOMParser API interface allows you to parse XML or HTML source code from a string and convert it into a DOM Document. It is used to convert a string of XML or HTML into a structured DOM object that can be easily manipulated using JavaScript.
Key considerations
- HTML validity is subjective and depends on the context.
- Simple string matching may not catch all valid HTML structures.
- Parsing the entire HTML structure requires more complex methods.
- Browser-based parsing can load external resources.
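To illustrate the point about simple string matching, here is a minimal sketch of a regular-expression pre-check. The helper name looksLikeHtml is hypothetical and not part of the article's code. It only asks whether the string contains something shaped like a tag, so it cannot say anything about validity.

```javascript
// Hypothetical helper: a cheap pre-check, not a validator.
function looksLikeHtml(text) {
  // "<", a tag name starting with a letter, optional attributes, then ">".
  return /<[a-z][a-z0-9-]*(\s[^>]*)?>/i.test(text);
}

console.log(looksLikeHtml("<p>Hello</p>"));     // true
console.log(looksLikeHtml("1 < 2 and 3 > 2"));  // false
console.log(looksLikeHtml("if a <b and c> d")); // true, a false positive
console.log(looksLikeHtml("<p>never closed"));  // true, although the markup is incomplete
```

The last two calls show why such a check is only useful as a fast filter in front of a real parser.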
Web browsers often tolerate and even fix certain types of malformed HTML. This means that:
- Some invalid HTML may still render correctly across browsers.
- Different browsers might interpret the same invalid HTML differently.
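The repair behaviour can be observed directly by parsing a fragment with the HTML parser and serialising the result. The sketch below is illustrative only. The function name serializeAsParsed is an assumption, and the parser is passed in as a parameter so that the same code can also run in Node.js with a DOM implementation such as jsdom.

```javascript
// Parses a fragment with the lenient HTML parser and returns what the
// parser actually built, serialised back to a string.
// `parser` is injectable; in a browser the default DOMParser is used.
function serializeAsParsed(fragment, parser = new DOMParser()) {
  const doc = parser.parseFromString(fragment, "text/html");
  return doc.body.innerHTML;
}

// Only run the demo where a DOMParser exists (for example in a browser).
if (typeof DOMParser !== "undefined") {
  // The second <p> implicitly closes the first one.
  console.log(serializeAsParsed("<p>one<p>two"));
  // "<p>one</p><p>two</p>"

  // A missing <tbody> is inserted automatically.
  console.log(serializeAsParsed("<table><tr><td>x</td></tr></table>"));
  // "<table><tbody><tr><td>x</td></tr></tbody></table>"
}
```

Neither input raises an error, even though the serialised output differs from what was written.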
Essential requirements
The parseFromString method requires two arguments: string and mimeType.
The string argument must contain an HTML, XML, XHTML, or SVG document. The mimeType argument determines whether the XML parser or the HTML parser is used to parse the string.
Valid mime type values are:
- text/html
- text/xml
- application/xml
- application/xhtml+xml
- image/svg+xml
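The mapping from MIME type to parser can be written down as a small lookup. This is a sketch with hypothetical names (XML_TYPES, parserFor). It mirrors the list above: text/html selects the HTML parser, the other four values select the XML parser, and any other value makes parseFromString throw a TypeError.

```javascript
// MIME types that make DOMParser use the strict XML parser.
const XML_TYPES = new Set([
  "text/xml",
  "application/xml",
  "application/xhtml+xml",
  "image/svg+xml",
]);

// Returns "html", "xml", or null when the type is not supported.
function parserFor(mimeType) {
  if (mimeType === "text/html") return "html";
  if (XML_TYPES.has(mimeType)) return "xml";
  return null; // parseFromString would throw a TypeError for this value
}

console.log(parserFor("text/html"));             // "html"
console.log(parserFor("application/xhtml+xml")); // "xml"
console.log(parserFor("text/plain"));            // null
```

Checking the value up front lets calling code reject an unsupported type with its own error message instead of relying on the exception.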
How does the DOMParser interface parse HTML strings differently based on the mimeType argument?
The mimeType argument determines whether the XML parser or the HTML parser is used to parse the string. The XML parser is strict: if the string is not well-formed XML, the returned document contains a parser error. The HTML parser is lenient: it interprets the string as HTML even if it contains errors and never reports them.
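A short sketch makes the difference concrete by parsing the same string with both parsers. The helper name hasParserError is hypothetical. The parser is a parameter so that the code can run in a browser or under jsdom.

```javascript
// Returns true when the parsed document contains a <parsererror> element.
// `parser` is injectable; in a browser the default DOMParser is used.
function hasParserError(markup, mimeType, parser = new DOMParser()) {
  const doc = parser.parseFromString(markup, mimeType);
  return doc.querySelector("parsererror") !== null;
}

// Only run the demo where a DOMParser exists (for example in a browser).
if (typeof DOMParser !== "undefined") {
  const markup = "<p>one<p>two"; // neither paragraph is closed

  console.log(hasParserError(markup, "text/html"));        // false
  console.log(hasParserError(markup, "application/xml"));  // true
  console.log(hasParserError("<p>one</p>", "application/xml")); // false
}
```

The unclosed paragraphs are accepted silently as HTML but rejected as XML, which is why the XML MIME types are the ones that are useful for a strictness check.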
Practical example
See the "Check if a string is valid HTML using JavaScript" example. Enter some HTML into the textarea and activate the submit button to determine if the provided string is valid HTML.
Notice that different mime types give different results in the validation.
Code
Here are two versions of the code: TypeScript and JavaScript. Both also need to detect parsing errors.
When using the XML parser with a string that doesn’t represent well-formed XML, the XMLDocument returned by parseFromString will contain a <parsererror> node describing the nature of the parsing error.
The function isStringValidHtml returns an object with the following properties:
- isParseErrorAvailable – a boolean that indicates whether a <parsererror> element was found. true means that, for the given MIME type, the string is not well-formed.
- isStringValidHtml – a boolean that indicates whether the given string parsed without a <parsererror>.
- parsedDocument – the <parsererror> markup when an error was found, otherwise the text content of the parsed document.
TypeScript:

// Static method of a utility class (the surrounding class is omitted).
public static isStringValidHtml(html: string, mimeType: DOMParserSupportedType = 'application/xml'): { [key: string]: any } {
  const domParser: DOMParser = new DOMParser();
  const doc: Document = domParser.parseFromString(html, mimeType);
  // Query the document itself: some browsers nest <parsererror> inside the
  // root element, others return it as the root element.
  const parseError: Element | null = doc.querySelector('parsererror');
  const result: { [key: string]: any } = {
    isParseErrorAvailable: parseError !== null,
    isStringValidHtml: false,
    parsedDocument: ''
  };
  if (parseError !== null) {
    result.parsedDocument = parseError.outerHTML;
  } else {
    result.isStringValidHtml = true;
    result.parsedDocument = doc.documentElement.textContent ?? '';
  }
  return result;
}

JavaScript:

function isStringValidHtml(html, mimeType) {
  const domParser = new DOMParser();
  const doc = domParser.parseFromString(html, typeof mimeType === 'string' ? mimeType : 'application/xml');
  // Query the document itself: some browsers nest <parsererror> inside the
  // root element, others return it as the root element.
  const parseError = doc.querySelector('parsererror');
  const result = {
    isParseErrorAvailable: parseError !== null,
    isStringValidHtml: false,
    parsedDocument: ''
  };
  if (parseError !== null) {
    result.parsedDocument = parseError.outerHTML;
  } else {
    result.isStringValidHtml = true;
    result.parsedDocument = doc.documentElement.textContent ?? '';
  }
  return result;
}

Example of validation error

What is a MIME type and why is it used in the isStringValidHtml function?
In the context of the isStringValidHtml function, the MIME type tells the DOMParser object what type of document to expect. When parsing a string, the parser needs to know the format of the string in order to parse it correctly. By specifying the MIME type, we give the parser this information. For HTML strings, the MIME type would typically be text/html or application/xhtml+xml. If no MIME type is passed to isStringValidHtml, the function falls back to application/xml; parseFromString itself has no default and always requires the argument.
How does the DOMParser API handle invalid HTML syntax?
How the DOMParser API handles invalid syntax depends on the parser selected by the MIME type. With text/html, parseFromString always returns an HTMLDocument. The HTML parser applies the error-recovery rules of the HTML specification, for example closing unclosed elements and inserting missing ones, and it never adds a <parsererror> node.
With an XML MIME type, the parser does not fix or correct anything. If the string is not well-formed, the returned XMLDocument contains a <parsererror> node, which describes the nature of the parsing error.
How can you ensure that HTML string validation in JavaScript correctly detects missing or misplaced tags?
There is no single HTML validator built into the browser, but you can still mechanically verify that every start tag has a matching end tag and semantically check that the browser did not silently rewrite your markup.
Below is a compact, drop-in, framework-agnostic helper that does both.
1. Mechanical check – is every tag balanced?
We treat the string as if it were XML just long enough to count open/close pairs. Void elements such as <br> or <img> are ignored; every other start tag is pushed onto a stack and must be popped by its matching end tag.
const VOID_CACHE = new Map();

/* Detects whether the current engine treats <tag> as a void element.
   Falls back to the HTML5 spec list when the DOM is not available.
*/
function isVoidElement(tagName) {
  if (VOID_CACHE.has(tagName)) return VOID_CACHE.get(tagName);
  /* ---------- server-side fallback ---------- */
  if (typeof document === "undefined") {
    // HTML5 void elements
    const voidSet = new Set([
      "area",
      "base",
      "br",
      "col",
      "embed",
      "hr",
      "img",
      "input",
      "link",
      "meta",
      "param",
      "source",
      "track",
      "wbr",
    ]);
    const result = voidSet.has(tagName.toLowerCase());
    VOID_CACHE.set(tagName, result);
    return result;
  }
  /* ---------- Browser environment ---------- */
  const ns = "http://www.w3.org/1999/xhtml";
  try {
    const elem = document.createElementNS
      ? document.createElementNS(ns, tagName)
      : document.createElement(tagName);
    const markup = window.XMLSerializer
      ? new XMLSerializer().serializeToString(elem)
      : elem.outerHTML;
    // Void elements serialise without a separate end tag.
    const isVoid = markup.includes("></") === false;
    VOID_CACHE.set(tagName, isVoid);
    return isVoid;
  } catch {
    // Invalid element name (e.g. capitalised SVG in XHTML)
    VOID_CACHE.set(tagName, false);
    return false;
  }
}

/**
 * Returns null if every tag is balanced.
 * Otherwise returns { tag, expected, found } describing the mismatch.
 */
export function findUnbalanced(html) {
  // Group 1 captures the name of a start tag, group 2 the name of an end tag.
  const re = /<\s*([a-zA-Z][a-zA-Z0-9-]*)(?:\s[^>]*)?\s*>|<\/\s*([a-zA-Z][a-zA-Z0-9-]*)\s*>/g;
  const stack = [];
  let m;
  while ((m = re.exec(html)) !== null) {
    const startTag = m[1] ? m[1].toLowerCase() : null;
    const endTag = m[2] ? m[2].toLowerCase() : null;
    if (startTag) {
      // Void elements never have an end tag, so they are not tracked.
      if (!isVoidElement(startTag)) {
        stack.push(startTag);
      }
    } else if (endTag) {
      // The end tag must close the most recently opened element.
      const expected = stack.length ? stack.pop() : null;
      if (expected !== endTag) {
        return { tag: endTag, expected, found: endTag };
      }
    }
  }
  // Anything left on the stack was opened but never closed.
  return stack.length
    ? { tag: stack[stack.length - 1], expected: null, found: stack[stack.length - 1] }
    : null;
}

Usage example:

const bad = '<div><p>text</div></p>';
console.log(findUnbalanced(bad));
// → { tag: 'div', expected: 'p', found: 'div' }

2. Semantic check – did the browser mutate the DOM?
Even balanced HTML can be rewritten (tables get tbody, stray meta tags move to <head>, etc.). Parse the string, serialise the resulting DOM, and compare it with the input. If the two differ, the browser fixed something. The comparison is deliberately strict: it assumes a body fragment, and harmless normalisation such as lower-cased tag names or re-quoted attributes also counts as a change.

export function isDOMIntact(html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  // Any difference between the serialised DOM and the input means the
  // parser rewrote the markup.
  return doc.body.innerHTML.trim() === html.trim();
}

3. One-line validator
Chain the two checks together and you get a tiny utility you can import in either the browser or Node.js (with jsdom).
export function validateHTML(html) {
  const unbal = findUnbalanced(html);
  if (unbal) {
    return { valid: false, reason: 'Unbalanced tag', details: unbal };
  }
  if (isDOMIntact(html) === false) {
    return { valid: false, reason: 'Browser auto-corrected the markup' };
  }
  return { valid: true };
}

4. Server-side (Node.js) usage
import { JSDOM } from 'jsdom';
// Expose jsdom's DOMParser globally so the helpers can find it.
globalThis.DOMParser = new JSDOM().window.DOMParser;
// now import and use validateHTML() exactly as in the browser

Summary
Checking if a string is valid HTML requires balancing simplicity, accuracy, and performance. The method you use will be determined by your individual needs. For example, you could use regular expressions for a fast check, but this may result in false positives. More robust solutions, such as DOM parsing or node type checking, provide more accuracy but may have downsides, such as resource loading or temporary DOM manipulation. Consider the security implications of working with potentially harmful input.