I have an API which receives a string containing HTML code and stores it in a database. I'm using the node-html-parser package to perform some logic on the HTML.
Among other things, I want to remove any potentially malicious scripts. According to the documentation, the package should be able to do this when instructed via the options object (see the 'Global Methods' heading in the linked documentation).
My code:
const parser = require('node-html-parser');
const html = `<p>My text</p><script></script>`;
const options = {
  blockTextElements: {
    script: false
  }
};
const root = parser.parse(html, options);
return { html: root.innerHTML };
I also tried modifying the options object with script: true, noscript: false, and noscript: true, but none of them removed the script tags from the HTML.
Am I doing something wrong?
Removing script tags on its own is not sufficient: scripts can also be injected through on* attributes in the HTML markup itself, among others. Depending on how you're piecing the markup back together on the tail end, you may also still be very vulnerable to markup a la <scr<script>Ha!</script>ipt> alert(document.cookie);</script> (h/t to this SO thread). You really should re-evaluate mitigations for this type of attack depending on your broader threat model.
Consider a package such as sanitize-html, which is specifically geared to minimize or eliminate these potential attack vectors by allowing for the configuration of an explicit allow-list of HTML element types and attributes that fit your use case.
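To make the nested-tag problem concrete, here is a minimal sketch in plain JavaScript. The helper stripScriptsNaively is hypothetical and written only for this illustration; it is not part of node-html-parser or sanitize-html, and it stands in for any single-pass "find the script tags and delete them" approach.

```javascript
// Hypothetical helper for illustration only: deletes <script>...</script>
// pairs in a single pass. This is the kind of naive stripping that the
// nested payload is designed to defeat.
function stripScriptsNaively(html) {
  return html.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '');
}

// The nested payload from the thread mentioned above.
const payload = '<scr<script>Ha!</script>ipt> alert(document.cookie);</script>';

// Removing the inner <script>Ha!</script> joins "<scr" and "ipt>" into a
// brand-new, fully formed script tag.
const result = stripScriptsNaively(payload);
console.log(result); // prints: <script> alert(document.cookie);</script>
```

The output still contains a working script element, which is why a deny-list that deletes known-bad tags tends to be fragile, and why an explicit allow-list of elements and attributes is usually the safer design.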