I have a form where users can style their own input with HTML, and I want to clean that input on the server side with PHP. I want to make sure that all of the input is secure and matches the format I expect. I already have XSS protection, so this is not about removing scripts.
When the user provides input, I want to remove all tags other than p, img, a, hr, br, tbody, tr, td, pre, ul, ol, li and span (basically all text formatting other than divs). I also want to remove all attributes other than href for <a>, src for <img>, and style for <p>. Within the style attribute of <p>, I would like to preserve only the following CSS properties:
- color
- background-color
- line-height
- Anything that starts with text-
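
To make the rules above concrete, here is a minimal sketch of the filtering using PHP's built-in DOM extension. The names ALLOWED_TAGS, ALLOWED_ATTRS, filterStyle, cleanNode and sanitizeHtml are hypothetical, and unwrapping disallowed tags (keeping their text) rather than deleting them with their contents is an assumption about the desired behavior.

```php
<?php
// Hypothetical allowlists mirroring the tags and attributes listed above.
const ALLOWED_TAGS = ['p', 'img', 'a', 'hr', 'br', 'tbody', 'tr', 'td',
                      'pre', 'ul', 'ol', 'li', 'span'];
const ALLOWED_ATTRS = ['a' => ['href'], 'img' => ['src'], 'p' => ['style']];

// Keep only color, background-color, line-height and text-* declarations.
function filterStyle(string $style): string
{
    $kept = [];
    foreach (explode(';', $style) as $declaration) {
        $parts = explode(':', $declaration, 2);
        if (count($parts) !== 2) {
            continue; // not a "property: value" pair
        }
        $property = strtolower(trim($parts[0]));
        $value = trim($parts[1]);
        $allowed = in_array($property, ['color', 'background-color', 'line-height'], true)
            || strncmp($property, 'text-', 5) === 0;
        if ($allowed && $value !== '') {
            $kept[] = "$property: $value";
        }
    }
    return implode('; ', $kept);
}

// Recursively unwrap disallowed elements and strip disallowed attributes.
function cleanNode(DOMNode $node): void
{
    // Iterate over a copy because the loop modifies the child list.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child instanceof DOMElement) {
            cleanNode($child); // clean descendants first
            $tag = strtolower($child->tagName);
            if (!in_array($tag, ALLOWED_TAGS, true)) {
                // Assumption: keep the contents, drop only the tag itself.
                while ($child->firstChild) {
                    $node->insertBefore($child->firstChild, $child);
                }
                $node->removeChild($child);
                continue;
            }
            $allowedAttrs = ALLOWED_ATTRS[$tag] ?? [];
            foreach (iterator_to_array($child->attributes, false) as $attr) {
                if (!in_array(strtolower($attr->name), $allowedAttrs, true)) {
                    $child->removeAttribute($attr->name);
                }
            }
            if ($tag === 'p' && $child->hasAttribute('style')) {
                $style = filterStyle($child->getAttribute('style'));
                if ($style === '') {
                    $child->removeAttribute('style');
                } else {
                    $child->setAttribute('style', $style);
                }
            }
        } elseif (!($child instanceof DOMText)) {
            $node->removeChild($child); // comments and other non-text nodes
        }
    }
}

function sanitizeHtml(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    // The meta tag hints UTF-8 to libxml; the wrapper div keeps bare text
    // from being wrapped in <p> and is unwrapped again by cleanNode().
    $doc->loadHTML('<html><head><meta http-equiv="Content-Type" '
        . 'content="text/html; charset=utf-8"></head><body><div>'
        . $html . '</div></body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    cleanNode($body);

    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

For example, sanitizeHtml('<div><p style="color: red; margin: 0" class="x">Hi <b>there</b></p></div>') should drop the div and b tags, the class attribute and the margin declaration while keeping the text. The sketch does not validate href or src values, since XSS filtering is assumed to happen elsewhere.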
In addition, I want to be able to crop the text to a certain length while preserving ending tags and making sure that every opening tag also has a closing tag.
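
One possible way to do the cropping is to walk the DOM tree, count only text characters, cut the text node where the limit is reached, and drop everything after it. Because the result is serialized from the tree, every opening tag gets its closing tag. This is only a sketch: truncateNode and truncateHtml are hypothetical names, and the choice to count text characters (not markup) toward the limit is an assumption.

```php
<?php
// Walk the tree in document order, spending $remaining on text characters.
function truncateNode(DOMNode $node, int &$remaining): void
{
    // Iterate over a copy because the loop removes children.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($remaining <= 0) {
            $node->removeChild($child); // everything past the limit is dropped
            continue;
        }
        if ($child instanceof DOMText) {
            if ($child->length > $remaining) {
                // Cut the text node at the limit (offsets are in characters).
                $child->deleteData($remaining, $child->length - $remaining);
            }
            $remaining -= $child->length;
        } else {
            truncateNode($child, $remaining);
        }
    }
}

function truncateHtml(string $html, int $limit): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    // Assumes $html is already sanitized, so the wrapper div cannot be
    // closed early by a stray </div> in the input.
    $doc->loadHTML('<html><head><meta http-equiv="Content-Type" '
        . 'content="text/html; charset=utf-8"></head><body><div>'
        . $html . '</div></body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->getElementsByTagName('body')->item(0)->firstChild;
    truncateNode($root, $limit);

    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Usage: crop to 9 text characters; the <a> and <p> tags stay balanced.
echo truncateHtml('<p>Hello <a href="#">wonderful</a> world</p>', 9), "\n";
```

Elements that carry no text, such as img or br, are kept when they appear before the limit and removed when they appear after it.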
For example, how does the Stack Overflow editor parse and clean input before saving it and displaying it to the user?
Thanks.