How do I use PHP's html_entity_decode() with an exception for numeric HTML entities 60 and 62?

Currently my code looks something like the following:

$t = mysqli_real_escape_string($db,html_entity_decode($_POST['title'],ENT_COMPAT,'UTF-8'));

However, if the input contains &#60; and &#62;, which are encoded so that they display as carets in content (just as you would display an ampersand directly to a client), they are decoded too, and this has led to malformed HTML. So I need to make some sort of exception, though I'm not sure how to do this. String replacement with a temporary placeholder? I'm sure there is a better way.

  • What is the purpose of 'decoding' the posted value? It seems like something is wrong with doing so. Normally an HTML input field will not 'encode' any values. Commented Sep 7, 2015 at 1:05
  • I support many different non-Latin languages and client browsers. PHP and everything else in the mix jump at every opportunity to destroy HTML entities, so when pages are edited I convert all characters over 127 into numeric HTML entities, which keeps them safe. When putting them into the database the length becomes an issue; however, SQL properly supports Unicode/UTF-8, so this is the last step to ensure the client sees what the client needs to. :-) Commented Sep 7, 2015 at 1:13
  • I don't see how html_entity_decode is designed to handle such (or can handle it correctly). Commented Sep 7, 2015 at 1:15
  • @user2864740 When you have entities like &#60; and &#62; (the < and > caret characters) and convert them to regular characters, the system has no way to tell which ones to convert back, so they must remain encoded when going into the database. Never allow stored code to depend on human interpretation, because websites aren't manually served by humans; they're automatically served by servers and software. Commented Sep 7, 2015 at 1:18
  • Also, < and > are better called angle brackets. ^ is a caret (and is unaffected by HTML encoding or decoding). Commented Sep 7, 2015 at 1:28

1 Answer

Tentative answer, since this might be an XY-problem:
After resolving the HTML entities, you can "re-encode" those characters that could hurt your HTML structure via htmlspecialchars().

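// htmlspecialchars() takes the flags as its second argument and the
// encoding as its third.
$t = mysqli_real_escape_string(
    $db,
    htmlspecialchars(
        html_entity_decode($_POST['title'], ENT_COMPAT, 'UTF-8'),
        ENT_COMPAT,
        'UTF-8'
    )
);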
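
To make the effect of this decode-then-re-encode chain easier to see, here is a minimal sketch that leaves the database call out. The helper name decode_then_reencode() is hypothetical and not part of the answer; the flags are passed explicitly so that the encoding lands in the third argument of htmlspecialchars().

```php
<?php
// Hypothetical helper name, used only for illustration.
function decode_then_reencode(string $input): string
{
    // Step 1: turn every entity (numeric or named) into its UTF-8 character.
    $decoded = html_entity_decode($input, ENT_COMPAT, 'UTF-8');

    // Step 2: re-encode only the characters that are special in HTML
    // (&, <, > and the double quote with ENT_COMPAT).
    return htmlspecialchars($decoded, ENT_COMPAT, 'UTF-8');
}

echo decode_then_reencode('&#233;t&#233; &#60;b&#62;'), "\n";
// prints: été &lt;b&gt;
```

Note that the angle brackets come back as the named entities &lt; and &gt; rather than &#60; and &#62;, and that any literal < or > in the input would be encoded as well, so this only fits if no real markup is expected in the value.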

6 Comments

I think the problem there is that PHP has no way to know which carets should stay literal and which should be entities. So unless there is a built-in exception or a different function I can use, I should probably create my own function that does temporary string replacement before and after calling html_entity_decode().
Or you apply the encoding not when storing the values but on output. If that has a negative effect on the performance -> caching ;-)
Unfortunately that would be a double negative for me. The SQL fields, such as those for meta descriptions, have very intentional length limits that don't account for numeric HTML entities, and I really can't afford the slightest shred of time to adjust that; plus it makes little sense. Additionally, every time the editor switches to HTML mode (from visual), JavaScript converts all the characters over code 127 to entities anyway. I'm not seeing the alternative to the double string replacement that I was hoping for; I just want to do what my clients want though. :-)
No, I only meant the encoding, not the decoding: call html_entity_decode() without caring about < and/or >, i.e. store < and > as-is, and apply htmlspecialchars() on output.
I tried that and it deletes the string content from the variable. :-\
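
The temporary string replacement mentioned in the comments could be sketched roughly as follows. This is only an illustration under assumptions: the function name decode_except_60_62() and the placeholder strings are hypothetical, and the placeholders are assumed never to occur in real input.

```php
<?php
// Hypothetical helper; the placeholder strings are an arbitrary choice and
// are assumed never to appear in real input.
function decode_except_60_62(string $input): string
{
    $placeholders = [
        '&#60;' => "\x00LT\x00",
        '&#62;' => "\x00GT\x00",
    ];

    // 1. Hide the two protected entities behind placeholders.
    $masked = strtr($input, $placeholders);

    // 2. Decode every remaining entity to its UTF-8 character.
    $decoded = html_entity_decode($masked, ENT_COMPAT, 'UTF-8');

    // 3. Put the protected entities back.
    return strtr($decoded, array_flip($placeholders));
}

echo decode_except_60_62('&#233; &#60;b&#62; &amp; &lt;'), "\n";
// prints: é &#60;b&#62; & <
```

Only the exact spellings &#60; and &#62; are protected here. Other forms of the same characters, such as &lt;, &#x3C; or &#060;, are still decoded and would need their own entries in the map if they can appear in the input.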