PHP html_entity_decode() exception for numeric HTML entities 60 and 62

Question

How do I use PHP's html_entity_decode() with an exception for numeric HTML entities 60 and 62?

Currently my code looks something like the following:

$t = mysqli_real_escape_string($db,html_entity_decode($_POST['title'],ENT_COMPAT,'UTF-8'));

However, if I have &#60; and &#62; entities that are encoded to display as carets in content (just as you would display an ampersand directly to a client), they too become decoded, and this has led to malformed HTML. So I need to make some sort of exception, though I'm not sure how to do this: string replacement with a temporary placeholder? I'm sure there is a better way.

What is the purpose of 'decoding' the posted value? It seems like .. something is wrong with doing such. Normally an HTML input field will not 'encode' any values. — user2864740
– user2864740, Commented Sep 7, 2015 at 1:05
I support many different non-Latin languages and client browsers. PHP and everything else in the mix jump at every opportunity to destroy HTML entities, so when pages are edited I convert all characters over 127 into numeric HTML entities, which keeps them safe. When putting them into the database the length becomes an issue; however, SQL properly supports Unicode/UTF-8, so this is the last step to ensure the client sees what the client needs to. :-) — John
– John, Commented Sep 7, 2015 at 1:13
I don't see how html_entity_decode is designed to handle such (or can handle it correctly). — user2864740
– user2864740, Commented Sep 7, 2015 at 1:15
@user2864740 When you have an entity like &#60; and &#62; (the < and > caret characters), if you convert them to regular characters then there is zero distinction in the system how to convert them back, so they must remain encoded when going into the database; never allow code to be stored subjective to requiring human interpretation because websites aren't manually served by humans, they're automatically served by servers and software. — John
– John, Commented Sep 7, 2015 at 1:18
Also, < and > are better called angle brackets. ^ is a caret (and is unaffected by HTML encoding or decoding). — user2864740
– user2864740, Commented Sep 7, 2015 at 1:28

Community · Accepted Answer · 2017-03-20 10:29:30Z

1

Tentative answer, since this might be an XY-problem:
After resolving the HTML entities you can "re-encode" the characters that could break your HTML structure with htmlspecialchars().

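$t = mysqli_real_escape_string(
    $db,
    htmlspecialchars(
        // Decode all entities first, then re-encode only &, <, > and ".
        html_entity_decode($_POST['title'], ENT_COMPAT, 'UTF-8'),
        ENT_COMPAT, // the flags argument must come before the encoding
        'UTF-8'
    )
);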

answered Sep 7, 2015 at 1:04 by VolkerK
edited Mar 20, 2017 at 10:29 by Community Bot


6 Comments

John Over a year ago

I think the problem there is that PHP will have zero clue or possibility to understand which carets are to be encoded and which should be entities. So unless there is a built in exception or different function I can use I should probably create my own function that temporarily does string replacement before and after using html_entity_decode().
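One way to get the exception described in this comment without temporary placeholders is to decode entities one at a time and skip the ones that stand for < and >. The following is only a sketch, assuming PHP 7 or later; the function names decode_except_angle_brackets() and is_angle_bracket_entity() are made up for illustration and are not part of PHP or of this thread.

```php
<?php
// Hypothetical helper: true if $entity (e.g. "&#60;", "&#x3E;", "&lt;")
// represents the character < or >.
function is_angle_bracket_entity(string $entity): bool
{
    $name = strtolower(substr($entity, 1, -1)); // strip leading & and trailing ;
    if ($name === 'lt' || $name === 'gt') {
        return true;
    }
    if (strpos($name, '#x') === 0) {
        $code = hexdec(substr($name, 2));   // hexadecimal form, e.g. #x3c
    } elseif ($name[0] === '#') {
        $code = (int) substr($name, 1);     // decimal form, e.g. #60 or #060
    } else {
        return false;                       // some other named entity
    }
    return $code === 60 || $code === 62;
}

// Hypothetical helper: decode every entity except those for < and >.
function decode_except_angle_brackets(string $html): string
{
    return preg_replace_callback(
        '/&(?:#\d+|#x[0-9a-f]+|[a-z][a-z0-9]*);/i',
        function (array $m): string {
            // Leave protected entities untouched; decode everything else.
            return is_angle_bracket_entity($m[0])
                ? $m[0]
                : html_entity_decode($m[0], ENT_COMPAT, 'UTF-8');
        },
        $html
    );
}

echo decode_except_angle_brackets('&#60;b&#62; &#228; &amp; &#8364;');
// prints: &#60;b&#62; ä & €
```

Because each entity is handled in a single pass, nothing is decoded twice, and the protected entities reach the database exactly as they were posted.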

VolkerK Over a year ago

Or you apply the encoding not when storing the values but on output. If that has a negative effect on the performance -> caching ;-)

John Over a year ago

Unfortunately that would be a double-negative for me. While encoded as numeric entities, the very intentional limitations of the SQL fields, such as for meta descriptions, don't compensate for numeric HTML entities, and I really can't afford the slightest shred of time to adjust that, plus it makes little sense. Additionally, every time the editor switches to HTML mode (from visual), JavaScript converts all the characters over code 127 to entities anyway. I'm not seeing an alternative to the double string replacement that I was hoping for; I just want to do what my clients want though. :-)

VolkerK Over a year ago

No, no, I only meant the encoding, not the decoding: html_entity_decode() without caring about < and/or >, i.e. storing < and > as-is, but applying htmlspecialchars() on output.
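The split suggested in this comment (decode when saving, escape when printing) might look roughly like the sketch below. The helper names title_for_storage() and title_for_output() are hypothetical, the sample input is invented, and the database write itself is left out.

```php
<?php
// Hypothetical helpers, named here for illustration only.

// On save: decode every entity so the database holds plain UTF-8 text.
function title_for_storage(string $posted): string
{
    return html_entity_decode($posted, ENT_QUOTES, 'UTF-8');
}

// On output: escape &, <, > and both quote styles for the HTML context.
function title_for_output(string $stored): string
{
    return htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
}

$stored = title_for_storage('Caf&#233; &#60;b&#62;'); // "Café <b>" goes into the database
echo title_for_output($stored);
// prints: Café &lt;b&gt;
```

Note that htmlspecialchars() takes the flags as its second argument and the encoding as its third; passing 'UTF-8' in the flags position makes the call fail, which may be what produced the empty result reported in the next comment.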

John Over a year ago

I tried that and it deletes the string content from the variable. :-\
