I have user input and use htmlentities() to convert all entities. However, there seems to be some bug. When I type in

ääää öööö üüüü ääää

I get

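&Atilde;&curren;&Atilde;&curren;&Atilde;&curren;&Atilde;&curren; &Atilde;&para;&Atilde;&para;&Atilde;&para;&Atilde;&para; &Atilde;&frac14;&Atilde;&frac14;&Atilde;&frac14;&Atilde;&frac14; &Atilde;&curren;&Atilde;&curren;&Atilde;&curren;&Atilde;&curren;

Which looks like this

Ã¤Ã¤Ã¤Ã¤ Ã¶Ã¶Ã¶Ã¶ Ã¼Ã¼Ã¼Ã¼ Ã¤Ã¤Ã¤Ã¤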

What am I doing wrong? The code is really only this:

$post=htmlentities($post);

EDIT 1

Here is some more code that I use for formatting purposes (there are some helpful functions in it):

    //Secure with htmlentities (mysql_real_escape_string() comes later)
    $post=htmlentities($post);

    //Strip obsolete white spaces
    $post = preg_replace("/ +/", " ", $post);

    //Detect links
    $pattern_url='~(?>[a-z+]{2,}://|www\.)(?:[a-z0-9]+(?:\.[a-z0-9]+)?@)?(?:(?:[a-z](?:[a-z0-9]|(?<!-)-)*[a-z0-9])(?:\.[a-z](?:[a-z0-9]|(?<!-)-)*[a-z0-9])+|(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))(?:/[^\\/:?*"<>|\n]*[a-z0-9])*/?(?:\?[a-z0-9_.%]+(?:=[a-z0-9_.%:/+-]*)?(?:&[a-z0-9_.%]+(?:=[a-z0-9_.%:/+-]*)?)*)?(?:#[a-z0-9_%.]+)?~i';
    preg_match_all($pattern_url, $post, $matches); 
    for ($i=0; $i < count($matches[0]); $i++)
    {
        if(substr($matches[0][$i],0,4)=='www.')
        $post = str_replace($matches[0][$i],'http://'.$matches[0][$i],$post);
    }
    $post = preg_replace($pattern_url,'<a target="_blank" href="\\0">\\0</a>',$post);

    //Keep line breaks (more than one will be stripped above)
    $post=nl2br($post);

    //Remove more than one linebreak
    $post=preg_replace("/(<br\s*\/?>\s*)+/", "<br/>", $post);

    //Secure with mysql_real_escape_string()
    $post=mysql_real_escape_string($post);
  • When you say "really only this", can you share the rest of it? I don't see anything wrong with your PHP, so the problem may lie somewhere else. Commented May 10, 2012 at 17:11
  • @stevether Please see the question's edit. Commented May 10, 2012 at 17:18

2 Answers

You must manually specify the encoding (UTF-8) for htmlentities():

echo htmlentities("ääää öööö üüüü ääää", null, "UTF-8");

Output:

ääää öööö üüüü ääää
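
To make the effect of the third argument visible, here is a minimal sketch that encodes the same UTF-8 bytes twice, once declared as ISO-8859-1 and once as UTF-8. The variable names are only illustrative, and the flags are spelled out rather than passed as null (passing null to an int parameter is deprecated as of PHP 8.1).

```php
<?php
// "\xC3\xA4" is the UTF-8 byte sequence for "ä"; writing the bytes
// explicitly keeps the example independent of this file's own encoding.
$input = "\xC3\xA4";

// Declared as ISO-8859-1, each of the two bytes is treated as its own
// character (0xC3 and 0xA4) and gets its own entity.
$wrong = htmlentities($input, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1');

// Declared as UTF-8, the two bytes are read as one character.
$right = htmlentities($input, ENT_COMPAT | ENT_HTML401, 'UTF-8');

echo $wrong, "\n"; // &Atilde;&curren;
echo $right, "\n"; // &auml;
```

The first result is the kind of output described in the question, which suggests the input was UTF-8 while htmlentities() was assuming a single-byte charset.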

2 Comments

Thank you! That is what I needed. What does the parameter null do? And do you maybe know why I needed this? I usually just use htmlentities('string') without any additional parameters and it normally works fine.
It just fills in the second argument so that the third can be given. Arguments 2 and 3 are optional, but if you want to specify the 3rd argument, you have to specify the 2nd. To spell out the documented default explicitly, write htmlentities("string", ENT_COMPAT | ENT_HTML401, "UTF-8").

It is important that the third parameter of htmlentities() matches the character set in which the post is submitted. I suppose you are NOT submitting UTF-8, as that is the default in htmlentities() (since PHP 5.4).

In PHP:

 $post = htmlentities($post, ENT_COMPAT, 'ISO-8859-1');  // or whatever charset the form submits

In the form:

 <form action="your.php" accept-charset="ISO-8859-1">

Anyway, I actually recommend that you use UTF-8.
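
One possible way to follow the UTF-8 recommendation on the PHP side is sketched below: check whether the submitted string is valid UTF-8, convert it if it is not, and only then call htmlentities(). The helper name encode_post and the ISO-8859-1 fallback are assumptions made for illustration, not part of the answer, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper: normalise a submitted string to UTF-8 and then
// encode it for HTML output.
function encode_post($post, $fallbackCharset = 'ISO-8859-1')
{
    // Valid UTF-8 is left untouched; anything else is assumed to be in
    // $fallbackCharset and is converted to UTF-8.
    if (!mb_check_encoding($post, 'UTF-8')) {
        $post = mb_convert_encoding($post, 'UTF-8', $fallbackCharset);
    }

    // The charset passed here now always matches the data.
    return htmlentities($post, ENT_COMPAT | ENT_HTML401, 'UTF-8');
}

echo encode_post("\xC3\xA4"), "\n"; // "ä" submitted as UTF-8
echo encode_post("\xE4"), "\n";     // "ä" submitted as ISO-8859-1
```

Both calls should print the same entity. The validity check is a heuristic: some non-UTF-8 byte sequences happen to be valid UTF-8, so declaring the charset in the form and the page remains the more reliable fix.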
