
I wrote the simple script below to reproduce my problem. Both the string and the pattern contain Unicode characters.

Basically, if I run it from the command line (php -f test.php), it prints "match" as expected. But if I run it through the web server (Apache, http://localhost/test.php), it prints "no match". I am using PHP 5.3.

Any idea why it behaves differently? How do I make it work through the web server?

Thanks.

<?php
function myCallback($matches) {
    return $matches[0];
}

$value = 'aaa äää';
$pattern = '/(\bäää)/u';

$value = preg_replace_callback($pattern, 'myCallback', $value, -1, $count);
if ($count > 0) {
    echo "match";
} else {
    echo 'no match';
}
?>
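
A sketch for narrowing this down (not part of the original script): run the same subject against the pattern without \b, with \b, and with a lookbehind that avoids word-character rules, once from the CLI and once through Apache. probe() is a hypothetical helper, and (?<!\S) is a narrower stand-in for \b that only treats whitespace or the start of the string as a boundary. If only the \b line differs between the two environments, that suggests the difference lies in whether \b counts ä as a word character in each environment, which depends on the PHP and PCRE versions involved, so the printed PCRE_VERSION values are worth comparing.

```php
<?php
// Hypothetical helper: returns 'match' or 'no match' for one pattern,
// so the same subject can be checked against several variants.
function probe($pattern, $subject) {
    return preg_match($pattern, $subject) === 1 ? 'match' : 'no match';
}

// Plain-text output so the lines stay readable in the browser too.
if (php_sapi_name() !== 'cli') {
    header('Content-Type: text/plain; charset=utf-8');
}

$subject = 'aaa äää';
$patterns = array(
    'no \b'      => '/äää/u',        // no word-boundary logic at all
    'with \b'    => '/\bäää/u',      // depends on whether ä counts as \w
    'lookbehind' => '/(?<!\S)äää/u', // "not preceded by non-whitespace"
);

echo 'PCRE_VERSION: ', PCRE_VERSION, "\n";
foreach ($patterns as $label => $pattern) {
    echo $label, ': ', probe($pattern, $subject), "\n";
}
```

The array() syntax is used on purpose, since the short [] array syntax only arrived in PHP 5.4.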
  • Do you send headers to the server? Commented Mar 29, 2012 at 21:21
  • No headers. I typed the URL into the browser's address bar manually. The string is not user input; it's hardcoded in the PHP script. Commented Mar 29, 2012 at 21:42

2 Answers


Try changing default_charset using ini_set('default_charset', 'utf-8').
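
Applied to the script from the question, the change would go at the very top, before any output is sent (a sketch; UTF-8 is assumed as the target charset and the rest of the script is unchanged):

```php
<?php
// Change the charset before anything is output. ini_set() returns the old
// value on success and false on failure, so the result is worth checking.
$previous = ini_set('default_charset', 'UTF-8');
if ($previous === false) {
    echo "could not change default_charset\n";
}

function myCallback($matches) {
    return $matches[0];
}

$value = 'aaa äää';
$pattern = '/(\bäää)/u';

$value = preg_replace_callback($pattern, 'myCallback', $value, -1, $count);
echo $count > 0 ? 'match' : 'no match';
```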

If it works, it means that the CLI and Apache versions of PHP use separate php.ini files, and this setting is probably configured differently in each (or set from the environment).
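
To check whether the CLI and Apache really load different php.ini files, a small diagnostic can be run both ways (php -f diag.php and http://localhost/diag.php, where diag.php is a hypothetical file name) and the two outputs compared. collectEnvironment() is a hypothetical helper; the functions and constants it calls are standard PHP and should be available in 5.3.

```php
<?php
// Hypothetical diagnostic: collects settings that may differ between the
// CLI and the Apache module. Run it both ways and compare the output.
function collectEnvironment() {
    return array(
        'sapi'            => php_sapi_name(),
        'php_version'     => PHP_VERSION,
        'loaded_php_ini'  => php_ini_loaded_file(), // false if none loaded
        'default_charset' => ini_get('default_charset'),
        'pcre_version'    => PCRE_VERSION,
    );
}

if (php_sapi_name() !== 'cli') {
    header('Content-Type: text/plain; charset=utf-8');
}

foreach (collectEnvironment() as $key => $val) {
    // var_export() prints false and '' distinctly, unlike a plain echo.
    echo $key, ': ', var_export($val, true), "\n";
}
```

If loaded_php_ini shows the same path in both runs, the configuration files are shared and the difference is more likely in the PHP or PCRE builds themselves.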

You can leave that in as a solution or find an alternative.

Cheers,

Dan


1 Comment

  • Adding ini_set does not help. I am pretty sure both use the same php.ini.

Check that your test.php sends the correct headers. In PHP you would write:

header('Content-Type: text/html; charset=utf-8'); 

And in your HTML head:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/> 

By default it is set to ISO-8859-1, and maybe that is causing the problem. Here you can find more information about multiple encodings (if UTF-8 is not acceptable) and about UTF-8 itself: http://devlog.info/2008/08/24/php-and-unicode-utf-8/

3 Comments

  • The input string does not come from the user; it is hardcoded in the PHP script. And I am not displaying any Unicode in the browser, so I am not sure why headers would matter. But yes, my response does have the content-type charset set to 'utf-8'.
  • I didn't know that. Maybe try mb_internal_encoding then.
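
A sketch of that suggestion, guarded with function_exists() because mb_internal_encoding() is only present when the mbstring extension is loaded. As far as I know, the preg_* functions do not read this setting (the /u modifier alone puts PCRE into UTF-8 mode), so this mainly helps rule mbstring out.

```php
<?php
// mb_internal_encoding() exists only when the mbstring extension is loaded.
if (function_exists('mb_internal_encoding')) {
    mb_internal_encoding('UTF-8');  // set the encoding mbstring functions use
    echo 'mbstring internal encoding: ', mb_internal_encoding(), "\n";
} else {
    echo "mbstring is not available\n";
}
```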
