
I have the following regular expressions:

```php
$patterns = array
(
    '/\b(gubalowka hegy)\b/i',
    '/\b(krakkó|wawel|wawelban|auschwitz|auschwitzba|auschwitz-birkenua)\b/i',
    '/\b(királyi|város|fogaskerekű|séta)\b/i',
);

$replaces = array
(
    '<strong>$1</strong>',
    '<u><em>$1</em></u>',
    '<strong>$1</strong>',
);

$text = preg_replace($patterns, $replaces, $text);
```
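For comparison, here is a minimal sketch of the same array-based approach. It assumes that both the PHP source file and `$text` are UTF-8. It adds the `u` modifier to every pattern and, as a swapped-in technique rather than the original one, replaces `\b` with the lookarounds `(?<![\p{L}\p{N}_])` and `(?![\p{L}\p{N}_])`, so accented letters are explicitly treated as word characters. Longer alternatives are listed first, so `auschwitz-birkenua` is tried before `auschwitz`. The sample sentence is hypothetical.

```php
<?php
// Assumption: this file and $text are UTF-8 encoded.
// The lookarounds act as whole-word edges: no letter, digit or underscore
// may directly precede or follow the match. \p{L} covers accented letters.
$patterns = array(
    '/(?<![\p{L}\p{N}_])(gubalowka hegy)(?![\p{L}\p{N}_])/iu',
    '/(?<![\p{L}\p{N}_])(krakkó|wawelban|wawel|auschwitz-birkenua|auschwitzba|auschwitz)(?![\p{L}\p{N}_])/iu',
    '/(?<![\p{L}\p{N}_])(királyi|város|fogaskerekű|séta)(?![\p{L}\p{N}_])/iu',
);

$replaces = array(
    '<strong>$1</strong>',
    '<u><em>$1</em></u>',
    '<strong>$1</strong>',
);

// Hypothetical sample text; FOGASKEREKŰ checks caseless matching of ű/Ű.
$text = 'Séta, Krakkó, Wawel, Gubalowka hegy, Város, FOGASKEREKŰ, Auschwitz-Birkenua.';

// preg_replace applies each pattern in turn to the result of the previous one.
$text = preg_replace($patterns, $replaces, $text);
echo $text, PHP_EOL;
```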

The problem is that only some of the words get replaced.

In this example, only these words are replaced:

```
Séta               => <strong>Séta</strong>
Krakkó             => <u><em>Krakkó</em></u>
királyi            => <strong>királyi</strong>
Auschwitz-Birkenua => <u><em>Auschwitz-Birkenua</em></u>
```

The other words stay untouched.

I tried to get it working in several ways (replacing every word separately, replacing groups of words without arrays), but none of them worked.

You can check it here: http://adriaholiday.dev.webndev.hu/ajanlatok/lengyelorszagi-hetvege.html

The regular expressions are logged in the Chrome dev console.

Could somebody help? Thank you.

Edit:

If I write the regex by hand, it works:

```php
$pattern = '/\b(krakkó|wawel|wawelban|auschwitz|auschwitzba)\b/iu';
$replace = '<strong><u>$1</u></strong>';
$text    = preg_replace($pattern, $replace, $text);
```

The issue appears only when the regex is generated:

```php
$replace = '<strong>$1</strong>';

foreach (...)
{
    $words .= "|{$word}"; // first vertical bar removed ...
}

// encoding UTF8
// pattern: /\b(krakkó|wawel|wawelban|auschwitz|auschwitzba)\b/iu
$pattern = '/\b(' . $words . ')\b/iu';

$text = preg_replace($pattern, $replace, $text);
```
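A sketch of a more defensive way to generate the pattern, under the assumption that the words are UTF-8 strings. `buildWordPattern` is a hypothetical helper, not part of PHP. `implode('|', ...)` only puts the pipe between items, so there is no leftover leading `|`. `preg_quote()` escapes characters such as `-` or `.` that may appear in a word, and the words are sorted longest first because PCRE takes the first alternative that lets the whole match succeed. With `\b` after the group, `auschwitz` listed before `auschwitz-birkenua` would stop at the hyphen.

```php
<?php
// Hypothetical helper: builds one whole-word, case-insensitive UTF-8 pattern.
function buildWordPattern(array $words): string
{
    // Longest first (by byte length). If one word is a prefix of another,
    // the longer one always has more bytes, so it is tried first.
    usort($words, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    // Escape regex metacharacters and the '/' delimiter in each word.
    $quoted = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words);

    // implode() adds '|' only between items: no empty leading alternative.
    return '/\b(' . implode('|', $quoted) . ')\b/iu';
}

$pattern = buildWordPattern(array(
    'krakkó', 'wawel', 'wawelban', 'auschwitz', 'auschwitzba', 'auschwitz-birkenua',
));
echo $pattern, PHP_EOL;

// Hypothetical sample text.
$text = preg_replace($pattern, '<strong>$1</strong>', 'Krakkó, Wawelban, Auschwitz-Birkenua.');
echo $text, PHP_EOL;
```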
  • This could be an ASCII vs. Unicode issue. I haven't done much with PHP regex in a long time, but that might be an area to look at. How well does PHP handle case-insensitive Unicode strings? Commented Feb 29, 2012 at 9:54
  • Add u to your modifiers (/foo/iu) to tell PCRE to treat the pattern as UTF-8. See php.net/manual/en/reference.pcre.pattern.modifiers.php Commented Feb 29, 2012 at 10:03
  • I tried "u" already; it doesn't help. Commented Feb 29, 2012 at 10:08
  • After concatenation, $word has a wrong leftover pipe! Also, as others mentioned, the "u" modifier is needed. Commented Feb 29, 2012 at 11:22
  • The $pattern variable contains: /\b(krakkó|wawel|wawelban|auschwitz|auschwitzba)\b/iu Commented Feb 29, 2012 at 11:49
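A minimal sketch of what the `u` modifier changes, assuming the PHP file is saved as UTF-8 (so `ó` and `Ó` are two bytes each). With `u`, PCRE matches characters rather than bytes, so caseless matching can fold non-ASCII letters such as `Ó` to `ó`, and `\b` next to an accented letter behaves as expected in current PHP versions. Without `u`, the pattern is compared byte by byte; on a default setup the last call below is expected to fail, but that depends on the character tables PCRE uses, so treat it as illustrative.

```php
<?php
// Assumption: this file is UTF-8 encoded.

// Caseless match of a non-ASCII letter (Ó vs ó) in UTF-8 mode.
$caselessUtf8 = preg_match('/^krakkó$/iu', 'KRAKKÓ');

// Word boundary right after an accented letter, in UTF-8 mode.
$boundaryUtf8 = preg_match('/\bkrakkó\b/iu', 'Krakkó.');

// Same caseless match without /u: the pattern is compared byte by byte.
$caselessBytes = preg_match('/^krakkó$/i', 'KRAKKÓ');

var_dump($caselessUtf8, $boundaryUtf8, $caselessBytes);
```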

1 Answer


Check whether mbstring and mbregex are available. PHP's default character set, ISO-8859-1, does not include ő, ű, Ő and Ű or some other special characters (but I'm assuming you only need these). UTF-8 does, but you'll have to use the multibyte functions with it.

To read more about mbstring, see the PHP documentation. It includes mb_ereg_replace as well.
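A small sketch of the availability check described above. It only uses `extension_loaded()` and `function_exists()`, and it assumes that `mb_ereg_replace()` is defined only when mbstring was built with multibyte regex (mbregex) support.

```php
<?php
// Is the mbstring extension loaded at all?
$hasMbstring = extension_loaded('mbstring');

// The mb_ereg_* functions come from mbregex, an optional part of mbstring.
$hasMbregex = function_exists('mb_ereg_replace');

echo 'mbstring: ', $hasMbstring ? 'yes' : 'no', PHP_EOL;
echo 'mbregex:  ', $hasMbregex ? 'yes' : 'no', PHP_EOL;

if ($hasMbregex) {
    // Called without an argument, mb_regex_encoding() returns the current encoding.
    echo 'mb regex encoding: ', mb_regex_encoding(), PHP_EOL;
}
```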

EDIT: I found out that with the u flag, preg_replace can handle UTF-8 as well. Take a look at this question.

1 Comment

I tried it with the u flag, but it doesn't change anything. I'll try mb_ereg_replace now.
