PHP preg_replace issue while replacing several words

Question

I have the following regular expressions:

$patterns = array
(
    '/\b(gubalowka hegy)\b/i',
    '/\b(krakkó|wawel|wawelban|auschwitz|auschwitzba|auschwitz-birkenua)\b/i',
    '/\b(királyi|város|fogaskerekű|séta)\b/i',
);

$replaces = array
(
    '<strong>$1</strong>',
    '<u><em>$1</em></u>',
    '<strong>$1</strong>',
);

preg_replace($patterns, $replaces, $text);

The problem is that only some of the words get replaced.

In this example, only these words are replaced:

Séta               => <strong>Séta</strong>
Krakkó             => <u><em>Krakkó</em></u>
királyi            => <strong>királyi</strong>
Auschwitz-Birkenua => <u><em>Auschwitz-Birkenua</em></u>

The other words stay untouched.

I tried to get it working several ways (replacing every word separately, replacing groups of words without arrays), but none of them worked.

Here you can check it: http://adriaholiday.dev.webndev.hu/ajanlatok/lengyelorszagi-hetvege.html

The regular expressions are logged in the Chrome dev console.

Could somebody help? Thank you.

Edit:

If I write the regex by hand, it works:

$pattern = '/\b(krakkó|wawel|wawelban|auschwitz|auschwitzba)\b/iu';
$replace = '<strong><u>$1</u></strong>';
$text    = preg_replace($pattern, $replace, $text);

The issue appears only when the regex is generated:

$replace = '<strong>$1</strong>';

foreach (...)
{
    $words .= "|{$word}"; // first vertical bar removed ...
}

// encoding UTF8
// pattern: /\b(krakkó|wawel|wawelban|auschwitz|auschwitzba)\b/iu
$pattern = '/\b(' . $words . ')\b/iu';

$text = preg_replace($pattern, $replace, $text);
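
For reference, a minimal sketch of one way to generate such a pattern without any leftover vertical bar. The word list here is hypothetical (the real source of the words is not shown above), and the file is assumed to be saved as UTF-8. implode() only puts the separator between items, and preg_quote() escapes any regex metacharacters a word might contain:

```php
<?php
// Hypothetical word list; in the real code it comes from elsewhere.
$words = array('krakkó', 'wawel', 'wawelban', 'auschwitz', 'auschwitzba');

// Escape regex metacharacters and the "/" delimiter in each word.
$quoted = array();
foreach ($words as $word) {
    $quoted[] = preg_quote($word, '/');
}

// implode() inserts "|" only between items, so there is no
// leading or trailing bar to strip afterwards.
$pattern = '/\b(' . implode('|', $quoted) . ')\b/iu';

echo $pattern, "\n";
```

A leading or trailing bar would add an empty alternative to the group, which can match the empty string, so avoiding it by construction is safer than trimming it later.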

This could be an issue with ASCII vs. Unicode. I haven't done much with PHP regex for a long time, but that might be an area to look at. How well does PHP handle case-insensitive Unicode strings?
– David Mason, Commented Feb 29, 2012 at 9:54
Add u to your modifiers (/foo/iu) to tell PCRE to treat the pattern as UTF-8. See php.net/manual/en/reference.pcre.pattern.modifiers.php
– rodneyrehm, Commented Feb 29, 2012 at 10:03
After concatenation, $words has a stray leftover pipe! Also, as others mention, the "u" modifier is needed.
– Saic Siquot, Commented Feb 29, 2012 at 11:22
The $pattern variable contains: /\b(krakkó|wawel|wawelban|auschwitz|auschwitzba)\b/iu
– csanyigabor, Commented Feb 29, 2012 at 11:49

Community · Accepted Answer · 2017-05-23 11:48:01Z

1

Check whether mbstring and mbregex are available. PHP's default encoding, ISO-8859-1, does not include ő, ű, Ő and Ű, nor some other special characters (but I'm assuming you'll only need these). UTF-8 does, but you'll have to use the multibyte functions with it.

To read more about mbstring, look at the PHP documentation. It includes mb_ereg_replace as well.
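
A minimal sketch of what the mb_ereg_replace route might look like, assuming mbstring was built with mbregex support and that both the source file and the text are UTF-8. The sample text is made up. Note that mb_ereg_replace takes the pattern without delimiters, the options as a separate argument, and uses \1 for backreferences in the replacement:

```php
<?php
// Assumption: mbstring with mbregex support is available.
mb_regex_encoding('UTF-8');

// Made-up sample text; the real page content is not shown here.
$text = 'Fogaskerekű vasút és séta a városban.';

// No delimiters around the pattern; 'i' makes the match case-insensitive.
$result = mb_ereg_replace(
    '\b(fogaskerekű|séta)\b',
    '<strong>\1</strong>',
    $text,
    'i'
);

echo $result, "\n";
```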

EDIT: I found out that with the u flag, preg_replace can handle UTF-8 as well. Take a look at this question.
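
A minimal sketch of the u flag in use, assuming the source file and the subject string are both valid UTF-8 (the sample text is made up). With this flag, preg_replace returns null when the subject is not valid UTF-8, so checking the result can help tell an encoding problem apart from a pattern that simply did not match:

```php
<?php
// Made-up sample text; assumed to be valid UTF-8.
$text = 'Séta Krakkó városában, majd a Wawel.';

// The u modifier makes PCRE treat pattern and subject as UTF-8.
$pattern = '/\b(krakkó|wawel)\b/iu';

$result = preg_replace($pattern, '<u><em>$1</em></u>', $text);

if ($result === null) {
    // A null result signals an error, e.g. a subject that is not valid UTF-8.
    echo 'PCRE error code: ', preg_last_error(), "\n";
} else {
    echo $result, "\n";
}
```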

edited May 23, 2017 at 11:48 by Community Bot

answered Feb 29, 2012 at 9:58 by axiomer

1 Comment

csanyigabor Over a year ago

I tried it with the u flag, but it doesn't change anything. I'll try mb_ereg_replace now.
