PHP regex pattern not working

Question

I'm facing a very strange error with a regex in PHP. My pattern is /\[B\]\[SIZE=3\](Trama|Recensione:|Curiosità|Trama:)\[\/SIZE\]\[\/B\](.*?)\[B\]\[SIZE=3\]/is

In my script it works with "Trama", "Recensione:", and "Trama:", but not with "Curiosità". The strange thing is that if I type this pattern here, it matches everything correctly. What am I doing wrong?

My script:

$query = $db->query("SELECT `t`.`threadid`, `t`.`title`, `t`.`firstpostid`, `t`.`dateline`, `f`.`parentid` FROM {$db->tabelle['topic']} AS t, {$db->tabelle['forum']} AS f WHERE `f`.`forumid` = `t`.`forumid` AND `f`.`parentid` = ". (SEZIONE_RECENSIONI) ." AND `visible` = 1 ORDER BY `dateline` DESC LIMIT 10");
        while($thread = $db->fetch_array($query))
        {
            $post = $db->fetch_array($db->query("SELECT `pagetext`, `userid` FROM {$db->tabelle['post']} WHERE `postid` = {$thread['firstpostid']}"));

            $pattern = "/\[cover\](.*?)\[\/cover\]/is";
            preg_match($pattern, $post['pagetext'], $cover);

            $pattern = '/\[B\]\[SIZE=3\](Trama|Recensione:|Curiosità|Trama:)\[\/SIZE\]\[\/B\](.*?)\[B\]\[SIZE=3\]/isU';
            preg_match($pattern, $post['pagetext'], $trama);
            $content = remove_bbcode($parser->parse(truncate(utf8_encode($trama[2]), 350, '...', false, true)));
            $page .= "<li>
            <div class=\"recensione\" style=\"background: url(".$cover[1].") no-repeat; background-size: cover; background-position: 20% center; \">
                <p class=\"recensione_titolo\"><a href=\"?rec={$thread['threadid']}\">{$thread['title']}</a></p>
                <p class=\"recensione_content\">{$content} <a href=\"?rec={$thread['threadid']}\"><em>Continua a leggere</em></a></p>
            </div>
        </li>";
        }

Try adding /U flag to make it /isU

– anubhava, Commented Sep 6, 2014 at 11:43

Casimir et Hippolyte · Accepted Answer · 2014-09-06 12:21:35Z


It may be a UTF-8 problem. You can tell the regex engine that the target string must be read as a UTF-8 string. To do that, add (*UTF8) at the beginning of the pattern or use the u modifier:

$pattern = '~(*UTF8)\[B]\[SIZE=3](Trama:?|Recensione:|Curiosità)\[/SIZE]\[/B](.*?)\[B]\[SIZE=3]~s';

or

$pattern = '~\[B]\[SIZE=3](Trama:?|Recensione:|Curiosità)\[/SIZE]\[/B](.*?)\[B]\[SIZE=3]~su';
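As a rough sketch of what the u modifier does here, assuming the non-ASCII characters are written as explicit byte escapes so the result does not depend on how this file is saved: with a valid UTF-8 subject the pattern matches, while a subject in a single-byte encoding (ISO-8859-1 is only an assumed example) makes preg_match() return false, which preg_last_error() reports as PREG_BAD_UTF8_ERROR. The sample strings are made up for illustration.

```php
<?php
// "\xC3\xA0" is "à" in UTF-8; "\xE0" is "à" in ISO-8859-1 (assumed example encoding).
// Same pattern as above, with "à" spelled as UTF-8 bytes.
$pattern = "~\\[B]\\[SIZE=3](Trama:?|Recensione:|Curiosit\xC3\xA0)\\[/SIZE]\\[/B](.*?)\\[B]\\[SIZE=3]~su";

// Made-up sample text, valid UTF-8.
$utf8Subject = "[B][SIZE=3]Curiosit\xC3\xA0[/SIZE][/B]Some text[B][SIZE=3]";
$result = preg_match($pattern, $utf8Subject, $m);
// $result is 1, $m[1] holds the heading, $m[2] holds "Some text".

// Same text in ISO-8859-1: not valid UTF-8, so the u modifier rejects it.
$latin1Subject = "[B][SIZE=3]Curiosit\xE0[/SIZE][/B]Some text[B][SIZE=3]";
$bad = preg_match($pattern, $latin1Subject, $ignored);
$error = preg_last_error(); // PREG_BAD_UTF8_ERROR when the subject is not valid UTF-8

if ($bad === false && $error === PREG_BAD_UTF8_ERROR) {
    echo "Subject is not valid UTF-8\n";
}
```

Checking for a false return value (as opposed to 0) is a cheap way to tell "no match" apart from "the engine refused the input".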

Note: to avoid a lot of backslashes in your expression and make it more readable:

you can change the pattern delimiter (no need to escape slashes)
the literal closing bracket doesn't need to be escaped
you can use \Q and \E to quote a literal substring
you can use the free-spacing mode x

example:

$pattern = '~
    \Q[B][SIZE=3]\E
    (Trama:?|Recensione:|Curiosità)
    \Q[/SIZE][/B]\E   (.*?)  \Q[B][SIZE=3]\E ~xus';

edited Sep 6, 2014 at 12:21

answered Sep 6, 2014 at 12:12


3 Comments

DavideR Over a year ago

Ok, it's a UTF8 problem, but if I use the u modifier it completely stops working.

Casimir et Hippolyte Over a year ago

@DavideR: try to determine the encoding of the original text and convert it to UTF-8 (in particular, take a look at the default encoding in your code editor).
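A minimal sketch of that suggestion, assuming the mbstring extension is available and that the stored text is ISO-8859-1 (a guess; the thread never confirms the real encoding). The helper to_utf8() and the sample text are hypothetical, not part of the original script.

```php
<?php
// Hypothetical helper (not from the thread): return $text as UTF-8.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1; adjust
// $assumedSource to the real encoding of your data.
function to_utf8(string $text, string $assumedSource = 'ISO-8859-1'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($text, 'UTF-8', $assumedSource);
}

// Made-up stand-in for $post['pagetext']; "\xE0" is "à" in ISO-8859-1.
$pagetext = "[B][SIZE=3]Curiosit\xE0[/SIZE][/B]Some text[B][SIZE=3]";

// Pattern from the answer, with "à" written as UTF-8 bytes ("\xC3\xA0").
$pattern = "~\\[B]\\[SIZE=3](Trama:?|Recensione:|Curiosit\xC3\xA0)\\[/SIZE]\\[/B](.*?)\\[B]\\[SIZE=3]~su";

// Convert first, then match with the u modifier.
$matched = preg_match($pattern, to_utf8($pagetext), $trama);

if ($matched === 1) {
    echo $trama[2], "\n"; // text between the heading and the next heading
}
```

Once the text has been converted this way, the captured group is already UTF-8, so it should not be passed through utf8_encode() again.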

DavideR Over a year ago

Ok, I think I'm going to replace à, è, ì and any other special characters with a, e, i and so on. Thank you for your suggestions.
