
I'm using the following code on my WordPress install:

```php
function add_glossary_links($content) {
    global $wpdb, $wordlist;
    if ( !$wordlist && !$wordlist = get_option('wordlist') ) {
        mysql_query('SET SESSION group_concat_max_len = 100000');
        $wordlist = $wpdb->get_var('SELECT GROUP_CONCAT(DISTINCT post_title SEPARATOR "|") AS list FROM '.$wpdb->posts.' WHERE post_status="publish" AND post_type="glossary" AND post_parent>0');
        add_option('wordlist', $wordlist);
    }

    $wordlist = str_replace(array(" ", "'", "."), array("\s", "\'", "\."), $wordlist);
    echo $wordlist;

    $content = preg_replace_callback(
        '/\b('.$wordlist.')\b/i',
        create_function(
            '$matches',
            'return "<a href=\"/glossary/" . strtolower(substr($matches[0],0,1) . "/" . $matches[0]) . "/\">" . $matches[0] . "</a>";'
        ),
        $content
    );

    return preg_replace('/(<[^<]+)<a\s.*?>(.*?)<\/a>/si','$1$2', $content);
}

add_filter( 'the_content', 'add_glossary_links' );
```

The idea is that I get a list of words from my database and, wherever one of them appears in the post content, replace it with a link to the matching glossary term.

$wordlist echoes out as follows: http://pastebin.com/6XnWBJwM

The error that I'm receiving is this:

Warning: preg_replace_callback(): Unknown modifier 'c' in /my.website/wp-content/themes/mytheme/functions.php on line 384

Line 384 is the last line of this segment:

```php
$content = preg_replace_callback(
    '/\b('.$wordlist.')\b/i',
    create_function(
        '$matches',
        'return "<a href=\"/glossary/" . strtolower(substr($matches[0],0,1) . "/" . $matches[0]) . "/\">" . $matches[0] . "</a>";'
    ),
    $content
);
```

I presume there's a problem with how the regexp is formatted or how the word list is being built, but I can't for the life of me work out what it is.

Thanks in advance,

2 Answers


You are getting this error because one of the words has a / in it, which is being interpreted as the end delimiter. Anything after it is then parsed as pattern modifiers, and "c" isn't a valid modifier.
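A minimal sketch reproducing the error outside WordPress. The title "client/contractor" is a hypothetical example, not taken from the pastebin list; the point is that the first character after the unescaped / is treated as a modifier.

```php
<?php
// Hypothetical glossary titles; the first contains the "/" delimiter.
$words = ['client/contractor', 'cache'];

// Raw concatenation gives '/\b(client/contractor|cache)\b/i'. PHP ends the
// pattern at the second "/" and reads "contractor|cache)\b/i" as modifiers,
// so it stops at the first one: Unknown modifier 'c'.
$raw = '/\b(' . implode('|', $words) . ')\b/i';
$result = @preg_match($raw, 'some text');   // @ hides the warning for this demo
var_dump($result);                           // bool(false): pattern did not compile

// Escaping each word, with "/" passed as the delimiter, fixes it.
$escaped = array_map(function ($w) { return preg_quote($w, '/'); }, $words);
$safe = '/\b(' . implode('|', $escaped) . ')\b/i';
var_dump(preg_match($safe, 'Ask your client/contractor first'));  // int(1)
```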

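You should run the input through preg_quote(); however, since you are concatenating the values in your query, this won't work out of the box: preg_quote() would also escape the | separators, turning the whole alternation into one literal string.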

I suggest not using GROUP_CONCAT and instead fetching each word as its own row. Then fill an array with the words from those rows. Finally, use implode("|",array_map("preg_quote",$words,array_fill(0,count($words),"/"))) and put that in your regex.
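A sketch of this suggestion as a plain function over an array of words. The function name build_glossary_pattern and the sample titles are hypothetical. Where the array comes from is an assumption: in WordPress, $wpdb->get_col() returns a single column as an array, so the question's query without GROUP_CONCAT could supply it.

```php
<?php
// Hypothetical helper: escape each word separately, then join with "|".
// array_map() passes each word plus "/" to preg_quote(), as in the answer.
function build_glossary_pattern(array $words): string
{
    $escaped = array_map('preg_quote', $words, array_fill(0, count($words), '/'));
    return '/\b(' . implode('|', $escaped) . ')\b/i';
}

$words = ['client/contractor', 'Node.js', 'cache'];  // hypothetical titles
$pattern = build_glossary_pattern($words);
echo $pattern, "\n";   // /\b(client\/contractor|Node\.js|cache)\b/i

// Escaping the already-joined string also escapes the "|" separators,
// so the result would match only the literal text "client/contractor|Node.js|cache".
echo preg_quote(implode('|', $words), '/'), "\n";  // client\/contractor\|Node\.js\|cache
```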


5 Comments

Note that preg_quote() without the second argument won't escape /.
I'm using GROUP_CONCAT because it was suggested to me for speed purposes - the list is derived from 1,750 rows in the database. Is there any other way around it without changing the SQL?
Well, you could just use str_replace("/","\\/",$wordlist), but bear in mind that this leaves you open to injection if any of those words can be user-submitted (or if any of them contain other regex-specific characters).
Glossary entries are written by author-level users in the admin area. Could I sanitise the data before it goes into the database, then run a str_replace as you're suggesting? Is there a list of other regex characters that would need escaping? Any idea what kind of speed impact a str_replace would have on ~1700 entries?
You can certainly run preg_quote on input data before it gets saved, yes. I would suggest running a one-off script that takes all the words, runs them through preg_quote, then puts the result back in the database.
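A small sketch of the points raised in these comments, using a hypothetical word list in the question's "|"-joined format ("Node.js" is an invented title containing another metacharacter). It shows that preg_quote() leaves / alone without the delimiter argument, and that escaping only / with str_replace() lets the pattern compile while "." still matches any character.

```php
<?php
// Hypothetical word list in the question's "|"-joined format.
$wordlist = 'client/contractor|Node.js';

// 1. preg_quote() only escapes "/" when it is passed as the delimiter.
echo preg_quote('client/contractor'), "\n";        // client/contractor
echo preg_quote('client/contractor', '/'), "\n";   // client\/contractor

// 2. Escaping only "/" lets the pattern compile, but "." is still a
//    wildcard, so "Node-js" is a false hit.
$slashOnly = '/\b(' . str_replace('/', '\\/', $wordlist) . ')\b/i';
var_dump(preg_match($slashOnly, 'Node-js'));   // int(1)

// 3. Escaping each word with preg_quote() treats "." literally.
$quoted = implode('|', array_map(function ($w) { return preg_quote($w, '/'); },
                                 explode('|', $wordlist)));
var_dump(preg_match('/\b(' . $quoted . ')\b/i', 'Node-js'));  // int(0)
```

If the titles themselves should stay readable, one option (an assumption, not something the commenters proposed) is to store the escaped list in the question's 'wordlist' option rather than altering post_title.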

You should run $wordlist through preg_quote().

```php
$safeWordlist = implode('|', array_map(function ($word) {
    return preg_quote($word, '/');
}, explode('|', $wordlist)));
```


Don't roll your own escaping method :)
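A sketch of plugging the escaped list into the question's replacement. It keeps the question's URL shape but swaps create_function() (deprecated in PHP 7.2 and removed in PHP 8.0) for an anonymous function. The function name link_glossary_terms and the sample words are hypothetical; $safeWordlist is built the same way as in the snippet above.

```php
<?php
// Hypothetical wrapper around the question's preg_replace_callback() call.
function link_glossary_terms(string $content, string $safeWordlist): string
{
    return preg_replace_callback(
        '/\b(' . $safeWordlist . ')\b/i',
        function (array $matches): string {
            $term = $matches[0];
            // Same URL shape as the question: /glossary/<first letter>/<term>/
            $slug = strtolower(substr($term, 0, 1) . '/' . $term);
            return '<a href="/glossary/' . $slug . '/">' . $term . '</a>';
        },
        $content
    );
}

$safeWordlist = implode('|', array_map(function ($w) { return preg_quote($w, '/'); },
                                       explode('|', 'Cache|Node.js')));
echo link_glossary_terms('Clear the cache first.', $safeWordlist), "\n";
// Clear the <a href="/glossary/c/cache/">cache</a> first.
```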

3 Comments

Hi Alex. Yeah, I meant to remove the str_replace; it was just me mucking about trying to fix things. Will your code have an impact on speed when used with ~1750 entries in the array?
@dunc Possibly. You should also watch out for the maximum length of a regular expression; from memory, it's limited to 65k.
Any suggestions as to how I can get around this then? It needs to be as quick as possible really. More information available from my OP on WPSE: wordpress.stackexchange.com/questions/41667
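One possible way around a pattern-length limit, assuming that limit is the concern: split the words into chunks and run one smaller pattern per chunk. The function name and the default chunk size of 200 are arbitrary assumptions, and the approach has not been benchmarked. Because each pass rescans the output of earlier passes, a term that also appears inside an inserted link (for example a title like "glossary") could be matched again; the question's final preg_replace() appears intended to undo exactly that kind of nesting.

```php
<?php
// Hypothetical helper: one smaller alternation per chunk of words.
function link_terms_in_chunks(string $content, array $words, int $chunkSize = 200): string
{
    foreach (array_chunk($words, $chunkSize) as $chunk) {
        // Escape each word in this chunk, including the "/" delimiter.
        $alternation = implode('|', array_map(function ($w) { return preg_quote($w, '/'); }, $chunk));
        $content = preg_replace_callback(
            '/\b(' . $alternation . ')\b/i',
            function (array $m): string {
                // Same URL shape as the question: /glossary/<first letter>/<term>/
                return '<a href="/glossary/' . strtolower(substr($m[0], 0, 1) . '/' . $m[0]) . '/">' . $m[0] . '</a>';
            },
            $content
        );
    }
    return $content;
}

// Chunk size 1 just to force two passes in this small example.
echo link_terms_in_chunks('A proxy cache.', ['cache', 'proxy'], 1), "\n";
// A <a href="/glossary/p/proxy/">proxy</a> <a href="/glossary/c/cache/">cache</a>.
```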
