0

I was trying to find a pattern for the following scenario:

Let's say I have this string:

someString[code]some code[/code]someString

Here, some code can be anything. What I want to extract are the reserved words (break, class, etc.). A more realistic string looks like this:

someString
[code]
class someClass{}
[/code]
someString

// And again

someString
[code]
class someClass{}
[/code]
someString

What I am trying to work out is how to match all the reserved words that appear between [code] and [/code] tags.

For example, in [code]someReservedWord some text anotherReservedWord[/code] I only want to match someReservedWord and anotherReservedWord.

I was thinking of using preg_match_all to get all the reserved words inside each [code][/code] block, with PREG_OFFSET_CAPTURE to get their positions.

The only thing I can't figure out is the pattern. If anyone has an idea, I would be very thankful. Thank you all and have a nice day.

2 Answers

3

You can use this:

$pattern = <<<'LOD'
~ (?(DEFINE) (?<words> class | string | function ) )

(?: \[code] | \G(?<!^) )
(?: [^[]+? | \[(?!/code]) )*? \K
\b \g<words> \b

~x
LOD;

preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);

print_r($matches[0]);
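
To try the pattern out, here is a minimal sketch that runs it on a made-up $subject. The sample text is an assumption modeled on the question, not data from the asker; it contains one "class" outside the tags to check that it is skipped.

```php
<?php
// Hypothetical sample input: two [code] blocks plus a stray "class" outside them.
$subject = "someString\n[code]\nclass someClass{}\n[/code]\nsomeString class\n"
         . "[code]function f() { return 'string'; }[/code]";

$pattern = <<<'LOD'
~ (?(DEFINE) (?<words> class | string | function ) )

(?: \[code] | \G(?<!^) )
(?: [^[]+? | \[(?!/code]) )*? \K
\b \g<words> \b

~x
LOD;

preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);

// With PREG_OFFSET_CAPTURE each entry is [matched word, byte offset in $subject].
foreach ($matches[0] as [$word, $offset]) {
    echo $word, ' at offset ', $offset, PHP_EOL;
}
```

With this input the loop should report class, function and string (offsets 18, 67 and 90), and ignore the "class" that sits between the two blocks.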

Pattern details:

First of all, we define a named group containing all the reserved words:

(?(DEFINE) (?<words> class | string | function ) )

The (?(DEFINE)...) syntax lets you define subpatterns outside the main pattern; the DEFINE block itself matches nothing. You can call the named group "words" later in the pattern with \g<words>.
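
As a small illustration of the DEFINE mechanism in isolation (the input string is made up for the example), the group is declared once and then called with \g<words>:

```php
<?php
// The DEFINE block only declares "words"; the actual matching is done by \g<words>.
$pattern = '~(?(DEFINE)(?<words>class|string|function))\b\g<words>\b~';

// Hypothetical input, single-quoted so that $s is not interpolated.
preg_match_all($pattern, 'class Foo { function bar(string $s) {} }', $m);

print_r($m[0]); // whole-pattern matches only
```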

(?: [^[]+? | \[(?!/code]) )*? describes all the content before a reserved word. This subpattern can match anything except the closing tag [/code], because each step is a choice between "characters that are not a [" and "a [ not followed by /code]". Since it can match anything, lazy quantifiers are used so that the match stops as soon as a reserved word is encountered.
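
To see what this subpattern accepts, the sketch below wraps it in a capture group and lets it run greedily up to the closing tag. The capture group, the greedy outer quantifier and the input are assumptions made for the demonstration; the alternation itself is the one from the pattern.

```php
<?php
// Same alternation as in the answer, captured and repeated greedily for the demo.
$content = '~\[code]((?:[^[]+?|\[(?!/code]))*)~';

// Hypothetical input: the "[" in $a[0] is allowed, the "[" of [/code] is not.
preg_match($content, '[code]$a[0] = 1;[/code] tail', $m);

var_dump($m[1]);
```

The captured text should stop right before [/code], while the bracket in $a[0] is consumed by the second branch.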

The entry point of the pattern is (?: \[code] | \G(?<!^) ). This forces each match to begin at a [code] tag or to be contiguous with the previous match.

(\G is an anchor that means "at the start of the string or contiguous with the previous match". The negative lookbehind (?<!^) rules out the start of the string.)
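
The entry-point idea can be sketched on a toy string, with "x" standing in for the [code] tag and digits standing in for reserved words (both stand-ins are assumptions for the demo):

```php
<?php
// A digit matches only right after an "x" or right after the previous match.
$pattern = '~(?:x|\G(?<!^))\d~';

preg_match_all($pattern, '12x34a5x6', $m);

print_r($m[0]);
```

Here "1" and "2" are rejected because \G at the start of the string is forbidden by (?<!^), and "5" is rejected because the "a" breaks contiguity; only "x3", "4" and "x6" remain.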

\K discards everything matched before it from the reported match, so only the reserved word itself is returned.
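
A minimal sketch of \K on its own, using a made-up input: the tag has to be present for the match to succeed, but it is left out of the result, and the reported offset is the offset of the word.

```php
<?php
// "[code]" must precede the word, but \K drops it from the reported match.
preg_match('~\[code]\K\w+~', 'x[code]class', $m, PREG_OFFSET_CAPTURE);

print_r($m[0]); // [matched text, byte offset]
```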

0
$str = "someString[code]some code[/code]someString";
$ret = preg_replace('#\[code\](.+)\[\/code\]#iUs', '<FOUND>$1</FOUND>', $str);
var_dump($ret);

(See the preg_match_all example at http://www.phpliveregex.com/p/2tD)

You might also search for "BB-Code PHP regex".
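
If only specific words are wanted rather than the whole block, one possible extension is a two-step search: capture each block with its offset, then look for the words inside it. This is a sketch under assumptions, not part of the answer above: the word list, the sample string and the variable names ($blocks, $found) are all made up for illustration.

```php
<?php
// Hypothetical input: one "class" outside the tags, three reserved words inside.
$str = "a class [code]class Foo { break; }[/code] b [code]this[/code]";
$found = [];

// Step 1: capture every block body together with its offset in $str.
preg_match_all('#\[code\](.*?)\[/code\]#is', $str, $blocks, PREG_OFFSET_CAPTURE);

foreach ($blocks[1] as [$body, $base]) {
    // Step 2: find the reserved words inside this body only.
    preg_match_all('#\b(?:class|break|this)\b#', $body, $words, PREG_OFFSET_CAPTURE);
    foreach ($words[0] as [$word, $pos]) {
        // Shift the in-body offset so that it refers to the full string.
        $found[] = [$word, $base + $pos];
    }
}

print_r($found);
```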

1 Comment

I don't want all characters, only specific ones such as class|break|this, not (.+).
