I'm going nuts with a regular expression. I have a string like this:

%closed% closed (%closed_percent%\%), %open% open (%open_percent%\%)

What I need is a regular expression that matches the following:

%closed%
%closed_percent%
%open%
%open_percent%

but not the two \% (escaped percent signs).

At the moment I use:

\%([^\%]+)\%

that gives me:

%closed%
%closed_percent%
%), %
% open (%
...

Can anyone help me with that?
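The extra matches can be reproduced by running the question's pattern over the sample string. This sketch uses Python's re module as a stand-in for the regex flavor (an assumption, since the question doesn't name one), and the variable names are illustrative. Nothing in the pattern treats \% as an escape, so each escaped % is simply paired with the next % in the string:

```python
import re

# Sample string from the question (a raw string keeps the backslashes).
s = r'%closed% closed (%closed_percent%\%), %open% open (%open_percent%\%)'

# The question's pattern: a %, one or more non-% characters, then a %.
naive = re.compile(r'\%([^\%]+)\%')

# Print every full match; the escaped % signs are treated like any other %.
for m in naive.finditer(s):
    print(m.group(0))
```

The last two matches are "% open (%" and "%\%", which the question abbreviates as "...".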

5 Answers

The simple way:

%\w+%

Matches: %foo%
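For the question's sample string the simple pattern is enough, because none of the unwanted spans consist only of word characters. It does not know about escapes at all, though: a token written as \%foo% is still matched. A quick check with Python's re as a stand-in for PCRE (variable names are illustrative):

```python
import re

s = r'%closed% closed (%closed_percent%\%), %open% open (%open_percent%\%)'

simple = re.compile(r'%\w+%')

# Only %...% spans made entirely of word characters match.
print(simple.findall(s))
# ['%closed%', '%closed_percent%', '%open%', '%open_percent%']

# Backslashes are ignored, so an escaped token still matches.
print(simple.findall(r'\%foo%'))
# ['%foo%']
```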

Handles (multiple) backslash escapes in front of the token:

(?<!\\)(?:\\.)*%(\w+)%

Matches only bar in: \%foo% \\%bar% \\\%baz%

...and this one also allows escapes inside the token:

(?<!\\)(?:\\.)*%((?:[^\\%\s]+|\\.)+)%

Matches: %foo\%bar%

Use the value of the first capturing group with the last two expressions.
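Both escape-aware patterns can be checked against the examples given above. This sketch uses Python's re module, which accepts the same lookbehind and non-capturing group syntax as PCRE; findall returns the first capturing group when a pattern has exactly one. The variable names are illustrative only.

```python
import re

# The lookbehind keeps a match from starting in the middle of an escape;
# (?:\\.)* then consumes escape pairs such as \\ before the opening %.
prefix_escapes = re.compile(r'(?<!\\)(?:\\.)*%(\w+)%')

# Same prefix handling, but the token body may also contain
# escaped characters such as \%.
inner_escapes = re.compile(r'(?<!\\)(?:\\.)*%((?:[^\\%\s]+|\\.)+)%')

print(prefix_escapes.findall(r'\%foo% \\%bar% \\\%baz%'))  # ['bar']
print(inner_escapes.findall(r'%foo\%bar%'))                # ['foo\\%bar']

s = r'%closed% closed (%closed_percent%\%), %open% open (%open_percent%\%)'
print(inner_escapes.findall(s))
# ['closed', 'closed_percent', 'open', 'open_percent']
```

One design note: (?:[^\\%\s]+|\\.)+ nests a + inside a +, which can backtrack heavily on long input that never reaches a closing %. Writing it as (?:[^\\%\s]|\\.)+ matches the same strings and should avoid that.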

2 Comments

Nice trick. I had to change [^\%\s] to [^\\%\s] to get it to work: you want a literal backslash, and most flavors will see \% and ignore the backslash, as if you wrote %. By the way, why is everyone escaping the percent sign? Does it mean anything in any flavor?
@Kobi, in Perl % is the sign for a hash variable, which (I guess) could get interpolated. There are handy built-in hashes like %+ and %-. The same goes for @.
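The point in the first comment can be reproduced with Python's re, which, like most flavors, treats a backslash before a non-alphanumeric character such as % as simply making it literal. So \% inside a character class is just %, and [^\%\s] still lets a backslash through; the token then ends at the escaped %. The names below are illustrative:

```python
import re

# A backslash before a non-alphanumeric character just makes it literal,
# so \% matches the same thing as %.
assert re.fullmatch(r'\%', '%') is not None

sample = r'%foo\%bar%'

# [^\%\s] is the same class as [^%\s]: it still admits a backslash, so
# the run of ordinary characters swallows it and the token ends at \%.
wrong = re.compile(r'(?<!\\)(?:\\.)*%((?:[^\%\s]+|\\.)+)%')

# [^\\%\s] excludes the backslash, leaving it to the \\. alternative.
right = re.compile(r'(?<!\\)(?:\\.)*%((?:[^\\%\s]+|\\.)+)%')

print(wrong.findall(sample))  # ['foo\\']
print(right.findall(sample))  # ['foo\\%bar']
```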

Try this:

\%([^(\\\%)]+?)\%

matches

%closed%
%closed_percent%
%open%
%open_percent%

for me.
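Inside square brackets, parentheses are ordinary characters rather than a group, so [^(\\\%)] is a character class that excludes (, ), \ and %. That is why %), % can no longer match (it starts with a parenthesis) and the scan moves on to %open%; it also means a token containing a parenthesis or backslash can never match. A quick check with Python's re as a stand-in for the question's flavor (names are illustrative):

```python
import re

s = r'%closed% closed (%closed_percent%\%), %open% open (%open_percent%\%)'

answer2 = re.compile(r'\%([^(\\\%)]+?)\%')

# [^(\\\%)] is the same set as [^()\\%]: no group is formed.
for ch in '()\\%':
    assert re.fullmatch(r'[^(\\\%)]', ch) is None
assert re.fullmatch(r'[^(\\\%)]', 'a') is not None

print(answer2.findall(s))
# ['closed', 'closed_percent', 'open', 'open_percent']

# Side effect: a token containing a parenthesis cannot match.
print(answer2.findall('%a(b)%'))  # []
```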

Assuming there are no restrictions on what can appear inside the percent-wrapped tokens (including escaped characters), or on which characters can be escaped (so backslashes can be escaped too, and in \\%token% the token should still be found),
here's a pattern that skips over escaped characters:

\\.|(%([^%\\]|\\.)+%)

This captures the percent-wrapped tokens in the first group ($1). Escaped characters are matched as well (that is the trick that skips over them), but they leave the first group empty, so in PHP it is easy to keep just the relevant tokens:

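$str = '%closed% closed (%closed_percent%\%), %open% open (%open_percent%\%)';
preg_match_all('/\\\\.|(%([^%\\\\]|\\\\.)+%)/', $str, $matches, PREG_PATTERN_ORDER);
// Group 1 is empty for matched escapes: array_filter() drops those, array_values() renumbers the keys.
$matches = array_values(array_filter($matches[1]));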

Working example: http://ideone.com/dziCB
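The same skip-the-escapes idea can be sketched outside PHP. This Python version is a port, not the answer's own code: the function name find_tokens is hypothetical, and the inner group is made non-capturing, which does not change what matches. With re.finditer, group 1 is None (not an empty string) when the escape alternative matched, so the filter checks for None:

```python
import re

# Either an escaped character (consumed and discarded) or a %token%
# captured in group 1.
TOKEN_RE = re.compile(r'\\.|(%(?:[^%\\]|\\.)+%)')


def find_tokens(text):
    """Return the %...% tokens in text, skipping escaped characters."""
    return [m.group(1) for m in TOKEN_RE.finditer(text) if m.group(1) is not None]


s = r'%closed% closed (%closed_percent%\%), %open% open (%open_percent%\%)'
print(find_tokens(s))
# ['%closed%', '%closed_percent%', '%open%', '%open_percent%']

print(find_tokens(r'\\%token%'))  # ['%token%']  (escaped backslash, real token)
print(find_tokens(r'\%token%'))   # []           (escaped percent sign)
```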

Try:

~\%\w+\%~

So this allows only word characters (a-z, A-Z, 0-9 and _) in your selection.

$str = "%closed% closed (%closed_percent%\%), %open% open (%open_percent%\%)";

preg_match_all("~\%\w+\%~", $str, $matches);

$matches now contains:

Array
(
    [0] => Array
    (
        [0] => %closed%
        [1] => %closed_percent%
        [2] => %open%
        [3] => %open_percent%
    )
)
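Because \w covers digits as well as letters and the underscore, a token such as %open2% is matched too, while one containing a hyphen is not. In Python 3, \w is Unicode-aware for str patterns, so this sketch passes re.ASCII to roughly mirror PCRE without the u modifier (an assumption about which behavior you want); the names are illustrative:

```python
import re

word_token = re.compile(r'%\w+%', re.ASCII)

print(word_token.findall('%open2% %a-b% %x_y%'))
# ['%open2%', '%x_y%']

# With re.ASCII, \w is exactly [A-Za-z0-9_], so an accented letter stops the token.
print(word_token.findall('%caf\u00e9%'))  # []
```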

Add negative lookbehinds for the backslashes! That way \% is ignored, as intended.

(?<!\\)\%([^\%]+)(?<!\\)\%

Matches:

%closed%
%closed_percent%
%open%
%open_percent%
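A quick check of the lookbehind version with Python's re as a stand-in for PCRE (names are illustrative). It handles the question's string, but each lookbehind inspects only the single preceding character, so an escaped backslash followed by a real token, as in \\%x%, is rejected as well:

```python
import re

s = r'%closed% closed (%closed_percent%\%), %open% open (%open_percent%\%)'

lookbehind = re.compile(r'(?<!\\)\%([^\%]+)(?<!\\)\%')

print(lookbehind.findall(s))
# ['closed', 'closed_percent', 'open', 'open_percent']

# The % after "\\" is treated as escaped, even though the backslash
# itself was escaped.
print(lookbehind.findall(r'\\%x%'))  # []
```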
