Regular expression works in Javascript but not PHP preg_match

Question

Regular expression:

/([^]+):([^\\r\\n]+)/

String:

f1:aaa\r\nf2:bbb\r\nf3:ccc\r\nf4:ddd

According to regexpal.com, this would give my desired sets: f1 & aaa, f2 & bbb, f3 & ccc etc. But using http://www.functions-online.com/preg_match.html I only see [0] => "f1" and [1] => "f1"

Can anyone show how I should be doing this?

@JasonMcCreary, and that's causing it to not work with preg_match? – Mike Perrenoud, Commented Aug 29, 2013 at 12:30
That should cause it not to work anywhere… Surprised it works as expected in JavaScript. – Jason McCreary, Commented Aug 29, 2013 at 12:31

Community · Accepted Answer · 2017-05-23 12:28:18Z

JavaScript regexes accept [] and [^], meaning "no character" and "any character" respectively. But keep in mind that this is particular to the JavaScript regex flavour. (If you are interested in the subject, you can take a look at this post.)

In other words, [^] is a shortcut for [\s\S], used because JavaScript did not have a dotall or singleline mode in which the dot matches newlines (the s flag only arrived in ES2018).

Thus, to obtain the same result in PHP, replace [^] with . (which by default matches any character except a newline) and turn on singleline mode so that it matches newlines too: either add the s modifier after the closing delimiter or put (?s) before the dot. Examples: /.+/s or /(?s).+/ ([\s\S] also works in PCRE without any modifier.)
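A minimal JavaScript sketch of the equivalences described above (Node.js; the s flag needs an ES2018-era engine). Its last part also illustrates why preg_match reported only "f1": PCRE treats a ] placed right after [^ as a literal class member, so the class in the question's pattern only closes at the next ], and everything from [^ up to the ] before the final + becomes one negated class. The regex named pcreReading is a reconstruction of that parse for illustration, with \] and \[ escaped so JavaScript reads it the same way.

```javascript
// Node.js sketch; the s flag requires an ES2018+ engine.
const twoLines = "a\nb";

// [^] ("not nothing") matches any character, newlines included.
console.log(/^[^]+$/.test(twoLines)); // true
// [\s\S] ("whitespace or non-whitespace") means the same and also works in PCRE.
console.log(/^[\s\S]+$/.test(twoLines)); // true
// A plain dot does not match the newline...
console.log(/^.+$/.test(twoLines)); // false
// ...unless dotall mode is on, like PHP's s modifier.
console.log(/^.+$/s.test(twoLines)); // true

// How PCRE reads the question's pattern: "]" right after "[^" is literal,
// so the class runs on to the next "]". Escaping \] and \[ gives JavaScript
// the same reading (a reconstruction for illustration).
const pcreReading = /([^\]+):(\[^\\r\\n]+)/;
const questionText = "f1:aaa\\r\\nf2:bbb\\r\\nf3:ccc\\r\\nf4:ddd"; // literal backslashes
const result = pcreReading.exec(questionText);
console.log(result[0], result[1]); // f1 f1
```

The single class stops at the first colon, which matches the [0] => "f1" and [1] => "f1" output from the question.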

But for your particular case this pattern seems to be more appropriate:

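// single quotes keep \r\n as literal backslash sequences, like the text typed into the testers
$subject = 'f1:aaa\r\nf2:bbb\r\nf3:ccc\r\nf4:ddd';

preg_match_all('~((?>[^rn\\\:]++|(?<!\\\)[rn])+):([^\\\]++)~', $subject, $matches, PREG_SET_ORDER);

// with PREG_SET_ORDER each $match holds one match: [1] is the key, [2] the value
foreach ($matches as $match) {
    echo $match[1].' '.$match[2].'<br/>';
}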

pattern explanation:

~                    # pattern delimiter
(                    # open the first capturing group
    (?>              # open an atomic group
        [^rn\\\:]++  # one or more characters other than "r", "n", "\" or ":"
      |              # OR
        (?<!\\\)[rn] # an "r" or "n" not preceded by "\"
    )+               # close the atomic group and repeat it one or more times
)                    # close the first capturing group
:                    # a literal colon
(                    # open the second capturing group
    [^\\\]++         # one or more characters other than "\"
)                    # close the second capturing group
~                    # closing delimiter
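For comparison, a rough JavaScript counterpart of the PHP code above. It is a sketch under two assumptions: the subject holds literal backslash-r-backslash-n sequences (which is what the pattern is designed for), and, since JavaScript has no atomic groups or possessive quantifiers, plain (?:...) groups and + are used instead, with the inner ++ dropped so one quantifier is not nested inside another. matchAll (Node.js 12+) stands in for preg_match_all with PREG_SET_ORDER; the lookbehind needs an ES2018+ engine.

```javascript
// Assumes literal backslash-r-backslash-n sequences between the pairs.
const subject = "f1:aaa\\r\\nf2:bbb\\r\\nf3:ccc\\r\\nf4:ddd";

// Group 1: characters other than r, n, \ and :, or an r/n not preceded by \.
// Group 2: everything up to the next backslash.
// Plain (?:...) and + replace the PHP atomic group and possessive quantifiers.
const pattern = /((?:[^rn\\:]|(?<!\\)[rn])+):([^\\]+)/g;

// Each match gives [whole, key, value], like one PREG_SET_ORDER entry.
const pairs = [...subject.matchAll(pattern)].map((match) => [match[1], match[2]]);

for (const [key, value] of pairs) {
  console.log(key, value);
}
// f1 aaa
// f2 bbb
// f3 ccc
// f4 ddd
```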

Notes:

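To match a literal \ (backslash), the regex engine needs \\, and inside a single-quoted PHP string each of those backslashes is written \\, giving \\\\. The pattern above uses \\\, which also works because PHP leaves a backslash followed by any character other than \ or ' untouched (for example, \\\] reaches the regex engine as \\]).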

The principle of this pattern is to use negated character classes and negative assertions; in other words, it describes the desired substrings by what they cannot contain.

The pattern above uses atomic groups (?>...) and possessive quantifiers ++ instead of non-capturing groups (?:...) and plain quantifiers +. Here the result is the same; the difference is that the regex engine does not record backtracking positions inside atomic groups and possessive quantifiers, so when a later part of the pattern fails it cannot go back and try other ways. This can improve performance.
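JavaScript has neither atomic groups nor possessive quantifiers, but its lookaheads are atomic (the engine never backtracks into them), so a commonly used workaround is (?=(X))\1. The sketch below uses that swapped-in technique, which is not part of the answer above, to show the "cannot go back" behaviour: a+a may give back one a, while the atomic form cannot.

```javascript
const input = "aaa";

// Ordinary greedy quantifier: a+ first takes "aaa", then backtracks to "aa"
// so that the final "a" can match.
console.log(/^a+a/.test(input)); // true

// Emulated atomic group / possessive a++: the lookahead captures "aaa",
// \1 consumes exactly that, and the engine will not re-enter the lookahead
// to try a shorter capture, so the final "a" has nothing left to match.
console.log(/^(?=(a+))\1a/.test(input)); // false

// When no backtracking is needed, both forms agree.
console.log(/^a+b/.test("aaab"), /^(?=(a+))\1b/.test("aaab")); // true true
```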

+1 for explaining [^]; I was about to ask a separate question. Wish I could ++.

hsz · Accepted Answer · 2013-08-29 12:30:30Z

Try with:

/([a-z0-9]+):([a-z0-9]+)(?:\r\n)?/

or

/(\w+):(\w+)(?:\r\n)?/
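A small JavaScript check of the second pattern, with the g flag added so every pair is returned (in PHP that corresponds to preg_match_all; preg_match stops at the first match). It compares two assumed inputs: real CR/LF characters, and the literal text \r\n as typed into an online tester. With the literal text, (?:\r\n)? never matches and \w+ picks up the n before each key, which may be one reason for the mixed results mentioned in the comment below.

```javascript
// The second pattern with the g flag added so every pair is returned.
const pairPattern = /(\w+):(\w+)(?:\r\n)?/g;

// Collect [key, value] pairs; matchAll works on a copy of the regex,
// so pairPattern can be reused for several inputs.
const toPairs = (text) => [...text.matchAll(pairPattern)].map((m) => [m[1], m[2]]);

// Assumed input 1: real carriage-return/line-feed characters.
const withCrlf = "f1:aaa\r\nf2:bbb\r\nf3:ccc\r\nf4:ddd";
console.log(JSON.stringify(toPairs(withCrlf)));
// [["f1","aaa"],["f2","bbb"],["f3","ccc"],["f4","ddd"]]

// Assumed input 2: literal backslash-r-backslash-n text from an online tester.
// (?:\r\n)? never matches it, and \w+ absorbs the "n" before each key.
const withLiteral = "f1:aaa\\r\\nf2:bbb\\r\\nf3:ccc\\r\\nf4:ddd";
console.log(JSON.stringify(toPairs(withLiteral)));
// [["f1","aaa"],["nf2","bbb"],["nf3","ccc"],["nf4","ddd"]]
```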

answered Aug 29, 2013 at 12:30


iss42 Over a year ago

Thanks, also tested these at functions-online and they didn't quite work, not sure why

Toto · Accepted Answer · 2013-08-29 12:30:15Z

I think you need:

/([^:]+):([^\\r\\n]+)/
//__^ note the colon
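A JavaScript sketch of this approach, assuming the pairs are separated by real CR/LF characters. Two adjustments are assumptions added here, not part of the answer: the line-break class is written [^\r\n] with single backslashes (inside a regex literal, [^\\r\\n] excludes a backslash and the letters r and n, not CR and LF), and the key class also excludes \r\n so a line break is not pulled into the next key. The g flag plays the role of preg_match_all.

```javascript
const lines = "f1:aaa\r\nf2:bbb\r\nf3:ccc\r\nf4:ddd"; // real CR/LF characters

// As posted: [^\\r\\n] excludes a backslash and the LETTERS r and n, not CR/LF,
// so the second group runs straight through the line breaks.
const asPosted = /([^:]+):([^\\r\\n]+)/g;
console.log([...lines.matchAll(asPosted)].length); // 1

// Adjusted: \r and \n now mean CR and LF, and the key may not contain them either.
const adjusted = /([^:\r\n]+):([^\r\n]+)/g;
const fixedPairs = [...lines.matchAll(adjusted)].map((m) => [m[1], m[2]]);
console.log(JSON.stringify(fixedPairs));
// [["f1","aaa"],["f2","bbb"],["f3","ccc"],["f4","ddd"]]
```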

answered Aug 29, 2013 at 12:30


iss42 Over a year ago

Thanks, tested this at functions-online and it didn't quite work, not sure why
