
Regular expression:

/([^]+):([^\\r\\n]+)/

String:

f1:aaa\r\nf2:bbb\r\nf3:ccc\r\nf4:ddd

According to regexpal.com, this would give my desired sets: f1 & aaa, f2 & bbb, f3 & ccc etc. But using http://www.functions-online.com/preg_match.html I only see [0] => "f1" and [1] => "f1"

Can anyone show how I should be doing this?

  • You have an empty character set: [^]. Commented Aug 29, 2013 at 12:29
  • @JasonMcCreary, and that's causing it to not work with preg_match? Commented Aug 29, 2013 at 12:30
  • That should cause it not to work anywhere… Surprised it works as expected in JavaScript. Commented Aug 29, 2013 at 12:31
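The result reported in the question can be explained by how PCRE (the engine behind preg_match) parses this class. In PCRE, a ] that comes immediately after [ or [^ is a literal member of the class rather than its end, so the class does not close at [^]: it runs on to the next ], and the whole pattern collapses into a single capturing group that matches runs of characters other than ] + ) : ( [ ^ \ r n. The JavaScript sketch below reproduces that reading by escaping the ] explicitly. The subject with literal backslash-r-backslash-n text is an assumption (it is what the accepted answer's pattern expects); real line breaks would not change this particular result.

```javascript
// Assumed subject: literal "\r\n" text (backslash, r, backslash, n).
const subject = "f1:aaa\\r\\nf2:bbb\\r\\nf3:ccc\\r\\nf4:ddd";

// The question's pattern as PCRE reads it: the "]" right after "[^" is a
// literal, so everything up to the next "]" is one negated class and the
// pattern contains only ONE capturing group. Here that "]" is escaped so
// that JavaScript parses the pattern the same way.
const asReadByPcre = /([^\]+):([^\\r\\n]+)/;

const result = subject.match(asReadByPcre);
// The class excludes ":", so the first match stops right after "f1".
console.log(result[0], result[1]); // f1 f1
console.log(result.length);        // 2 (whole match + one group)
```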

3 Answers


JavaScript allows [] and [^] as "no character" and "any character" respectively. Keep in mind that this is particular to the JavaScript regex flavour. (If you are interested in the subject, you can take a look at this post.)

In other words, in JavaScript [^] is a shortcut for [\s\S], because JavaScript historically had no dotall (singleline) mode in which the dot can match newlines. (Since ES2018 it has one: the s flag.)

Thus, to obtain the same result in PHP, replace [^] with . (which by default matches any character except a newline) and either add the singleline modifier s after the closing delimiter or put (?s) before the ., so that newlines are matched too. Examples: /.+/s or /(?s).+/
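A small JavaScript check of these equivalences, as a sketch: [^] and [\s\S] both match across a line break, while a plain . does not unless the dotAll flag s is set, which is the JavaScript counterpart of PHP's /s modifier. The s flag requires an ES2018-or-later engine.

```javascript
// "a", a real line feed, then "b".
const text = "a\nb";

// [^] is a negated empty class: in JavaScript it matches any character.
const matchesWithEmptyClass = /^[^]+$/.test(text);
// [\s\S] ("whitespace or non-whitespace") also matches any character,
// and unlike [^] it means the same thing in PCRE (PHP).
const matchesWithSsClass = /^[\s\S]+$/.test(text);
// By default "." does not match line terminators such as "\n".
const plainDotMatches = /^.+$/.test(text);
// The ES2018 dotAll flag "s" lets "." match them, like PHP's /s modifier.
const dotAllMatches = /^.+$/s.test(text);

console.log(matchesWithEmptyClass, matchesWithSsClass); // true true
console.log(plainDotMatches, dotAllMatches);            // false true
```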

But for your particular case this pattern seems to be more appropriate:

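// Sample input: single quotes keep \r\n as literal text (backslash, r, backslash, n)
$subject = 'f1:aaa\r\nf2:bbb\r\nf3:ccc\r\nf4:ddd';

preg_match_all('~((?>[^rn\\\:]++|(?<!\\\)[rn])+):([^\\\]++)~', $subject, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    // $match[1] is the key, $match[2] the value
    echo $match[1].' '.$match[2].'<br/>';
}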

Pattern explanation:

~                    # pattern delimiter
(                    # open the first capturing group
    (?>              # open an atomic group
        [^rn\\\:]++  # all characters that are not "r", "n", "\" or ":"
      |              # OR
        (?<!\\\)[rn] # "r" or "n" not preceded by "\"
    )+               # close the atomic group and repeat one or more times
)                    # close the first capturing group
:                    # a literal colon
(                    # open the second capturing group
    [^\\\]++         # all characters except "\" one or more times
)                    # close the second capturing group
~                    # closing delimiter
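For comparison, here is a rough JavaScript port of the pattern, assuming (as the pattern itself does) that the subject contains literal backslash-r-backslash-n text. JavaScript has no atomic groups or possessive quantifiers, so (?>...) and ++ are replaced by a non-capturing group whose two alternatives each consume a single character; since the alternatives can never match the same character, there is no nested-quantifier backtracking to guard against. The lookbehind (?<!\\) needs an ES2018 engine and matchAll an ES2020 one. The helper name parsePairs is hypothetical.

```javascript
// Assumed subject: literal "\r\n" text between the pairs.
const subject = "f1:aaa\\r\\nf2:bbb\\r\\nf3:ccc\\r\\nf4:ddd";

// Group 1: one or more characters that are either
//   - anything except "r", "n", "\" or ":", or
//   - an "r" or "n" NOT preceded by "\" (negative lookbehind).
// Group 2: one or more characters except "\".
// The "g" flag is required by String.prototype.matchAll.
const pairPattern = /((?:[^rn\\:]|(?<!\\)[rn])+):([^\\]+)/g;

// Hypothetical helper: returns [key, value] pairs in order of appearance.
function parsePairs(text) {
  const pairs = [];
  for (const match of text.matchAll(pairPattern)) {
    pairs.push([match[1], match[2]]);
  }
  return pairs;
}

for (const [key, value] of parsePairs(subject)) {
  console.log(key + " " + value);
}
// f1 aaa
// f2 bbb
// f3 ccc
// f4 ddd
```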

Notes:

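To match a literal \ (backslash) with a pattern written in a single-quoted PHP string, the regex needs an escaped backslash (\\), and each of those backslashes must in turn be escaped for the string, giving \\\\. The pattern above writes \\\ instead, which also works because PHP keeps a backslash as-is when it is not followed by ' or \.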
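The same double escaping applies in JavaScript whenever a pattern is built from a string with the RegExp constructor instead of being written as a regex literal. This sketch shows that the string "\\\\" holds two characters, which the regex engine reads as one escaped (literal) backslash, and that escaping only once leaves an incomplete escape.

```javascript
// In a JavaScript string literal "\\" is ONE backslash character, so the
// regex source \\ (an escaped, i.e. literal, backslash) is written "\\\\".
const source = "\\\\";
console.log(source.length); // 2

const fromString = new RegExp(source); // matches one literal backslash
const fromLiteral = /\\/;              // the same pattern as a regex literal
console.log(fromString.source === fromLiteral.source); // true

// Subject containing literal backslashes, as in the answer's test input.
const sample = "f1:aaa\\r\\nf2:bbb";
console.log(fromString.test(sample), fromLiteral.test(sample)); // true true

// Escaping only once leaves the regex engine a lone "\", which is an
// incomplete escape and throws a SyntaxError.
let singleEscapeFails = false;
try {
  new RegExp("\\");
} catch (err) {
  singleEscapeFails = err instanceof SyntaxError;
}
console.log(singleEscapeFails); // true
```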

The principle of this pattern is to use negated character classes and a negative lookbehind assertion; in other words, it describes what the desired substrings cannot contain.

The pattern above uses atomic groups (?>...) and possessive quantifiers ++ instead of non-capturing groups (?:...) and plain quantifiers +. The result is the same here; the difference is that once an atomic group or a possessive quantifier has matched, the regex engine does not go back to try other ways when the rest of the pattern fails, because it does not record backtracking positions. This can improve performance, particularly when a match attempt fails.
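JavaScript has no atomic groups or possessive quantifiers, but the "no going back" behaviour can be observed there with a common emulation: a capturing group inside a lookahead followed by a backreference, (?=(X))\1, behaves like the atomic group (?>X), because the engine does not backtrack into a lookahead once it has succeeded. In this sketch the ordinary a+ gives back one "a" so that the final a can match, while the emulated atomic version keeps all three and fails, as /^a++a$/ would in PHP.

```javascript
const input = "aaa";

// Ordinary greedy quantifier: a+ first takes "aaa", then backtracks
// to "aa" so that the final "a" can match.
const withBacktracking = /^a+a$/.test(input);

// Emulated atomic group (?>a+): the lookahead captures "aaa" and the
// backreference \1 consumes it; the engine will not revisit the lookahead
// to try a shorter capture, so the final "a" has nothing left to match.
const atomicEmulation = /^(?=(a+))\1a$/.test(input);

console.log(withBacktracking); // true
console.log(atomicEmulation);  // false
```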


1 Comment

+ for explaining [^], I was about to ask a separate question. Wish I could ++.

Try with:

/([a-z0-9]+):([a-z0-9]+)(?:\r\n)?/

or

/(\w+):(\w+)(?:\r\n)?/

1 Comment

Thanks, also tested these at functions-online and they didn't quite work, not sure why
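One possible reason these patterns did not work in that test: if the subject contains literal backslash-r-backslash-n text (as the accepted answer assumes) rather than real line breaks, (?:\r\n)? never matches, and \w+ picks up the letter n in front of each later key. Note also that preg_match only returns the first match; listing every pair needs preg_match_all. The JavaScript sketch below compares both kinds of input, with a g flag added so that matchAll returns every match; the helper name pairs is hypothetical.

```javascript
// The second pattern from this answer, with the "g" flag added so that
// matchAll can return every match (like preg_match_all in PHP).
const pattern = /(\w+):(\w+)(?:\r\n)?/g;

// Hypothetical helper: collects [key, value] pairs.
function pairs(text) {
  return Array.from(text.matchAll(pattern), (m) => [m[1], m[2]]);
}

// Real CR/LF line breaks between the pairs.
const realBreaks = "f1:aaa\r\nf2:bbb\r\nf3:ccc\r\nf4:ddd";
// Literal backslash-r-backslash-n text between the pairs.
const literalBreaks = "f1:aaa\\r\\nf2:bbb\\r\\nf3:ccc\\r\\nf4:ddd";

console.log(pairs(realBreaks));    // keys f1, f2, f3, f4 as intended
console.log(pairs(literalBreaks)); // later keys come out as nf2, nf3, nf4
```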

I think you need:

/([^:]+):([^\\r\\n]+)/
//__^ note the colon

1 Comment

Thanks, tested this at functions-online and it didn't quite work, not sure why
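A quick JavaScript check suggests why, assuming the subject holds literal backslash-r-backslash-n text as the accepted answer's pattern expects: [^:]+ excludes only the colon, so every key after the first also takes in the \r\n text in front of it. A g flag is added here only so that matchAll can list every match.

```javascript
// This answer's pattern, with the "g" flag added so matchAll can list
// every match. The class [^\\r\\n] excludes "\", "r" and "n".
const pattern = /([^:]+):([^\\r\\n]+)/g;

// Assumed subject: literal backslash-r-backslash-n text between the pairs.
const subject = "f1:aaa\\r\\nf2:bbb\\r\\nf3:ccc\\r\\nf4:ddd";

const keys = Array.from(subject.matchAll(pattern), (m) => m[1]);
console.log(keys);
// Only the first key is clean; the others start with the literal "\r\n" text.
```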
