PHP preg_match_all capture all patterns at front of string not mid string

Question

Given the subject:

AB: CD:DEF: HIJ99:message packet - no capture

I have crafted the following regex to correctly capture the 2-5 character targets, each of which is followed by a colon:

/\s{0,1}([0-9a-zA-Z]{2,5}):\s{0,1}/

which returns my matches even if erroneous spaces are added before or after the targets:

[0] => AB
[1] => CD
[2] => DEF
[3] => HIJ99

However, if the message packet contains a colon anywhere, for example:

AB: CD:DEF: HIJ99:message packet no capture **or: this either**

the result of course also includes [4] => or, which is not desired. I want to limit the matches to the consecutive run of targets at the beginning of the string and, once that run is broken, stop looking for matches in the remainder.
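A minimal PHP sketch reproducing the problem, assuming the default preg_match_all flags and the two subjects shown above. Nothing in the original pattern ties a match to the front of the string, so the search simply carries on into the message packet and picks up "or".

```php
<?php
// The question's original pattern: optional space, a 2-5 character
// alphanumeric target, a colon, then another optional space.
$pattern = '/\s{0,1}([0-9a-zA-Z]{2,5}):\s{0,1}/';

// Message packet without a colon: only the leading targets are captured.
preg_match_all($pattern, 'AB: CD:DEF: HIJ99:message packet - no capture', $clean);
print_r($clean[1]); // captures AB, CD, DEF, HIJ99

// Message packet containing a colon: the unanchored search keeps going,
// so "or" inside the packet becomes a fifth capture.
preg_match_all($pattern, 'AB: CD:DEF: HIJ99:message packet no capture or: this either', $dirty);
print_r($dirty[1]); // captures AB, CD, DEF, HIJ99, or
```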

Edit 1: Also tried ^(\s{0,1}([0-9a-zA-Z]{2,5}):\s{0,1}){1,5} to force checking from the beginning of the string for multiple matches, but then I lose the individual matches

[0] => Array
    (
        [0] => AB: CD:DEF: HIJ99:
    )

[1] => Array
    (
        [0] => HIJ99:
    )

[2] => Array
    (
        [0] => HIJ99
    )
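The output above shows why Edit 1 loses the individual targets: when a capturing group sits inside a quantifier, PCRE keeps only the text from its last iteration, so group 1 ends up as HIJ99: and group 2 as HIJ99. One possible workaround, sketched here as an alternative that is not taken from the answers, is to match the whole consecutive run at the start of the string first and then split that prefix in a second pass. The helper name leadingTargets is hypothetical.

```php
<?php
// Hypothetical helper: return the 2-5 character targets that appear
// consecutively at the very start of $subject, ignoring anything later.
function leadingTargets(string $subject): array
{
    // Pass 1: anchored at ^, grab the whole run of "target:" pieces. The
    // non-capturing group repeats as often as it can and stops at the first
    // piece that is not a 2-5 character target followed by a colon.
    if (!preg_match('/^(?:\s*[0-9a-zA-Z]{2,5}:)+/', $subject, $prefix)) {
        return [];
    }
    // Pass 2: split only that prefix into individual targets.
    preg_match_all('/([0-9a-zA-Z]{2,5}):/', $prefix[0], $m);
    return $m[1];
}

print_r(leadingTargets('AB: CD:DEF: HIJ99:message packet no capture or: this either'));
print_r(leadingTargets('ZY:xw:VU:message packet no capture or: this either'));
```

Because the repetition lives in a non-capturing group, the number of targets is not capped the way a {1,5} quantifier caps it.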

Edit 2: Keep in mind that the subject is not fixed.

AB: CD:DEF: HIJ99:message packet - no capture

could just as easily be

ZY:xw:VU:message packet no capture or: this either

That is, the targets we are trying to pull vary, and so does the rest of the subject. I just want to rule out matching a ":" inside the message packet.

Would it be enough to just edit the regex to match upper-case characters only? /\s{0,1}([0-9A-Z]{2,5}):\s{0,1}/
– t.h3ads, Commented Nov 18, 2014 at 16:10
Define a repetition for your pattern: /(\s?([0-9a-zA-Z]{2,5}):\s?){0,4}/ for 0 to 4 repetitions.
– SBH, Commented Nov 18, 2014 at 16:20
@SBH - that adds all sorts of mess, but I did an edit with your idea. It is possible that there will be more than 0-4 matches, though, and if there are fewer, it will still grab the inline match.
– Dave, Commented Nov 18, 2014 at 16:36
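A quick PHP check of the two comment suggestions against subjects from the question, assuming default preg_match_all flags. Restricting to upper case drops the legitimate lowercase target xw while still letting VU through, and the {0,4} repetition yields one large match whose inner group remembers only the last target, followed by empty matches and the unwanted "or".

```php
<?php
$first  = 'AB: CD:DEF: HIJ99:message packet no capture or: this either';
$second = 'ZY:xw:VU:message packet no capture or: this either';

// Suggestion 1: upper case only. "xw" is a real target but is skipped,
// and nothing stops the search from continuing past the gap.
preg_match_all('/\s{0,1}([0-9A-Z]{2,5}):\s{0,1}/', $second, $upper);
print_r($upper[1]); // captures ZY, VU

// Suggestion 2: repeat the group 0 to 4 times. The first match swallows all
// four targets, but group 2 keeps only the last iteration. Since {0,4} also
// allows zero repetitions, empty matches follow, and "or" is still found.
preg_match_all('/(\s?([0-9a-zA-Z]{2,5}):\s?){0,4}/', $first, $rep);
print_r($rep[2]);
```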

Avinash Raj · Accepted Answer · 2014-11-18 16:59:19Z

1

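You could use \G to match only the consecutive run of targets at the start of the string. \G asserts that a match begins exactly where the previous match ended (or at the start of the subject for the first match), so once one piece fails to match, no later part of the string can match either.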

$str = 'AB: CD:DEF: HIJ99:message packet no capture or: this either';
preg_match_all('/\G\s*([0-9a-zA-Z]{2,5}):\s*/', $str, $m);
print_r($m[1]);

Output:

Array
(
    [0] => AB
    [1] => CD
    [2] => DEF
    [3] => HIJ99
)
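The same \G pattern wrapped in a small helper, as a sketch (the name frontTargets is hypothetical). The calls cover both subjects from the question and a leading space; the colon in the message packet is never reached because every match must start where the previous one ended.

```php
<?php
// Hypothetical wrapper around the \G pattern from the answer above.
function frontTargets(string $subject): array
{
    // \G pins each match to the end of the previous one (the subject start
    // for the first match); \s* absorbs stray spaces around each target.
    preg_match_all('/\G\s*([0-9a-zA-Z]{2,5}):\s*/', $subject, $m);
    return $m[1];
}

print_r(frontTargets('AB: CD:DEF: HIJ99:message packet no capture or: this either'));
print_r(frontTargets('ZY:xw:VU:message packet no capture or: this either'));
print_r(frontTargets(' AB: CD:message'));
```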


edited Nov 18, 2014 at 16:59

answered Nov 18, 2014 at 16:37

Avinash Raj


2 Comments

Dave Over a year ago

Brilliant... first time using \G

Dave Over a year ago

Final edit (to allow for a leading space): \G\s*([0-9a-zA-Z]{2,5}):\s* Much obliged.

Toto · Accepted Answer · 2014-11-18 16:23:45Z

1

How about:

$str = 'AB: CD:DEF: HIJ99:message packet no capture or: this either';
preg_match_all('/(?<![^:]{7})([0-9a-zA-Z]{2,5}):/', $str, $m);
print_r($m);

Output:

Array
(
    [0] => Array
        (
            [0] => AB:
            [1] => CD:
            [2] => DEF:
            [3] => HIJ99:
        )

    [1] => Array
        (
            [0] => AB
            [1] => CD
            [2] => DEF
            [3] => HIJ99
        )

)
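A sketch comparing this lookbehind pattern with the \G pattern from the other answer. The third subject, AB:hello ab: y, is made up for illustration. (?<![^:]{7}) rejects a candidate only when the seven characters right before it are all non-colons, so a short message word followed by a colon that sits within seven characters of the last real target is still captured, whereas \G stops at "hello".

```php
<?php
$subjects = [
    'AB: CD:DEF: HIJ99:message packet no capture or: this either',
    'ZY:xw:VU:message packet no capture or: this either',
    'AB:hello ab: y', // hypothetical: a colon close behind the last target
];

$results = [];
foreach ($subjects as $s) {
    // Lookbehind: reject a candidate only if the 7 characters just before it
    // are all non-colons (fewer than 7 characters never triggers a rejection).
    preg_match_all('/(?<![^:]{7})([0-9a-zA-Z]{2,5}):/', $s, $look);
    // \G: every match must start where the previous one ended.
    preg_match_all('/\G\s*([0-9a-zA-Z]{2,5}):\s*/', $s, $anchored);

    $results[$s] = ['lookbehind' => $look[1], 'G' => $anchored[1]];
    printf("%s\n  lookbehind: %s\n  \\G:         %s\n",
        $s, implode(',', $look[1]), implode(',', $anchored[1]));
}
```

Both patterns agree on the two subjects from the question; they only diverge when the message packet puts a colon close to the targets.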

answered Nov 18, 2014 at 16:23

Toto


3 Comments

Dave Over a year ago

I am getting "Compilation failed: lookbehind assertion is not fixed length at offset 11" when I drop that code in. Dissecting it now.

Dave Over a year ago

I think I see where you are going with that, but keep in mind the incoming subject could just as easily be ZY:xw:VU:message packet no capture or: this either

Toto Over a year ago

@Dave: that's strange, the lookbehind is fixed length here, and it works fine for me.
