
Given the subject

AB: CD:DEF: HIJ99:message packet - no capture

I have crafted the following regex to correctly capture the 2-5 character targets, each of which is followed by a colon:

/\s{0,1}([0-9a-zA-Z]{2,5}):\s{0,1}/

which returns my matches even if erroneous spaces are added before or after the targets:

[0] => AB
[1] => CD
[2] => DEF
[3] => HIJ99

However, if the message packet contains a colon in it anywhere, for example

AB: CD:DEF: HIJ99:message packet no capture **or: this either**

it of course includes [4] => or in the result set, which is not desired. I want to limit the matches to a consecutive run starting at the beginning of the string and, once that run is broken, stop looking for matches in the remainder.

Edit 1: I also tried ^(\s{0,1}([0-9a-zA-Z]{2,5}):\s{0,1}){1,5} to force matching from the beginning of the string with multiple repetitions, but then I lose the individual matches:

[0] => Array
    (
        [0] => AB: CD:DEF: HIJ99:
    )

[1] => Array
    (
        [0] => HIJ99:
    )

[2] => Array
    (
        [0] => HIJ99
    )
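
The output above is what a repeated capturing group normally gives: each repetition overwrites the group, so only the last one (HIJ99) survives. One possible workaround, which is my own assumption and was not proposed in the thread, is to work in two steps: isolate the anchored prefix first, then collect the individual targets from that prefix only. The helper name `leadingTargets` is hypothetical.

```php
<?php
// Hypothetical helper: two-step extraction of the leading "XX:" targets.
function leadingTargets(string $subject): array
{
    // Step 1: isolate the consecutive run of targets at the start of the subject.
    // The group is non-capturing because only the whole prefix is needed here.
    if (!preg_match('/^(?:\s?[0-9a-zA-Z]{2,5}:\s?)+/', $subject, $prefix)) {
        return [];
    }

    // Step 2: pull the individual targets out of that prefix only,
    // so colons later in the message packet are never seen.
    preg_match_all('/([0-9a-zA-Z]{2,5}):/', $prefix[0], $m);

    return $m[1];
}

print_r(leadingTargets('AB: CD:DEF: HIJ99:message packet no capture or: this either'));
print_r(leadingTargets('ZY:xw:VU:message packet no capture or: this either'));
```

With both sample subjects this should return only the leading targets and leave "or" out, at the cost of running two regex calls instead of one.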

Edit 2: Keep in mind that the subject is not fixed.

AB: CD:DEF: HIJ99:message packet - no capture

could just as easily be

ZY:xw:VU:message packet no capture or: this either

for the matches we are trying to pull; the rest of the subject is variable as well. I am just trying to rule out matching a ":" inside the message packet.

  • Would it be enough to just edit the regex to match uppercase characters only? /\s{0,1}([0-9A-Z]{2,5}):\s{0,1}/ Commented Nov 18, 2014 at 16:10
  • Define a repetition for your pattern: /(\s?([0-9a-zA-Z]{2,5}):\s?){0,4}/ for 0 to 4 repetitions. Commented Nov 18, 2014 at 16:20
  • @typoheads could be upper or lower Commented Nov 18, 2014 at 16:25
  • @SBH - that adds all sorts of mess, but did an edit with your idea. It is possible that there will be more matches than 0-4 though, and if the number is less, it will still grab the inline match Commented Nov 18, 2014 at 16:36

2 Answers


You could use \G to do a consecutive string match.

$str = 'AB: CD:DEF: HIJ99:message packet no capture or: this either';
preg_match_all('/\G\s*([0-9a-zA-Z]{2,5}):\s*/', $str, $m);
print_r($m[1]);

Output:

Array
(
    [0] => AB
    [1] => CD
    [2] => DEF
    [3] => HIJ99
)
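
Since \G only matches where the previous match ended (or at the start of the subject for the first attempt), the chain stops at the first gap. As a small sketch, the same pattern can be wrapped in a helper and tried against the second subject from the question; the function name `consecutiveTargets` is made up for this illustration.

```php
<?php
// Hypothetical wrapper around the \G pattern shown above.
function consecutiveTargets(string $subject): array
{
    // \G ties every match to the end of the previous one, so a colon
    // further along in the message packet cannot start a new match.
    preg_match_all('/\G\s*([0-9a-zA-Z]{2,5}):\s*/', $subject, $m);

    return $m[1];
}

// Second subject from the question: different targets, same stray colon.
print_r(consecutiveTargets('ZY:xw:VU:message packet no capture or: this either'));

// No leading targets at all: the result should be an empty array.
print_r(consecutiveTargets('message packet no capture or: this either'));
```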



2 Comments

Brilliant... first time using \G
Final edit (just in case there is a leading space): \G\s*([0-9a-zA-Z]{2,5}):\s* Much obliged.

How about:

$str = 'AB: CD:DEF: HIJ99:message packet no capture or: this either';
preg_match_all('/(?<![^:]{7})([0-9a-zA-Z]{2,5}):/', $str, $m);
print_r($m);

Output:

Array
(
    [0] => Array
        (
            [0] => AB:
            [1] => CD:
            [2] => DEF:
            [3] => HIJ99:
        )

    [1] => Array
        (
            [0] => AB
            [1] => CD
            [2] => DEF
            [3] => HIJ99
        )

)
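
As far as I can tell, the lookbehind rejects a candidate only when the seven characters before it contain no colon, so it works as a distance heuristic and not as a true consecutiveness check. The sketch below uses a made-up subject in which a short word sits between the last target and the stray colon; the subject string and variable names are assumptions for illustration only.

```php
<?php
// Hypothetical subject: the message packet starts with a short word,
// so the stray "or:" is fewer than 7 colon-free characters from "VU:".
$subject = 'ZY:xw:VU:hi or: this';

// Lookbehind pattern from this answer.
preg_match_all('/(?<![^:]{7})([0-9a-zA-Z]{2,5}):/', $subject, $lookbehind);

// \G pattern from the other answer, for comparison.
preg_match_all('/\G\s*([0-9a-zA-Z]{2,5}):\s*/', $subject, $anchored);

print_r($lookbehind[1]); // expected to still contain "or"
print_r($anchored[1]);   // expected to stop after "VU"
```

For subjects like the two in the question, where the stray colon sits far from the last target, both patterns should give the same result.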

3 Comments

I am getting "Compilation failed: lookbehind assertion is not fixed length at offset 11" when I drop that code in. Dissecting it.
I think I see where you are going with that, but keep in mind the incoming subject could just as easily be ZY:xw:VU:message packet no capture or: this either
@Dave: that's strange, the lookbehind is fixed length here, and it works fine for me.
