
I would like to capture each of the following in its own group using preg_match_all in PHP.

  1. The chapter, section, or page
  2. The number (and trailing letter, if it has one) of the specified chapter, section, or page. A single optional space between them should be allowed
  3. The word "and" or "or"

The number of items in the string can vary, so the regex should work on all of the examples below:

  1. Ch1 and Sect2b
  2. Ch 4 x blahunwantedtext and Sect 5y and Sect6 z and Ch7 or Ch8

This is what I managed to come up with so far:

    <?php

    $str = 'Ch 1 a and Sect 2b and Pg3';
    preg_match_all('/([a-z]+)([\s]?[0-9]+)([\s]?[a-z]*)([\s]?and*[\s]?)/is', $str, $matches);
    print_r($matches);

Output:

    Array
    (
        [0] => Array
            (
                [0] => Ch 1 a and 
                [1] => Sect 2b and 
            )

        [1] => Array
            (
                [0] => Ch
                [1] => Sect
            )

        [2] => Array
            (
                [0] =>  1
                [1] =>  2
            )

        [3] => Array
            (
                [0] =>  a
                [1] => b
            )

        [4] => Array
            (
                [0] =>  and 
                [1] =>  and 
            )

    )

The last part of the string (Pg3) is not matched, so it is missing from the array.

The expected result should be:

    Array
    (
        [0] => Array
            (
                [0] => Ch 1 a and 
                [1] => Sect 2b and 
                [2] => Pg3
            )

        [1] => Array
            (
                [0] => Ch
                [1] => Sect
                [2] => Pg
            )

        [2] => Array
            (
                [0] =>  1
                [1] =>  2
                [2] => 3
            )

        [3] => Array
            (
                [0] =>  a
                [1] => b
                [2] => 
            )

        [4] => Array
            (
                [0] =>  and 
                [1] =>  and 
                [2] =>  
            )

    )
  • Add some examples to your post. Having "input -> expected result" makes figuring out your question 100x easier. Commented Jan 13, 2013 at 7:18
  • @Supericy expected result added. Commented Jan 13, 2013 at 7:23
  • @Supericy Just wondering: what would I need to change in the regex to get the same result as in your answer below if there were some extra unwanted text, as in $str = 'Ch 1 a blahblahdontwant and Sect 2b and Pg3'? Commented Jan 13, 2013 at 8:21

1 Answer


This regex should work: /(ch|sect|pg)\s*(\d)\s*([a-z]?\b)\s*(and|or)?/i

    $str = 'Ch 1 a and Sect 2b and Pg3';
    preg_match_all('/(ch|sect|pg)\s*(\d)\s*([a-z]?\b)\s*(and|or)?/i', $str, $matches);
    var_dump($matches);

Output:

array (size=5)
  0 => 
    array (size=3)
      0 => string 'Ch 1 a and' (length=10)
      1 => string 'Sect 2b and' (length=11)
      2 => string 'Pg3' (length=3)
  1 => 
    array (size=3)
      0 => string 'Ch' (length=2)
      1 => string 'Sect' (length=4)
      2 => string 'Pg' (length=2)
  2 => 
    array (size=3)
      0 => string '1' (length=1)
      1 => string '2' (length=1)
      2 => string '3' (length=1)
  3 => 
    array (size=3)
      0 => string 'a' (length=1)
      1 => string 'b' (length=1)
      2 => string '' (length=0)
  4 => 
    array (size=3)
      0 => string 'and' (length=3)
      1 => string 'and' (length=3)
      2 => string '' (length=0)
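
The pattern above accepts only a single digit and does not skip extra words between the letter and the "and"/"or", which is what the follow-up comment and the second example in the question ask about. Below is a minimal sketch of one possible variant, not a definitive solution. It assumes PHP 7 or later, that items are always separated by "and" or "or", and that any run of text before the next separator is unwanted. The function name parseReferences and the row keys are hypothetical, introduced only for illustration.

```php
<?php
// Hypothetical helper (not part of the original answer): returns one row per reference.
function parseReferences(string $str): array
{
    // (ch|sect|pg)  the kind of reference
    // \s?(\d+)      an optional single space, then one or more digits
    // \s?([a-z]?)\b an optional single space, then at most one letter ending at a word boundary
    // .*?           lazily skip unwanted text up to the next separator
    // (and|or) | $  the separator, or the end of the string for the last item
    $pattern = '/\b(ch|sect|pg)\s?(\d+)\s?([a-z]?)\b.*?(?:\b(and|or)\b|$)/i';

    // PREG_SET_ORDER groups the captures per match instead of per group.
    preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);

    $rows = [];
    foreach ($matches as $m) {
        $rows[] = [
            'type'   => $m[1],
            'number' => $m[2],
            'letter' => $m[3],
            // A trailing group that did not participate is absent in set order.
            'joiner' => $m[4] ?? '',
        ];
    }
    return $rows;
}

print_r(parseReferences('Ch 4 x blahunwantedtext and Sect 5y and Sect6 z and Ch7 or Ch8'));
```

For the string above this should yield five rows, with "blahunwantedtext" dropped and an empty joiner on the last row. Because the skip is lazy but unrestricted, two references with no "and"/"or" between them would be merged into one match, so a stricter skip would be needed if that case can occur.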