I have the following LaTeX command:

\autocites[][]{}[][]{}

where the parameters inside [] are optional and the ones inside {} are mandatory. The \autocites command can be extended with additional groups of arguments, like this:

\autocites[a1][a2]{a3}[b1][b2]{b3}
\autocites[a1][a2]{a3}[b1][b2]{b3}[c1][c2]{c3}
...

It can also be used like this:

\autocites{a}{b}
\autocites{a}[b1][]{b3}
\autocites{a}[][b2]{b3}
...

I'd like to extract its parameters by using a regular expression in PHP. This is my first attempt:

/\\autocites(\[(.*?)\])(\[(.*?)\])(\{(.*?)\})(\[(.*?)\])(\[(.*?)\])(\{(.*?)\})/

Although this works fine if \autocites contains exactly two groups of three parameters, I can't figure out how to get it working for an unknown number of parameters.

I also tried using the following expression:

/\\autocites((\[(.*?)\]\[(.*?)\])?\{(.*?)\}){2,}/

This time I can match larger numbers of parameters, but I can't extract all the values, because PHP only ever gives me the content of the last three parameters:

Array
(
    [0] => Array
        (
            [0] => \autocites[a][b]{c}[d][e]{f}[a][a]{a}
        )

    [1] => Array
        (
            [0] => [a][a]{a}
        )

    [2] => Array
        (
            [0] => [a][a]
        )

    [3] => Array
        (
            [0] => a
        )

    [4] => Array
        (
            [0] => a
        )

    [5] => Array
        (
            [0] => a
        )

)

Any help is greatly appreciated.

    It's probably simpler to just match the whole command including random (\{.\}|\[.\])* variations. Then use a second preg_match_all to extract the individual params. Alternatively use ?(DEFINE) or at least the /x modifier to make a manageable regex. Commented Aug 4, 2013 at 18:57

1 Answer

You'll have to do this in two steps. Among the common regex flavors, only .NET can retrieve an arbitrary number of captures from a single group. In the others, including the PCRE flavor PHP uses, the number of resulting captures is fixed by the number of groups in your pattern (repeating a group will only overwrite previous captures).

So first, match the entire thing to get the parameters, and then extract them in a second step:

preg_match('/\\\\autocites((?:\{[^}]*\}|\[[^]]*\])+)/', $input, $autocite);
preg_match_all('/(?|\{([^}]*)\}|\[([^]]*)\])/', $autocite[1], $parameters);
// $parameters[1] will now be an array of all parameters

Using a slightly more elaborate approach and the anchor \G, we could also do it all in one go, by using an arbitrary number of matches instead of captures:

preg_match_all('/
    (?|             # two alternatives whose group numbers both begin at 1
      \\\\autocites  # match the command
      (?|\{([^}]*)\}|\[([^]]*)\])
                    # and a parameter in group 1
    |               # OR
      \G(?!\A)      # anchor the match to the end of the last match (never the start of the input)
      (?|\{([^}]*)\}|\[([^]]*)\])
                    # and match a parameter in group 1
    )
    /x',
    $input,
    $parameters);
// again, you'll have an array of parameters in $parameters[1]

Note that with this approach, if you have multiple \autocites commands in your input, you'll get all parameters from all commands in a single list. There are some ways to alleviate that, but I think the first approach would be cleaner in that case.
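
For the multi-command case, here is a minimal sketch of the first (two-step) approach applied once per command. The function name extractAutocitesParams is only illustrative, and like the patterns above it assumes parameters do not contain nested braces or brackets:

```php
<?php
// Hypothetical helper: returns one list of parameters per \autocites command.
function extractAutocitesParams(string $input): array
{
    $result = [];
    // Step 1: find every command together with its full run of arguments.
    preg_match_all('/\\\\autocites((?:\{[^}]*\}|\[[^]]*\])+)/', $input, $commands);
    foreach ($commands[1] as $arguments) {
        // Step 2: split that run into the individual parameters.
        preg_match_all('/(?|\{([^}]*)\}|\[([^]]*)\])/', $arguments, $parameters);
        $result[] = $parameters[1];
    }
    return $result;
}

$input = 'x \autocites[a1][a2]{a3}[b1][b2]{b3} y \autocites{a}[][b2]{b3}';
print_r(extractAutocitesParams($input));
```

Each element of the returned array holds the parameters of one command, in order, so the lists of separate commands never get mixed up.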

If you want to be able to distinguish between optional and mandatory parameters (with any approach), capture the opening or closing bracket/brace along with the parameter, and check against that character to find out which type it is.
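
A rough sketch of that idea, capturing the opening delimiter in group 1 and the content in group 2. The function name classifyParams and the 'type'/'value' keys are made up for illustration, and nested braces or brackets are again assumed not to occur:

```php
<?php
// Hypothetical helper: tags each parameter as optional ([...]) or mandatory ({...}).
function classifyParams(string $arguments): array
{
    // Branch reset (?|...) makes both alternatives use groups 1 and 2:
    // group 1 is the opening delimiter, group 2 is the parameter content.
    $pattern = '/(?|(\{)([^}]*)\}|(\[)([^]]*)\])/';
    preg_match_all($pattern, $arguments, $matches, PREG_SET_ORDER);

    $params = [];
    foreach ($matches as $m) {
        $params[] = [
            'type'  => $m[1] === '{' ? 'mandatory' : 'optional',
            'value' => $m[2],
        ];
    }
    return $params;
}

print_r(classifyParams('{a}[][b2]{b3}'));
```

The input here is the argument run that the first preg_match captures in $autocite[1].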

1 Comment

In PHP '\\a' is \a, to get \\a you need to write '\\\\a'. Or you could use <<<'quoting'. (I think.) :-p
