I have this in a function that is supposed to replace every parenthesized sequence with its contents, so that (abc) becomes abc wherever it appears, including recursively, because parentheses can be nested.

$return =   preg_replace_callback(
    '|(\((.+)\))+|',
    function ($matches) {
        return $matches[2];
    },
    $s
);

When the regex above is fed the string "a(bcdefghijkl(mno)p)q", it returns "ap)onm(lkjihgfedcbq", which shows that the regex matched only once. What can I do to make it keep matching inside matches it has already made, so that it produces "abcdefghijklmnopq"?

  • /\((?:[^\(\)]*+|(?0))*\)/ this Commented Jul 5, 2017 at 15:27
  • @ArtisticPhoenix the parens can appear anywhere in the subject string, not just at the beginning of the string Commented Jul 5, 2017 at 15:29
  • And? this is not dependent on that. Commented Jul 5, 2017 at 15:30
  • see regex101.com/r/NsQSla/2 Commented Jul 5, 2017 at 15:31
  • [^ means "not" (a negated character class); it is not the same as ^[, where ^ means start of string. Commented Jul 5, 2017 at 15:33

3 Answers

To match balanced parenthetical substrings, you may use the well-known \((?:[^()]++|(?R))*\) pattern (described in Matching Balanced Constructs) inside a preg_replace_callback call, where the match value can be manipulated further. Here, that just means removing all ( and ) symbols from the match, which is easy to do even without a regex:

$re = '/\((?:[^()]++|(?R))*\)/';
$str = 'a(bcdefghijkl(mno)p)q((('; // Added three ( at the end
$result = preg_replace_callback($re, function($m) {
    return str_replace(array('(',')'), '', $m[0]);
}, $str);
echo $result; // => abcdefghijklmnopq(((

See the PHP demo
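
If the goal is only to strip balanced pairs, and not to inspect each nested match, another option is to peel off innermost pairs repeatedly until nothing changes. The following is a minimal sketch of that idea, not part of the original answer; the helper name strip_balanced_parens is hypothetical.

```php
<?php
// Hypothetical helper: strips balanced pairs from the inside out.
function strip_balanced_parens(string $s): string
{
    do {
        // \(([^()]*)\) matches a pair with no parentheses inside it,
        // i.e. an innermost pair; only its contents are kept.
        $s = preg_replace('/\(([^()]*)\)/', '$1', $s, -1, $count);
    } while ($count > 0); // stop once a pass replaces nothing

    return $s;
}

echo strip_balanced_parens('a(bcdefghijkl(mno)p)q((('), "\n"; // abcdefghijklmnopq(((
```

Each pass removes one nesting level, so the loop runs once per level plus a final pass that finds nothing. Unmatched parentheses never satisfy the pattern and are left in place.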

To get overlapping matches, you need a known technique: capturing inside a positive lookahead. You won't be able to perform both operations (matching and replacing) at once, but you can run the matching first and then replace:

$re = '/(?=(\((?:[^()]++|(?1))*\)))/';
$str = 'a(bcdefghijkl(mno)p)q(((';
preg_match_all($re, $str, $m);
print_r($m[1]);
// => Array ( [0] => (bcdefghijkl(mno)p)  [1] => (mno) )

See the PHP demo.

6 Comments

Your pattern is close, but it doesn't match inside matches. Fed a(bcdefghijkl(mno)p)q, a copy-paste of your code outputs bcdefghijkl(mno)p, showing that (mno) is never matched.
OK, my bad, I ran the wrong pattern. One more thing: how do I get the matches resulting from recursion?
@CholthiPaulTtiopic: No idea what you mean by matches resulting from recursion. The only match in a(bcdefghijkl(mno)p)q((( is (bcdefghijkl(mno)p), and it is in $m[0]. Just add it to an array that you may pass with use (&$arr) to the anonymous method. See ideone.com/UdxbWy.
What I really wanted originally was to match anything in balanced parentheses: a(bcdefghijkl(mno)p)q((( should give me (bcdefghijkl(mno)p) and (mno) as matches. See https://regex101.com/r/NsQSla/3.

Try this one,

  preg_match('/\((?:[^\(\)]*+|(?0))*\)/', $str )

https://regex101.com/r/NsQSla/1

It will match everything inside of the ( ) as long as they are matched pairs.

Example

(abc) (abc (abc))

will have the following matches

Match 1
   Full match   0-5 `(abc)`
Match 2
   Full match   6-17    `(abc (abc))`
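
Note that preg_match stops at the first match, so listing every top-level group as in the example would need preg_match_all. A small sketch, assuming the same pattern and the example string above, with PREG_OFFSET_CAPTURE added to report positions:

```php
<?php
$str = '(abc) (abc (abc))';

// Same pattern as above; (?0) recurses into the whole pattern.
preg_match_all('/\((?:[^\(\)]*+|(?0))*\)/', $str, $matches, PREG_OFFSET_CAPTURE);

// With PREG_OFFSET_CAPTURE each match is a [text, byte offset] pair.
foreach ($matches[0] as [$text, $offset]) {
    printf("%d-%d %s\n", $offset, $offset + strlen($text), $text);
}
```

This reports only the outermost balanced groups; the inner (abc) of the second group is consumed by the recursion and is not returned as a separate match.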

5 Comments

Please tell me how this does not answer the question: "it appears even recursively because parens can be nested".
Please clarify how it answers this part: "...to replace any sequence of parentheses with what is enclosed in it like (abc) becomes abc".
It solves the part they were stuck on. The replacement can be as simple as str_replace(['(',')'], '', $match[0]) once it's verified that they are a matched pair of parentheses. The best way to handle the match really depends on their use case.
Then you need to add this extra step to your solution.
Can you make it match inside already-made matches, so that Match 3 = (abc)?

It is slightly unclear what the postcondition of the algorithm is supposed to be. It seems to me that you want to strip out matching pairs of ( ). The assumption here is that unmatched parentheses are left alone (otherwise you would just strip out every ( and )).

So I guess this means the input string a(bcdefghijkl(mno)p)q becomes abcdefghijklmnopq but the input string a(bcdefghijkl(mno)pq becomes a(bcdefghijklmnopq. Likewise an input string (a)) would become a).

It may be possible to do this using PCRE, since it does provide some non-regular features, but I'm doubtful about it. The language of the input strings is not regular; it's context-free. What @ArtisticPhoenix's answer does is match complete pairs of matched parentheses. What it does not do is match all nested pairs. As far as I understand language theory, this nested matching is inherently non-regular.

I suggest writing a parser to strip out the matching pairs of parentheses. It gets a little wordy having to account for productions that fail to match:

<?php

// Parse the punctuator sub-expression (i.e. anything within ( ... ) ).
function parse_punc(array $tokens,&$iter) {
    if (!isset($tokens[$iter])) {
        // Input ended right after '(': keep the unmatched parenthesis.
        return '(';
    }

    $inner = parse_punc_seq($tokens,$iter);
    if (!isset($tokens[$iter]) || $tokens[$iter] != ')') {
        // Leave unmatched open parentheses alone.
        $inner = "($inner";
    }

    $iter += 1;
    return $inner;
}

// Parse a sequence (inside punctuators).
function parse_punc_seq(array $tokens,&$iter) {
    if (!isset($tokens[$iter])) {
        return;
    }

    $tok = $tokens[$iter];
    if ($tok == ')') {
        return;
    }
    $iter += 1;

    if ($tok == '(') {
        $tok = parse_punc($tokens,$iter);
    }

    $tok .= parse_punc_seq($tokens,$iter);
    return $tok;
}

// Parse a sequence (outside punctuators).
function parse_seq(array $tokens,&$iter) {
    if (!isset($tokens[$iter])) {
        return;
    }

    $tok = $tokens[$iter++];
    if ($tok == '(') {
        $tok = parse_punc($tokens,$iter);
    }

    $tok .= parse_seq($tokens,$iter);
    return $tok;
}

// Wrapper for parser.
function parse(array $tokens) {
    $iter = 0;
    return strval(parse_seq($tokens,$iter));
}

// Grab input from stdin and run it through the parser.
$str = trim(stream_get_contents(STDIN));
$tokens = preg_split('/([\(\)])/',$str,-1,PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
var_dump(parse($tokens));

I know this is a lot more code than a regex one-liner but it does solve the problem as I understand it. I'd be interested to know if anyone can solve this problem with a regular expression.
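
For comparison, the same postcondition can be sketched without recursion by tracking open parentheses on a stack and dropping only those that find a partner. This is an illustrative alternative, not the parser above; the function name strip_matched_pairs is hypothetical.

```php
<?php
// Hypothetical helper: removes only parentheses that have a partner.
function strip_matched_pairs(string $s): string
{
    $open = [];   // offsets of '(' still waiting for a ')'
    $drop = [];   // offsets of parentheses that belong to a matched pair
    $len = strlen($s);

    for ($i = 0; $i < $len; $i++) {
        if ($s[$i] === '(') {
            $open[] = $i;
        } elseif ($s[$i] === ')' && $open) {
            // Pair this ')' with the most recent unmatched '('.
            $drop[array_pop($open)] = true;
            $drop[$i] = true;
        }
    }

    // Rebuild the string, skipping every offset marked for removal.
    $out = '';
    for ($i = 0; $i < $len; $i++) {
        if (!isset($drop[$i])) {
            $out .= $s[$i];
        }
    }
    return $out;
}

var_dump(strip_matched_pairs('a(bcdefghijkl(mno)pq')); // string(18) "a(bcdefghijklmnopq"
```

The scan works on bytes, which is safe for UTF-8 input because ( and ) are single-byte ASCII characters.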
