It is slightly unclear exactly what the postcondition of the algorithm is supposed to be. It seems to me that you are wanting to strip out matching pairs of ( ). The assumption here is that unmatched parentheses are left alone (otherwise you just strip out all of the ('s and )'s).
So I guess this means the input string a(bcdefghijkl(mno)p)q becomes abcdefghijklmnopq but the input string a(bcdefghijkl(mno)pq becomes a(bcdefghijklmnopq. Likewise an input string (a)) would become a).
It may be possible to do this using pcre since it does provide some non-regular features but I'm doubtful about it. The language of the input strings is not regular; it's context-free. What @ArtisticPhoenix's answer does is match complete pairs of matched parentheses. What it does not do is match all nested pairs. This nested matching is inherently non-regular in my humble understanding of language theory.
I suggest writing a parser to strip out the matching pairs of parentheses. It gets a little wordy having to account for productions that fail to match:
<?php
// Parse the punctuator sub-expression (i.e. anything within ( ... ) ).
function parse_punc(array $tokens,&$iter) {
if (!isset($tokens[$iter])) {
return;
}
$inner = parse_punc_seq($tokens,$iter);
if (!isset($tokens[$iter]) || $tokens[$iter] != ')') {
// Leave unmatched open parentheses alone.
$inner = "($inner";
}
$iter += 1;
return $inner;
}
// Parse a sequence (inside punctuators).
function parse_punc_seq(array $tokens,&$iter) {
if (!isset($tokens[$iter])) {
return;
}
$tok = $tokens[$iter];
if ($tok == ')') {
return;
}
$iter += 1;
if ($tok == '(') {
$tok = parse_punc($tokens,$iter);
}
$tok .= parse_punc_seq($tokens,$iter);
return $tok;
}
// Parse a sequence (outside punctuators).
function parse_seq(array $tokens,&$iter) {
if (!isset($tokens[$iter])) {
return;
}
$tok = $tokens[$iter++];
if ($tok == '(') {
$tok = parse_punc($tokens,$iter);
}
$tok .= parse_seq($tokens,$iter);
return $tok;
}
// Wrapper for parser.
function parse(array $tokens) {
$iter = 0;
return strval(parse_seq($tokens,$iter));
}
// Grab input from stdin and run it through the parser.
$str = trim(stream_get_contents(STDIN));
$tokens = preg_split('/([\(\)])/',$str,-1,PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
var_dump(parse($tokens));
I know this is a lot more code than a regex one-liner but it does solve the problem as I understand it. I'd be interested to know if anyone can solve this problem with a regular expression.
/\((?:[^\(\)]*+|(?0))*\)/this[^is meaning not, and is not the same as^[which means start of string.