I'd like to match a three-part string. The first part consists of one or more a characters, the second part consists of one or more b characters, and the third part consists either of zero or more c characters or zero or more C characters, but not a mix of c and C.
As such, I wrote the following regular expression:
/a+b+(C*|c*)/
I immediately noticed that it fails to greedily match the trailing c characters in the following string:
aaaaabbcc
Wrapping each inner clause of the or clause in its own parentheses does not fix the unexpected behavior:
/a+b+((C*)|(c*))/
Interestingly, though, both regular expressions match the following string in full, with the C characters matched by the first clause of the or:
aaaaabbCC
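The question does not name a regex engine, so the sketch below uses Python's re module as a stand-in. Like Perl and JavaScript, it tries alternatives left to right and accepts the first one that lets the overall match succeed. The helper matched_text is hypothetical, written only for this illustration. It reproduces both observations: on the cc string, both patterns stop after the b characters; on the CC string, both consume everything.

```python
import re

# The two patterns from the question; the second only adds capture groups.
PATTERNS = [r"a+b+(C*|c*)", r"a+b+((C*)|(c*))"]
SUBJECTS = ["aaaaabbcc", "aaaaabbCC"]


def matched_text(pattern: str, subject: str) -> str:
    """Hypothetical helper: text matched from the start of subject, or ''."""
    m = re.match(pattern, subject)
    return m.group(0) if m else ""


for pattern in PATTERNS:
    for subject in SUBJECTS:
        # Expect 'aaaaabb' for the cc subject and 'aaaaabbCC' for the CC one.
        print(f"{pattern:18} {subject}: {matched_text(pattern, subject)!r}")
```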
The following regular expression captures the semantics accurately, but I'd like to understand why the original regular expression behaves unexpectedly.
/a+b+(([Cc])\2*)?/
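A quick check of the backreference pattern, again assuming Python's re as the engine. Group 2 captures the first C or c, and \2* repeats only that same character. Using fullmatch (which requires the whole subject to be consumed) shows that a mixed tail such as cC is rejected, while pure c or pure C tails and an empty tail are accepted.

```python
import re

# The backreference pattern from the question: group 2 captures the first
# C or c, and \2* repeats exactly that character (case-sensitively).
FIXED = re.compile(r"a+b+(([Cc])\2*)?")

for subject in ["aaaaabbcc", "aaaaabbCC", "aaaaabb", "aaaaabbcC"]:
    # fullmatch succeeds only if the pattern consumes the entire subject.
    m = FIXED.fullmatch(subject)
    print(subject, "->", "full match" if m else "no full match")
```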
Why is c not matched? Because C* already matched (the empty string), and that is why c* never even got tested.
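The sketch below, assuming Python's re, illustrates that explanation and two common ways around it. Neither fix comes from the question. The first makes each alternative require at least one character and makes the whole group optional, so an empty C branch can no longer win. The second forces the match to span the entire string, so when C* leaves cc unconsumed the engine backtracks into the alternation and tries c*.

```python
import re

SUBJECT = "aaaaabbcc"

# Original: C* succeeds on the empty string, so c* is never tried.
original = re.match(r"a+b+(C*|c*)", SUBJECT).group(0)

# Alternative 1: each branch needs at least one character; the group is optional.
nonempty = re.match(r"a+b+(C+|c+)?", SUBJECT).group(0)

# Alternative 2: require the whole string to match, which makes the engine
# backtrack out of the empty C* branch and try c* instead.
anchored = re.fullmatch(r"a+b+(C*|c*)", SUBJECT)

print(original)           # aaaaabb
print(nonempty)           # aaaaabbcc
print(anchored.group(1))  # cc
```

Swapping the order to (c*|C*) would only move the problem to the C case, which is why both alternatives above avoid letting an empty branch come first.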