With Python regular expressions,
re.compile("x"*50000)
gives me OverflowError: regular expression code size limit exceeded
but the following one raises no error, although it pins the CPU at 100% and took about 1 minute on my PC:
>>> re.compile(".*?.*?.*?.*?.*?.*?.*?.*?.*?.*?"*50000)
<_sre.SRE_Pattern object at 0x03FB0020>
Is that normal?
Should I assume that ".*?.*?.*?.*?.*?.*?.*?.*?.*?.*?"*50000 is somehow shorter than "x"*50000?
Tested on Python 2.6, Win32
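For anyone who wants to reproduce this, here is a minimal sketch of how I am probing the two patterns. The helper name try_compile is made up for illustration, and the outcome depends on the Python version and build: the OverflowError above is what I see on Python 2.6, and other versions may compile both patterns without complaint.

```python
import re
import time

def try_compile(pattern):
    """Compile `pattern` and return (outcome, seconds). Hypothetical helper."""
    start = time.time()
    try:
        re.compile(pattern)
        outcome = "ok"
    except (OverflowError, re.error) as exc:
        # Report the exception type and message instead of letting it propagate.
        outcome = "%s: %s" % (type(exc).__name__, exc)
    return outcome, time.time() - start

if __name__ == "__main__":
    # The two patterns from the question, built by repeating a short unit.
    for unit in ("x", ".*?x"):
        outcome, seconds = try_compile(unit * 50000)
        print("%-6r * 50000 -> %s (%.2fs)" % (unit, outcome, seconds))
```

The timing is wall-clock time from time.time(), so it is only a rough indication.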
UPDATE 1:
It looks like ".*?.*?.*?.*?.*?.*?.*?.*?.*?.*?"*50000 could be reduced to .*?
So, how about this one?
re.compile(".*?x"*50000)
It does compile, and if that one could also be reduced to ".*?x", it should match the string "abcx" or "x" alone, but it does not match.
So, am I missing something?
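To make the matching behaviour concrete, here is a small sketch with the repeat count shrunk from 50000 to 3 (my assumption, just to keep it readable; the helper name repeated is made up). A pattern built as ".*?x"*n contains n literal x characters, so it seems to need at least n of them in the input, which may be why it does not behave like a single ".*?x".

```python
import re

def repeated(unit, count):
    """Build a pattern by repeating `unit` `count` times (hypothetical helper)."""
    return unit * count

single = re.compile(".*?x")
triple = re.compile(repeated(".*?x", 3))  # same as ".*?x.*?x.*?x"

# One copy of the unit needs one "x", so both inputs match.
single_matches = [bool(single.match(s)) for s in ("abcx", "x")]

# Three copies need three "x" characters somewhere in the input.
triple_matches = [bool(triple.match(s)) for s in ("abcx", "x", "xxx", "axbxcx")]

print(single_matches)
print(triple_matches)
```

With only three repetitions, "abcx" and "x" already fail to match while "xxx" and "axbxcx" succeed, so the 50000-fold version would presumably need 50000 x characters.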
UPDATE 2:
My point is not to find the maximum length of a regex source string. I would like to understand why "x"*50000 is caught by the overflow check, but ".*?x"*50000 is not.
That does not make sense to me, which is why I am asking.
Is something missing in the overflow check, is this behaviour fine, or is it really overflowing something?
Any hints or opinions will be appreciated.
re.compile("x"*50000) does not get compiled, but re.compile(".*?x"*50000) does get compiled.
x{5000} or x{500000} or whatever. When you reach the limit, there you are. What's the point of knowing where the limit is? You don't need to know until the (unlikely) event that you write a REAL regex that's too long. Not these degenerate cases that aren't sensible regexes in the first place.
"x"*50000 got caught by the overflow handler, but ".*?x"*50000 does not, and even ".*?x"*100000 does not.