Stack overflow in Java regex

Question

I am new to Java. I am getting a java.lang.StackOverflowError from the regex replacements on strHindiText. What should I do about it?

try {
     // This regex converts the pattern "{\fldrslt {\fcs1 \ab\af24 \fcs0 &#2345;}{"
     // into "{\fldrslt {\fcs1 \ab\af24 \fcs0 &#2345;}}}{"
     // strHindiText = strHindiText.replaceAll("\\{(\\\\fldrslt[ ])\\{((\\\\\\S+[ ])+)((\\s*&#\\d+;\\s*(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)*)+)\\}\\{","{$1{$2$4}}}{");

     // This regex converts the pattern "{\fcs0 \af0 &#2345;{" or "{\fcs0 \af0 *\tab &#2345;{"
     // into "{\fcs0 \af0 &#2345; }{"
     strHindiText = strHindiText.replaceAll("\\{\\s*((\\\\\\S+[ ](\\*)?)+\\s*)(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)*[ ]*(((&#\\d+;)[ ]*(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)*[ ]*)+)\\{", "{$1 $4$5 }{");

     // This regex converts the pattern "{&#2345; \fcs0 \af0 {"
     // into "{&#2345; \fcs0 \af0 }{"
     strHindiText = strHindiText.replaceAll("\\{\\s*(((&#\\d+;)[ ]*(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)*[ ]*)+)[ ]*((\\\\\\S+[ ])+)\\{", "{$1 $5 }{");

} catch (StackOverflowError er) {
     System.out.println("Third try Block StackOverflowError in regex pattern to reform the rtf tags................");
     er.printStackTrace();
     // throw er;
}

Whenever strHindiText contains a large amount of data, it throws a java.lang.StackOverflowError:

java.lang.StackOverflowError
2013-08-08 15:35:07,743 ERROR [STDERR] (http-127.0.0.1-80-9)    at java.util.regex.Pattern$Curly.match0(Pattern.java:3754)
2013-08-08 15:35:07,743 ERROR [STDERR] (http-127.0.0.1-80-9)    at java.util.regex.Pattern$Curly.match(Pattern.java:3744)
2013-08-08 15:35:07,744 ERROR [STDERR] (http-127.0.0.1-80-9)    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
2013-08-08 15:35:07,744 ERROR [STDERR] (http-127.0.0.1-80-9)    at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3366)
2013-08-08 15:35:07,745 ERROR [STDERR] (http-127.0.0.1-80-9)    at java.util.regex.Pattern$Curly.match0(Pattern.java:3782)
2013-08-08 15:35:07,745 ERROR [STDERR] (http-127.0.0.1-80-9)    at java.util.regex.Pattern$Curly.match(Pattern.java:3744)

My strHindiText data is:

 `{\rtlch\fcs1 \af1\afs18 \ltrch\fcs0 \f1\fs18\cf21\insrsid13505584 &#2349;&#2379;&#2346;&#2366;&#2354;&#32; &#2404; \par }\pard\plain \ltrpar\s16\ql \li0\ri0\sb100\sa100\sbauto1\saauto1\sl240\slmult0\widctlpar\wrapdefault\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0\pararsid13505584 \cbpat20 \rtlch\fcs1 \af0\afs24\alang1025 \ltrch\fcs0 \fs24\lang1033\langfe1033\cgrid\langnp1033\langfenp1033 {\rtlch\fcs1 \ab\af1\afs18 \ltrch\fcs0 \cs21\b\f1\fs18\cf21\insrsid13505584 &#2309;&#2344;&#2381;&#2357;&#2375;&#2359;&#2339;&#32;&#2325;&#2352;&#2375;&#2306;&#32; :}{\rtlch\fcs1 \af1\afs18 \ltrch\fcs0 \f1\fs18\cf21\insrsid13505584  \par &#2349;&#2379;&#2346;&#2366;&#2354;&#32;&#44;&#32;&#2350;&#2343;&#2381;&#2351;&#32;&#2346;&#2381;&#2352;&#2342;&#2375;&#2358;&#32;&#2325;&#2368;&#32;&#2352;&#2366;&#2332;&#2343;&#2366;&#2344;&#2368;&#32;&#2346;&#2381;&#2352;&#2366;&#2325;&#2371;&#2340;&#2367;&#2325;&#32;&#2360;&#2369;&#2306;&#2342`

Your alternative paths (|) are probably causing recursive calls, resulting in the stack overflow. Regex stuff is complicated in general, and your regex is big. I'm not surprised.
– keyser, Commented Aug 8, 2013 at 10:02
I would suggest that instead of alternatives (e.g. a|b|c) you use the character class notation [abc]. This should make the regex clearer, and you just need to escape the closing bracket and no other character. Also, it looks like you want to do something that regexes aren't good for: parsing something that isn't plain text but has a higher-level structure.
– Tassos Bassoukos, Commented Aug 8, 2013 at 10:33
You really shouldn't use regex for such enormous parsing jobs. It's not very performant, since the regex is compiled every time you try to match a string.
– Georgian, Commented Aug 8, 2013 at 11:13
Everything about your code is asking for problems. Try breaking the problem into multiple small problems rather than trying to do a bazillion things all at once with a giant regex. Based on the regexes you're using, I'd be surprised if you didn't experience memory problems.
– jahroy, Commented Aug 8, 2013 at 19:32
I would personally recommend writing a parser for your RTF rather than attempting to cut it up with regex. Regex is meant for simple things, and I don't imagine RTF in Hindi is simple at all.
– Shaz, Commented Aug 8, 2013 at 20:34
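
On the recompilation point: String.replaceAll(regex, repl) behaves like Pattern.compile(regex).matcher(str).replaceAll(repl), so the pattern is compiled on every call. A minimal sketch of compiling once and reusing the Pattern follows. The class name, the constant name and the deliberately simplified pattern are hypothetical and are not the regexes from the question.

```java
import java.util.regex.Pattern;

public class PrecompiledPatternDemo {
    // Hypothetical, simplified pattern: one "&#NNNN;" entity, optional spaces, then "{".
    // Compiled once and reused for every call.
    private static final Pattern ENTITY_BEFORE_BRACE =
            Pattern.compile("(&#\\d+;)[ ]*\\{");

    // Inserts " }" between the entity and the opening brace.
    static String closeGroup(String text) {
        return ENTITY_BEFORE_BRACE.matcher(text).replaceAll("$1 }{");
    }

    public static void main(String[] args) {
        System.out.println(closeGroup("&#2345;{"));
    }
}
```

This only removes the repeated compilation cost. It does not change how deep the matcher recurses on large input.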

Community · Accepted Answer · 2017-05-23 12:10:18Z

3

Option 1 - Treat the symptoms

Look for constructs in your regex that make the matcher recurse, such as a repeated group containing alternatives, e.g. (a|b|c)*.

If you are not sure where your problem lies, try a regex tester like this.

Option 2 - Treat the cause (much better)

Don't use a regex if there are better tools for your task.

In your case you could search for an RTF parsing library or write your own parser,
e.g. like the one here that jahroy pointed out in the comments.
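
To give an idea of what "write your own parser" could start from, here is a minimal sketch of a hand-written tokenizer. It is an assumption-laden illustration, not a complete RTF parser: the class and method names are hypothetical, and it only separates braces, control words such as \fcs0, control symbols such as \*, and runs of text.

```java
import java.util.ArrayList;
import java.util.List;

public class RtfTokenizerSketch {

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Splits the input into "{", "}", control words/symbols, and text runs.
    // It is a plain loop, so stack depth does not grow with input size.
    static List<String> tokenize(String rtf) {
        List<String> tokens = new ArrayList<>();
        int n = rtf.length();
        int i = 0;
        while (i < n) {
            char c = rtf.charAt(i);
            int start = i;
            if (c == '{' || c == '}') {
                i++;
            } else if (c == '\\') {
                i++; // skip the backslash
                if (i < n && isAsciiLetter(rtf.charAt(i))) {
                    // Control word: letters, then an optional signed number.
                    while (i < n && isAsciiLetter(rtf.charAt(i))) i++;
                    if (i + 1 < n && rtf.charAt(i) == '-'
                            && isAsciiDigit(rtf.charAt(i + 1))) i++;
                    while (i < n && isAsciiDigit(rtf.charAt(i))) i++;
                    tokens.add(rtf.substring(start, i));
                    // A single space after a control word is a delimiter, not text.
                    if (i < n && rtf.charAt(i) == ' ') i++;
                    continue;
                } else if (i < n) {
                    i++; // control symbol: backslash plus one non-letter
                }
            } else {
                // Text run: everything up to the next brace or backslash.
                while (i < n && "{}\\".indexOf(rtf.charAt(i)) < 0) i++;
            }
            tokens.add(rtf.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("{\\fcs0 \\af0 &#2345;}"));
    }
}
```

Once the text is a list of tokens, the transformations from the question become small checks on neighbouring tokens (for example, "a text token followed by {") rather than one large regex over the whole document.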

edited May 23, 2017 at 12:10

CommunityBot

answered Nov 6, 2013 at 13:02

Kaadzia

OGHaza · Accepted Answer · 2013-11-27 11:33:52Z

1

This is not a full answer but just for your information.

In your regex:

(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)* can be written as [-,/()\";.'<>:?]*

Since this pattern occurs twice (in your first regex), this immediately shortens your regex by 40 characters and makes those sections much more readable.
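
As a quick sanity check of that rewrite, the sketch below compares the two forms on a few sample strings. The class name, method name and sample inputs are made up for illustration.

```java
import java.util.regex.Pattern;

public class PunctuationClassDemo {
    // The alternation from the question, written as a Java string literal.
    static final Pattern ALTERNATION =
            Pattern.compile("(-|,|/|\\(|\\)|\"|;|\\.|'|<|>|:|\\?)*");

    // The character-class form; a leading '-' inside [...] is a literal hyphen.
    static final Pattern CHAR_CLASS =
            Pattern.compile("[-,/()\";.'<>:?]*");

    // True when both patterns agree on whether the whole input matches.
    static boolean agree(String input) {
        return ALTERNATION.matcher(input).matches()
                == CHAR_CLASS.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"", "-,/", "();.'<>:?\"", "a", "-a", "??--"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> agree: " + agree(s));
        }
    }
}
```

Besides being shorter, the character class has no alternation inside a repeated group, which is the construct the comments on the question point to as a likely source of the recursion.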

answered Nov 27, 2013 at 11:33

OGHaza

jh314 · Accepted Answer · 2013-08-09 02:48:31Z

0

Try this to catch the error:

public class Example {
    public static void endless() {
        endless();
    }

    public static void main(String[] args) {
        try {
            endless();
        } catch(StackOverflowError t) {
            // more general: catch(Error t)
            // anything: catch(Throwable t)
            System.out.println("Caught "+t);
            t.printStackTrace();
        }
        System.out.println("After the error...");
    }
}

More importantly, try increasing the size of the stack (the JVM option -Xss sets the thread stack size), or add a "+" after the quantifiers in your regex.

Adding the "+" symbol makes the quantifier possessive, which prevents backtracking; backtracking doesn't seem to be necessary in your case.
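
To illustrate what the extra "+" does, here is a small sketch with made-up patterns (not the ones from the question). A greedy quantifier gives characters back when the rest of the pattern fails; a possessive one does not.

```java
public class PossessiveDemo {
    // Greedy: "a*" first takes every 'a', then backtracks one step
    // so that the trailing "a" can still match.
    static boolean greedy(String s) {
        return s.matches("a*a");
    }

    // Possessive: "a*+" keeps every 'a' it took and never gives one back,
    // so the trailing "a" has nothing left to match.
    static boolean possessive(String s) {
        return s.matches("a*+a");
    }

    // Possessive punctuation run followed by "{": no backtracking is needed
    // here, because "{" is not part of the character class.
    static boolean punctuationThenBrace(String s) {
        return s.matches("[-,/]*+\\{");
    }

    public static void main(String[] args) {
        System.out.println("greedy:     " + greedy("aaa"));
        System.out.println("possessive: " + possessive("aaa"));
        System.out.println("punct + {:  " + punctuationThenBrace("--,{"));
    }
}
```

The last method shows the case where a possessive quantifier is safe: what follows the repeated part can never be matched by the repeated part itself, so nothing is lost by refusing to backtrack. Whether this is enough to avoid the StackOverflowError for the regexes in the question depends on the input and the JVM, so treat it as something to try rather than a guaranteed fix.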

edited Aug 9, 2013 at 2:48

jh314

answered Aug 9, 2013 at 2:45

Bmize729

8 Comments

jahroy Over a year ago

He should consider using the right tool for the job rather than treating the symptoms that result from using the wrong tool...

Bmize729 Over a year ago

Chances are the overflow is coming from recursion in the matcher, not from the greediness of the regex. Making the operator possessive eliminates branching and recursive handling, which makes the expression more efficient and uses less memory.

jahroy Over a year ago

I would either look for an RTF parsing library or write one myself. If I wrote one myself I would break up the parsing into small tasks rather than try to do everything at once. If I had to use regexes, I would keep them small and simple and make sure they only operate on small pieces of text. I would never consider feeding the entire document to a single, complicated regex.

jahroy Over a year ago

It took about 5 seconds of googling to find this (maybe it will help, maybe it won't...)

jahroy Over a year ago

Ok. Sorry if my comments were overly harsh. This whole "I must use regex" mentality is just so common on this site that it sometimes makes you want to scream from the top of the mountain: "not all problems must be solved with regex!"
