0

I have the following string :

bla {{bla  {{bla bla {{afsaasg}} }} blabla}} {{bla bla}} bla

I would like to match

{{bla  {{bla bla {{afsaasg}} }} blabla}}

with a regex.

but my regex

{{(.*?)}}

matches

{{bla  {{bla bla}}

anyone can help ?

Additional Info : I expect to have not more then 2 brackets at the same time.

Finally I solved this with an own Java fuction. Perhabs this will help someone :

public static ArrayList<String> getRecursivePattern(String sText, String sBegin, String sEnd) {

        ArrayList<String> alReturn = new ArrayList<String>();

        boolean ok1 = true;
        boolean ok2 = true;

        int iStartCount = 0;
        int iEndCount = 0;

        int iStartSearching = 0;

        while (ok1) {
            int iAnfang = sText.indexOf(sBegin, iStartSearching);

            ok2 = true;
            if (iAnfang > -1) {
                while (ok2) {

                    int iStartCharacter = sText.indexOf(sBegin, iStartSearching);
                    int iEndCharacter = sText.indexOf(sEnd, iStartSearching);

                    if (iEndCharacter == -1) {
                        // Nothing found . stop
                        ok2 = false;
                        ok1 = false;

                    } else if (iStartCharacter < iEndCharacter && iStartCharacter != -1) {
                        // found startpattern
                        iStartCount = iStartCount + 1;
                        iStartSearching = iStartCharacter + sBegin.length();
                    } else if (iStartCharacter > iEndCharacter && iEndCharacter != -1 || (iStartCharacter == -1 && iEndCharacter != -1)) {
                        iEndCount = iEndCount + 1;
                        iStartSearching = iEndCharacter + sEnd.length();

                    } else {
                        if (iStartCharacter < 0) {
                            // No End found . stop
                            ok2 = false;
                        }
                    }
                    if (iEndCount == iStartCount) {
                        // found the pattern
                        ok2 = false;
                        // cut
                        int iEnde = iStartSearching;// +sEnd.length();
                        String sReturn = sText.substring(iAnfang, iEnde);
                        alReturn.add(sReturn);
                    }
                }
            } else {
                ok1 = false;
            }
        }

        return alReturn;
    }

I call it:

    ArrayList<String> alTest=getRecursivePattern("This {{ is a {{Test}} bla }}","{{","}}");
    System.out.println(" sTest : " + alTest.get(0));
10
  • 1
    Regex is not powerful enough to solve this in general case (although you can construct increasingly ugly expressions to parse it to any specific level of nesting). Commented Jan 10, 2015 at 10:45
  • 1
    @Avinash : Thanks a lot. {{(?:[^{}]|(?R))*}} is exactly what is working . Can you answer and I will accept? Commented Jan 10, 2015 at 10:47
  • 1
    @AvinashRaj: But that's only supported in the P-languages (Perl, PHP, Ruby,...) that even support (by (negative) lookahead), Turing complete regexes, as far as I know the Java regex engine does not support these. And furthermore it's not really advisable to do so... Commented Jan 10, 2015 at 11:10
  • 2
    @user1344545: Recursive sub-pattern in Java? I believe that is only available in PCRE Commented Jan 10, 2015 at 11:12
  • 1
    @CommuSoft: Finally I managed to test this in JAVA and you are right. In Java this is not supported. I did my own function in Java and added this to the question. Perhabs it can help somebody else. Thank you very much, this was new and interresting to me. Commented Jan 12, 2015 at 12:29

3 Answers 3

1

.NET has special support for nested item matching, so {{(?>[^\{\}]+|\{(?<DEPTH>)|\}(?<-DEPTH>))*(?(DEPTH)(?!))}} would do what you wanted in C# to any level of nesting, but not Java.

Sign up to request clarification or add additional context in comments.

1 Comment

2nd that. And furthermore these aren't - strictly speaking - regexes, but simplified parsers...
0

Don't you need to escape the curly braces? I do in notepad++. Anyway, this should do it

\{\{[^{]+\{\{[^{}]+\}\}[^}]+\}\}

1 Comment

Thanks . Yes this works for the special case. I enhanced the question, because I probably formulated wrong. And know it works with the solution from Avinash : {{(?:[^{}]|(?R))*}}
0

You can't do this with regular expressions. It the consequence of the pumping lemma. You need to use context-free grammar's, or perhaps use dedicated tools (like XML/DOM/... parsers).

You can indeed parse this for - say - three levels deep, but you can't let this work for an arbitrary number of levels. Even then, it's better to use context-free grammars (like a LALR compiler compiler), simply because "These are the tools designed to parse such structures.".

In other words, If one day, someone can enter {{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{ bla }}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}, and this is supposed to be valid, it will most likely fail.

One sidenote:

Say the level is for instance i levels deep, you can use a regex like:

  • for 1: .*?(.*?\{\{.*?\}\}.*?)*.*?
  • for 2: .*?(.*?\{\{.*?(.*?\{\{.*?\}\}.*?)*.*?\}\}.*?)*.*?
  • ...

But as you can see, the more deep you go, the longer the regex, and there is no way to parse them for arbitrary depth.

See also this discussion for people who want to parse XML/HTML - another recursive language - with regexes.

As you noted, some regular expression toolkits indeed provide tools to count things. These can be found in the P-languages (PHP, Perl,...). These aren't regular expressions (as defined by Kleene, see this Wikipedia-article about what a real regex is) strictly speaking, but simplified parsers. Because they don't describe a regular language. And - currently - not available in most regex libraries including Java. Some of the libraries even provide Turing complete parsers, parsers than can parse anything you can parse algorithmically, but it's not really recommended for advanced tasks...

8 Comments

Noone told me this thousend times :-) and the anwer from Avinash works. I guess we both can learn a lot here.
@user1344545: As argued before, it works if the maximum depth is determined in advance. The answer also shows how to construct such a regex for arbitrary depth. But it is mathematically proven you can't do this when the depth is arbitrary.
Thats wrong try {{(?:[^{}]|(?R))*}} in regex101.com with the text of bla {{bla {{bla bla {{ agagsd {{ sdgasdgasgd {{ sgsagdasd }} }} }} {{afsaasg}} }} blabla}} {{bla bla}} bla. You will see it works recursively. I am lucky.
@user1344545: an a sidenote on Avinash Raj's answer: this regex will accept all string with two {{ before two }}, thus {{{foo}} (see the error) will be accepted as well.
Thanks a lot for the explanation. I believe you. In my case I expect to have only 2 brackets and it works fine for me.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.