1

I'm writing a code formatter and I need some help. I have to find the code blocks and I want to use regular expressions. The code I need to format looks basically like this:

KEYWORD name {
    word
    word
    ...
}

I am able to find the blocks that start with { and end with } with this expression:

[{](.*?)[}]

But I don't know how to add the "KEYWORD name" part to the expression. Both are custom strings that can contain any character except ;, { and }.

Another problem is that my code blocks can be nested. I don't know how to add that feature.

4
  • I don't know if your code blocks can be nested, but if so, your regex won't work. Commented Nov 19, 2010 at 9:28
  • Oh! You are right! I didn't test that yet. But they have to be nested... Commented Nov 19, 2010 at 9:34
  • As soon as nesting (to arbitrary depths) is involved, regexes become difficult to use. Strictly speaking, nested constructs aren't regular and are therefore unsuitable for matching with regular expressions. Some modern regex flavors (e.g., PCRE, Perl, .NET) can match nested constructs through recursion or balancing groups, but Java is not among them. So you probably need to build or use a parser for this job. Commented Nov 19, 2010 at 10:01
  • Ok... I think you are right... I have to do the parsing on my own... Thanks! Commented Nov 19, 2010 at 10:16
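
The hand-written parsing mentioned in the comments above could start from something as small as a brace-depth counter. The following is only a minimal sketch under stated assumptions: the class name BraceScanner and the method topLevelBlocks are hypothetical, and it assumes that { and } never appear inside strings or comments of the code being formatted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the original thread): finds top-level { ... }
// blocks by counting brace depth instead of using a regex.
class BraceScanner {

    /** Returns each top-level block, braces included, with nested blocks left inside. */
    static List<String> topLevelBlocks(String source) {
        List<String> blocks = new ArrayList<>();
        int depth = 0;
        int start = -1; // index of the '{' that opened the current top-level block
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '{') {
                if (depth == 0) {
                    start = i;
                }
                depth++;
            } else if (c == '}') {
                if (depth == 0) {
                    throw new IllegalArgumentException("Unmatched '}' at index " + i);
                }
                depth--;
                if (depth == 0) {
                    blocks.add(source.substring(start, i + 1));
                }
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("Unclosed block starting at index " + start);
        }
        return blocks;
    }

    public static void main(String[] args) {
        String code = "OUTER a {\n    INNER b {\n        word\n    }\n}\nOTHER c {\n    word\n}";
        for (String block : topLevelBlocks(code)) {
            System.out.println(block);
            System.out.println("----");
        }
    }
}
```

Calling topLevelBlocks again on the inside of a returned block would yield the blocks nested one level deeper, which is one way to handle arbitrary depth.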

2 Answers

3

You can just do:

KEYWORD name \{.*?\}

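Since you want the . to match newlines as well, you'll have to enable dotall (single-line) mode, e.g. Pattern.DOTALL or the inline flag (?s) in Java. Multi-line mode only changes how ^ and $ behave.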

Since both KEYWORD and name are arbitrary strings that can contain any character except ;, { and }:

[^;{}]+\s+[^;{}]+\s*\{.*?\}
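
To illustrate the effect of the mode flag, here is a small Java sketch, assuming the header characters exclude only ;, { and } and the opening brace is escaped. The class name FlatBlockFinder and the method findBlocks are made up for this example. It only handles blocks that are not nested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: applies the header-plus-body regex with caller-supplied flags.
class FlatBlockFinder {
    // Two header parts made of anything but ; { }, then a lazily matched body in braces.
    static final String BLOCK_REGEX = "[^;{}]+\\s+[^;{}]+\\s*\\{.*?\\}";

    static List<String> findBlocks(String source, int flags) {
        Matcher m = Pattern.compile(BLOCK_REGEX, flags).matcher(source);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            // The header class also matches whitespace, so a match can begin
            // with the line break that follows the previous block; trim it off.
            result.add(m.group().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String code = "KEYWORD name {\n    word\n    word\n}\nOTHER thing {\n    x\n}";
        // With DOTALL the '.' may cross line breaks, so multi-line bodies match.
        System.out.println(findBlocks(code, Pattern.DOTALL));
        // Without it, '.' stops at line terminators and multi-line bodies do not match.
        System.out.println(findBlocks(code, 0));
    }
}
```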

2

(\S+)\s+(\S+)\s+\{(.*?)\}

This is: one or more characters that are not whitespace, followed by one or more whitespace characters, followed by one or more characters that are not whitespace, one or more whitespace characters, and your code block in braces.

If the KEYWORD can only contain uppercase letters and the name, let's say all letters, digits and underscores, it should look like this:

([A-Z]+?)\s+([A-Za-z0-9_]+?)\s+\{(.*?)\}

Note that if your code blocks can be nested, you'll have problems with this regex: it pairs the first { with the first }, so the match for an outer block ends at the closing brace of its first inner block.
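
As a rough Java illustration of both the capture groups and the nesting problem, assuming an uppercase keyword, a name made of letters, digits and underscores, and Pattern.DOTALL so the body can span lines (the class name NamedBlockDemo and the method firstBlock are invented for this sketch):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: extracts keyword, name and body from the first block found.
class NamedBlockDemo {
    static final Pattern BLOCK = Pattern.compile(
            "([A-Z]+?)\\s+([A-Za-z0-9_]+?)\\s+\\{(.*?)\\}", Pattern.DOTALL);

    /** Returns {keyword, name, body} for the first match, or null if nothing matches. */
    static String[] firstBlock(String source) {
        Matcher m = BLOCK.matcher(source);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // Flat block: the three groups hold the keyword, the name and the body.
        String[] flat = firstBlock("TASK build {\n    compile\n}");
        System.out.println(flat[0] + " / " + flat[1] + " / " + flat[2].trim());

        // Nested block: the lazy body stops at the first '}', which belongs to
        // the inner block, so the captured body is cut short.
        String[] nested = firstBlock("OUTER a {\n    INNER b {\n        word\n    }\n}");
        System.out.println(nested[2]);
    }
}
```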

1 Comment

Thank you, it works. But the blocks are nested sometimes. I thought I could use regex to get it done fast... :-(
