I am writing Java code that takes syntactically correct .java files as input, and I want to extract the text between braces using a regular expression. I want to use the Pattern and Matcher classes rather than for loops.

I believe the best approach is to write one regex that captures the text of the whole class, and then a second regex, applied to that output, that captures the text of each method.

I got close to capturing the class text with the following regex in online regex testers:

\w\sclass.*\{((.*\s*)*)\}

but I'm pretty sure I am doing it wrong by using two groups instead of just one. Furthermore, when I use this expression in Java I get no match at all.

Here is an example file that I am using for debugging:

package foo.bar;

import java.io.File;

public class Handy {
    {
    // static block, dont care!
    }

    /**
     * Check if a string is Null and Empty
     * @param str
     * @return
     */
    public static boolean isNullOrEmpty(String str) {
        Boolean result = (str == null || str.isEmpty());
        return result;
    }

    /**
     * Mimics the String.format method with a smaller name
     * @param format
     * @param args
     * @return
     */
    public static String f(String format, Object... args)
    {
        return String.format(format, args);
    }
}

With the example code above, I expect to get:

  • entire class text
{
// static block, dont care!
}

/**
 * Check if a string is Null and Empty
 * @param str
 * @return
 */
public static boolean isNullOrEmpty(String str) {
    Boolean result = (str == null || str.isEmpty());
    return result;
}

/**
 * Mimics the String.format method with a smaller name
 * @param format
 * @param args
 * @return
 */
public static String f(String format, Object... args)
{
    return String.format(format, args);
}
  • individual method text
Boolean result = (str == null || str.isEmpty());
return result;
return String.format(format, args);

I already know how to use the Pattern and Matcher classes; I just need the right regexes.

  • You sort of forgot to tell us exactly what you want to match here, but that doesn't really matter, because regex is not a suitable tool for parsing nested source code, which is what Java is. Commented May 6, 2019 at 14:00
  • Your match is in the first capturing group: regex101.com/r/M47iI9/2 Commented May 6, 2019 at 14:01
  • Regex is not for code parsing. That should be done with lexers/parsers. Commented May 6, 2019 at 14:20
  • Nope. You shouldn't avoid. There's a quip that says "You've got a problem. You think regex is a good solution. Now you have two problems." True for your case. Commented May 6, 2019 at 14:27
  • @Aaron just "parsing" character by character could get complex too though: you'd need to distinguish between initializer blocks, classes (outer and inner), braces in comments such as JavaDoc's {@code}, braces in string literals (not that uncommon), methods, blocks inside methods etc. Commented May 6, 2019 at 14:45

1 Answer

After some confusion in the comments section, I would like to share my solution to what I asked, even if the question was not very clear.

This code is not thoroughly tested, but it works for my purpose. Some adjustments or improvements are very likely possible. I took some inspiration from the comments on this post, and from other posts like it.

I feed each of the following methods the entire plain text of a .java file, and from there I use Pattern and Matcher to extract what I want.

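import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Returns the requested capture group of the first match, or "" when nothing matches.
private static String patternMatcher(String content, String patternText, int groupIndex) {
    Pattern pattern = Pattern.compile(patternText);
    Matcher matcher = pattern.matcher(content);

    if (matcher.find()) {
        return matcher.group(groupIndex);
    } else {
        return "";
    }
}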

public static String getPackageName(String content) {
    return patternMatcher(content, ".*package\\s+(.*)\\s*\\;", 1);
}

public static String getClassName(String content) {
    return patternMatcher(content, ".*class\\s+(\\w+)[\\w\\s]+\\{", 1);
}

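public static String getClassCode(String content) {
    // Group 1 is everything between the first '{' after the "class" keyword and the last '}' in the input.
    // [\s\S]* spans line breaks with a single quantifier, avoiding the nested (.*\s*)* repetition,
    // which can backtrack heavily on large inputs.
    return patternMatcher(content, "\\bclass\\b[^{]*\\{([\\s\\S]*)\\}", 1);
}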

public static String getMethodName(String code) {
    String uncommentedCode = removeComments(code).trim();

    return patternMatcher(uncommentedCode,
            "(public|private|static|protected|abstract|native|synchronized) *([\\w<>.?, \\[\\]]*)\\s+(\\w+)\\s*\\([\\w<>\\[\\]._?, \\n]*\\)\\s*([\\w ,\\n]*)\\s*\\{",
            3);
}

public static String removeComments(String content) {
    // (?m) makes ^ and $ match at each line boundary, which the // comment branch relies on.
    return content.replaceAll("(?m)\\/\\*[\\s\\S]*?\\*\\/|([^:]|^)\\/\\/.*$", "$1 ").trim();
}
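
As a rough illustration, the helpers above could be exercised like this on a trimmed-down version of the example file. The class name RegexExtractDemo and the inline source string are made up for this sketch; the two patterns are copied from getPackageName and getClassName above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
final class RegexExtractDemo {

    // First match's capture group, or "" when the pattern does not match.
    static String patternMatcher(String content, String patternText, int groupIndex) {
        Matcher matcher = Pattern.compile(patternText).matcher(content);
        return matcher.find() ? matcher.group(groupIndex) : "";
    }

    static String getPackageName(String content) {
        return patternMatcher(content, ".*package\\s+(.*)\\s*\\;", 1);
    }

    static String getClassName(String content) {
        return patternMatcher(content, ".*class\\s+(\\w+)[\\w\\s]+\\{", 1);
    }

    public static void main(String[] args) {
        // Shortened stand-in for the contents of Handy.java.
        String source = String.join("\n",
                "package foo.bar;",
                "",
                "import java.io.File;",
                "",
                "public class Handy {",
                "    public static boolean isNullOrEmpty(String str) {",
                "        return str == null || str.isEmpty();",
                "    }",
                "}");

        System.out.println(getPackageName(source)); // prints foo.bar
        System.out.println(getClassName(source));   // prints Handy
    }
}
```

Because the helper uses find() rather than matches(), the patterns only need to match somewhere in the file, not the whole file.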

I double-checked, but I hope I didn't forget any escape character; be careful with those.
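
To make the escaping point concrete, here is a small sketch. A regex backslash has to be doubled inside a Java string literal, so the regex \{(\w+)\} is written "\\{(\\w+)\\}" in source code. The class EscapeDemo and the method firstBracedWord are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
final class EscapeDemo {

    // The regex \{(\w+)\} as a Java string literal: every backslash is doubled.
    static final String BRACED_WORD = "\\{(\\w+)\\}";

    // Returns the first word wrapped in braces, or "" if there is none.
    static String firstBracedWord(String text) {
        Matcher m = Pattern.compile(BRACED_WORD).matcher(text);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        System.out.println(BRACED_WORD);                  // prints \{(\w+)\}
        System.out.println(firstBracedWord("a {foo} b")); // prints foo
    }
}
```

Printing the pattern string is a quick way to check that what reaches Pattern.compile is the regex that was tested online.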

Lots of people recommended that I use an actual code parsing library, like ANTLR, but I assumed it would take me much longer to learn how to work with it than it would take to do this with regex. Furthermore, I wanted to improve my regex skills, and this exercise definitely taught me some things.
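
For the individual method bodies, one possible middle ground between a single regex and a full parser is sketched below. java.util.regex has no recursive or balancing constructs, so a lone pattern cannot pair nested braces. This sketch therefore still uses Pattern and Matcher to find tokens, but counts brace depth in a loop. Everything here is an assumption for illustration: the names MethodBodySketch, methodBodies and matchingBrace are invented, the input is assumed to be a class body (such as the output of getClassCode), and the header pattern is deliberately simplified. It does not handle annotations with parentheses in parameter lists, throws clauses, anonymous classes, text blocks, or header-like text inside comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of the original answer.
final class MethodBodySketch {

    // Simplified header: a name, a parameter list without nested parentheses, then '{'.
    private static final Pattern HEADER =
            Pattern.compile("(\\w+)\\s*\\([^()]*\\)\\s*\\{");

    // Tokens that matter for brace counting: comments, string and char literals, braces.
    private static final Pattern TOKEN = Pattern.compile(
            "/\\*[\\s\\S]*?\\*/"                   // block comment
            + "|//.*"                              // line comment
            + "|\"(?:\\\\.|[^\"\\\\\\n])*\""       // string literal
            + "|'(?:\\\\.|[^'\\\\\\n])*'"          // char literal
            + "|[{}]");                            // a real brace

    // Returns the trimmed text between the braces of each top-level method found.
    static List<String> methodBodies(String classBody) {
        List<String> bodies = new ArrayList<>();
        Matcher header = HEADER.matcher(classBody);
        int from = 0;
        while (header.find(from)) {
            int open = header.end() - 1;           // index of the method's '{'
            int close = matchingBrace(classBody, open);
            if (close < 0) {
                break;                             // unbalanced input: stop
            }
            bodies.add(classBody.substring(open + 1, close).trim());
            from = close + 1;                      // resume after the body, skipping nested blocks
        }
        return bodies;
    }

    // Index of the '}' that closes the '{' at position open, or -1 if there is none.
    static int matchingBrace(String code, int open) {
        Matcher token = TOKEN.matcher(code);
        token.region(open, code.length());
        int depth = 0;
        while (token.find()) {
            String t = token.group();
            if (t.equals("{")) {
                depth++;
            } else if (t.equals("}") && --depth == 0) {
                return token.start();
            }
            // Comments and literals are matched as whole tokens, so braces inside them are ignored.
        }
        return -1;
    }

    public static void main(String[] args) {
        String classBody = String.join("\n",
                "public static String f(String format, Object... args)",
                "{",
                "    return String.format(format, args);",
                "}");
        for (String body : methodBodies(classBody)) {
            System.out.println(body);
        }
    }
}
```

Resuming the header search after each closing brace is what keeps statements such as if (...) { inside a method from being reported as methods themselves.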
