Is there an AST tool that makes it easy to extract metadata from a Java method?

For instance, given the following code snippet:

/*
 Checks if a target integer is present in the list of integers.
*/
public Boolean contains(Integer target, List<Integer> numbers) {
    for(Integer number: numbers){
        if(number.equals(target)){
            return true;
        }
    }
    return false;
}

the metadata would be:

metadata = {
    "comment": "Checks if a target integer is present in the list of integers.",
    "identifier": "contains",
    "parameters": "Integer target, List<Integer> numbers",
    "return_statement": "Boolean false"
}
  • That's funny, because that is exactly what I recently wrote using Java Parser. I'll post an answer shortly. Commented Oct 31, 2020 at 21:52

2 Answers

This class was written a long time ago. It was originally about four different classes, spread across a package called JavaParserBridge, and it tremendously simplifies what you are trying to do. I have stripped out all the unnecessary parts and boiled it down to roughly 100 lines. It took about an hour...

I hope this all makes sense. I usually add a lot of comments to my code, but since this depends on another library and is essentially just one big constructor, I will instead point you to the documentation page for Java Parser.

To use this class, pass the source code of a Java class as a single java.lang.String to the method getMethods(String); it returns a Vector<Method>. Each element of the returned Vector is an instance of Method holding that method's name, signature, Javadoc comment, body, return type, modifiers, parameters and declared exceptions.

IMPORTANT: You can get the JAR file for this package from the GitHub page. You need the JAR named javaparser-core-3.16.2.jar.

import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.body.MethodDeclaration;
import com.github.javaparser.ast.body.Parameter;
import com.github.javaparser.ast.type.ReferenceType;
import com.github.javaparser.ast.NodeList;
import com.github.javaparser.ast.Modifier; // Modifiers are keywords such as "public", "private", "static", etc.
import com.github.javaparser.printer.lexicalpreservation.LexicalPreservingPrinter;
import com.github.javaparser.printer.lexicalpreservation.PhantomNodeLogic;

import java.io.IOException;
import java.util.Vector;


public class Method
{
    public final String name, signature, jdComment, body, returnType;
    public final String[] modifiers, parameterNames, parameterTypes, exceptions;

    private Method (MethodDeclaration md)
    {

        NodeList<Parameter>     paramList       = md.getParameters();
        NodeList<ReferenceType> exceptionList   = md.getThrownExceptions();
        NodeList<Modifier>      modifiersList   = md.getModifiers();

        this.name           = md.getNameAsString();
        this.signature      = md.getDeclarationAsString();
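        // Only a Javadoc comment (/** ... */) is captured here; a plain /* ... */ block comment leaves this null.
        this.jdComment      = (md.hasJavaDocComment() ? md.getJavadocComment().get().toString() : null);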
        this.returnType     = md.getType().toString();
        this.modifiers      = new String[modifiersList.size()];
        this.parameterNames = new String[paramList.size()];
        this.parameterTypes = new String[paramList.size()];
        this.exceptions     = new String[exceptionList.size()];
        this.body           = (md.getBody().isPresent()
                                ?   LexicalPreservingPrinter.print
                                        (LexicalPreservingPrinter.setup(md.getBody().get()))
                                :   null);

        int i=0;
        for (Modifier modifier : modifiersList) modifiers[i++] = modifier.toString();

        i=0;
        for (Parameter p : paramList)
        {
            parameterNames[i]           = p.getName().toString();
            parameterTypes[i]           = p.getType().toString();
            i++;
        }

        i=0;
        for (ReferenceType r : exceptionList) this.exceptions[i++] = r.toString();
    }

    public static Vector<Method> getMethods(String sourceFileAsString) throws IOException
    {
        // This is the "Return Value" for this method (a Vector)
        final Vector<Method> methods = new Vector<>();

        // This asks Java Parser to parse the source code.
        // The String parameter 'sourceFileAsString' should hold the complete text of a Java source file.

        CompilationUnit cu = StaticJavaParser.parse(sourceFileAsString);

        // This will "walk" all of the methods that were parsed by
        // StaticJavaParser, and retrieve the method information.
        // The method information is stored in a class simply called "Method"

        cu.walk(MethodDeclaration.class, (MethodDeclaration md) -> methods.add(new Method(md)));

        // There is one important thing to do: clear the cache
        // Memory leaks shall occur if you do not.

        PhantomNodeLogic.cleanUpCache(); 

        // return the Vector<Method>
        return methods;
    }
}

4 Comments

Impressive answer! Thank you very much. Is there a way around the need for the code to be compilable? I need this because I will extract this metadata from a dataset (leclair.tech/data/funcom) made up of standalone methods, which may not be compilable.
Well, it doesn't actually need to be a class; the Java Parser type CompilationUnit can be any code snippet at all. The only requirement that JavaParser imposes is that the code must be syntactically correct.
When I pass only the code snippet presented in the question I got: Exception in thread "main" com.github.javaparser.ParseProblemException: (line 4,col 1) Parse error. Found "Boolean" <IDENTIFIER>, expected one of ";" "@" "class" "enum" "interface" "module" "open".
I just finished the answer for you... I think it works for everything in functions.json except the entries that are constructors. I'm going to leave that as an exercise for you to finish.

You need to add this method to the class above. I rarely (if ever) post multiple answers to a single Stack Overflow question, but since this turned into a lot of code, I'm posting this main method as a separate answer rather than making the first one overly complicated.

Include this method in the class above, and it will process your file functions.json, which I downloaded from your website. That file contains the list of methods together with their database IDs.

ALSO: Make sure to add the lines import java.util.regex.*; and import java.io.*; because this method uses the classes Pattern and Matcher as well as BufferedReader, FileReader and File.


    public static void main(String[] argv) throws IOException
    {
        // Example line from "functions.json":
        //   "321": "\tpublic int getPushesLowerbound() {\n\t\treturn pushesLowerbound;\n\t}\n",
        // If you have not used regular expressions before, you will have to read
        // about them.  This one parses a single line of "functions.json": group 1 is
        // the ID, group 2 is the escaped source.  The trailing comma is optional so
        // that the last entry of the file also matches.

        Pattern         P1          = Pattern.compile("^\\s+\"(\\d+)\"\\:\\s+\"(.*?)\\\\n\",?$");
        BufferedReader  br          = new BufferedReader(new FileReader(new File("functions.json")));
        String          s           = br.readLine();   // skips the opening "{" line

        // Any time you have a "Constructor" instead of a method, you should
        // use some other method in `StaticJavaParser` to deal with it.
        // for now, I am just going to keep a "Fail List" instead..

        int             failCount   = 0;
        Vector<String>  failIDs     = new Vector<>();
 
        while ((s = br.readLine()) != null && ! s.equals("}"))
        {
            // Parse the JSON using a Regular Expression.  It is just easier to do it this way
            // You have a VERY BASIC json file.

            Matcher m = P1.matcher(s);
            
            // I do not think any of the String's will fail the regular expression matcher.
            // Just in case, continue if the Regular Expression Match Failed.
            if (! m.find()) { System.out.print("."); continue; }
            
            // The ID is the first JSON element matched by the regular expression
            String id = m.group(1);
            
            // The source code is the second JSON element matched by the regular-expression
            // NOTE: Your source code is stored with "escape sequences", so these sequences
            //       have to be "unescaped".
            // ALSO: this is not the most efficient way to "un-escape" an escape-sequence, but I would
            //       have to include an external library to do it the right way, so I'm going to leave
            //       this version here for you to think about.
            String src = m.group(2)
                .replace("\\\\", "" + ((char) 55555))
                .replace("\\n", "\n")
                .replace("\\t", "\t")
                .replace("\\\"", "\"")
                .replace("" + ((char) 55555), "\\");

            // Java Parser has a method EXPLICITLY FOR parsing method Declarations.
            // Your "functions.json" file has a list of method-declarations.
            MethodDeclaration   md          = null;

            // I found one that failed - it was a constructor.  Record its ID in the "Fail List".
            try
                { md = StaticJavaParser.parseMethodDeclaration(src); }
            catch (Exception e)
                { System.out.println(src); e.printStackTrace(); failCount++; failIDs.add(id); continue; }

            Method method = new Method(md);

            System.out.print(
                "ID:           " + id + '\n' +
                "Name:         " + method.name + '\n' +
                "Return Type:  " + method.returnType + '\n' +
                "Parameters:   "
            );

            for (int i=0; i < method.parameterNames.length; i++)
                System.out.print(method.parameterNames[i] + '(' + method.parameterTypes[i] + ")  ");

            System.out.println("\n");

            PhantomNodeLogic.cleanUpCache();
        }
        
        System.out.print(
            "Fail Count: " + failCount + "\n" +
            "Failed ID's: "
        );
        for (String failID : failIDs) System.out.print(failID + " ");
        System.out.println();
    }

The above method produces output like the sample below. Since you have, literally, one million methods, it will run for a while.

NOTE: Not every entry in that list is a valid method. If an entry is a constructor instead of a method, you need to parse it as a constructor. There is a "Fail List" for entries that JavaParser could not parse, and I'm going to leave it as an exercise for you to figure out how to deal with constructors (which the StaticJavaParser method parseMethodDeclaration does not parse).

NOTE: I have only posted a (very) small subset of the output from this main(String[] argv) method.


ID:           32808641
Name:         addUnboundTypePropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808649
Name:         addNamePropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808650
Name:         addInputParameterPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808651
Name:         addQualifiedNamePropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808652
Name:         addOutputParameterPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808656
Name:         addReturnParameterPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808658
Name:         addSignatureParameterPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808659
Name:         getLabelProvider
Return Type:  IItemLabelProvider
Parameters:   namedElement(NamedElement)

ID:           32808661
Name:         getLabel
Return Type:  String
Parameters:   namedElement(NamedElement)

ID:           32808677
Name:         addBodyPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808678
Name:         addLanguagePropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808696
Name:         addKindPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808707
Name:         addStaticPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808708
Name:         addKindPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808709
Name:         addSemanticsPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808711
Name:         addConstrainedElementPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808713
Name:         addDefinedFeaturePropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808727
Name:         addNestingNamespacePropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808741
Name:         addKindPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808749
Name:         addSuperTypePropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32814359
Name:         getResource
Return Type:  ResourceBundle
Parameters:   name(String)  language(String)

ID:           32814360
Name:         store
Return Type:  void
Parameters:   resource(ResourceBundle)  name(String)  language(String)

ID:           32814364
Name:         getString
Return Type:  String
Parameters:   key(String)  resourceName(String)  language(String)

ID:           32814400
Name:         getGlobalCompletionRate
Return Type:  double
Parameters:

ID:           32814409
Name:         setCurrentSubTask
Return Type:  void
Parameters:   subTask(TaskMonitor)  subTaskShare(double)

ID:           32814429
Name:         enforceCompletion
Return Type:  void
Parameters:

ID:           32814431
Name:         getCurrentActiveSubTask
Return Type:  TaskMonitor
Parameters:

ID:           32814469
Name:         checkTaskState
Return Type:  void
Parameters:

ID:           32814619
Name:         getReportAsText
Return Type:  String
Parameters:   report(ProcessReport)

ID:           32815305
Name:         showRecoveryResultWindow
Return Type:  void
Parameters:   context(ProcessContext)

ID:           32815353
Name:         validateStructure
Return Type:  void
Parameters:

ID:           32815413
Name:         buildArchive
Return Type:  void
Parameters:   context(ProcessContext)

ID:           32815445
Name:         checkArchiveCompatibility
Return Type:  boolean
Parameters:   archive(File)

ID:           32815446
Name:         checkStupidConfigurations
Return Type:  boolean
Parameters:

ID:           32815472
Name:         getDescription
Return Type:  String
Parameters:

ID:           32815501
Name:         getDataDirectory
Return Type:  File
Parameters:   archive(File)
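
The two string-handling steps in the main method above (matching one line of functions.json, then un-escaping the captured source) can be tried in isolation. The following is a minimal sketch, not the exact code of the answer: the class FunctionsLine and its method names are hypothetical, the trailing comma in the pattern is made optional on the assumption that the last entry of the file has none, and the un-escaping is done in a single pass instead of chained replace calls. Only the four escapes handled above are translated.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FunctionsLine {

    // Same shape as the pattern in main: group 1 is the ID, group 2 the escaped source
    // (without its final "\n").  The optional trailing comma is an assumption.
    static final Pattern ENTRY =
        Pattern.compile("^\\s+\"(\\d+)\":\\s+\"(.*?)\\\\n\",?$");

    /** Returns {id, unescapedSource}, or null when the line is not an entry. */
    static String[] parse(String line) {
        Matcher m = ENTRY.matcher(line);
        if (!m.find()) return null;
        return new String[] { m.group(1), unescape(m.group(2)) };
    }

    /** Single-pass un-escape of \\ \n \t and \" ; any other escape is kept as written. */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 == s.length()) {
                out.append(c);               // ordinary character, or a lone trailing backslash
                continue;
            }
            char next = s.charAt(++i);       // the character after the backslash
            switch (next) {
                case 'n':  out.append('\n'); break;
                case 't':  out.append('\t'); break;
                case '"':  out.append('"');  break;
                case '\\': out.append('\\'); break;
                default:   out.append('\\').append(next); // unknown escape: pass through
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "  \"321\": \"\\tpublic int getPushesLowerbound() {\\n"
                    + "\\t\\treturn pushesLowerbound;\\n\\t}\\n\",";
        String[] entry = parse(line);
        System.out.println("ID:     " + entry[0]);
        System.out.println("Source:\n" + entry[1]);
    }
}
```

Unknown escapes such as a Unicode escape are passed through unchanged, which is a deliberate simplification; a full JSON parser would be the more robust choice for anything beyond this file layout.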

IMPORTANT (again): Whenever one of your database functions is a constructor rather than a method, the StaticJavaParser method that I have used will throw an exception.

See Here: This is a constructor:


ID:           32812832
Name:         run
Return Type:  void
Parameters:

        public PeriodicData (String secProp ) {
                this.interval = 300;
                try {
                        this.interval = Integer.parseInt( secProp );
                } catch (Exception e ) {} // use default 5m

        }

And the code I have posted prints this message when it encounters it:


com.github.javaparser.ParseProblemException: Encountered unexpected token: "(" "("
    at line 1, column 22.

Was expecting one of:

    "enum"
    "exports"
    "module"
    "open"
    "opens"
    "provides"
    "requires"
    "strictfp"
    "to"
    "transitive"
    "uses"
    "with"
    "yield"
    <IDENTIFIER>

Problem stacktrace :
  com.github.javaparser.GeneratedJavaParser.generateParseException(GeneratedJavaParser.java:10906)
  com.github.javaparser.GeneratedJavaParser.jj_consume_token(GeneratedJavaParser.java:10752)
  com.github.javaparser.GeneratedJavaParser.Identifier(GeneratedJavaParser.java:2193)
  com.github.javaparser.GeneratedJavaParser.SimpleName(GeneratedJavaParser.java:2127)
  com.github.javaparser.GeneratedJavaParser.MethodDeclaration(GeneratedJavaParser.java:1224)
  com.github.javaparser.GeneratedJavaParser.MethodDeclarationParseStart(GeneratedJavaParser.java:6020)
  com.github.javaparser.JavaParser.parse(JavaParser.java:123)
  com.github.javaparser.JavaParser.parseMethodDeclaration(JavaParser.java:540)
  com.github.javaparser.StaticJavaParser.parseMethodDeclaration(StaticJavaParser.java:480)
  Method.main(Method.java:110)

        at com.github.javaparser.StaticJavaParser.handleResult(StaticJavaParser.java:260)
        at com.github.javaparser.StaticJavaParser.parseMethodDeclaration(StaticJavaParser.java:480)
        at Method.main(Method.java:110)
        public PeriodicData (int seconds ) {
                this.interval = seconds;
        }
com.github.javaparser.ParseProblemException: Encountered unexpected token: "(" "("
    at line 1, column 22.

Was expecting one of:

    "enum"
    "exports"
    "module"
    "open"
    "opens"
    "provides"
    "requires"
    "strictfp"
    "to"
    "transitive"
    "uses"
    "with"
    "yield"
    <IDENTIFIER>

Problem stacktrace :
  com.github.javaparser.GeneratedJavaParser.generateParseException(GeneratedJavaParser.java:10906)
  com.github.javaparser.GeneratedJavaParser.jj_consume_token(GeneratedJavaParser.java:10752)
  com.github.javaparser.GeneratedJavaParser.Identifier(GeneratedJavaParser.java:2193)
  com.github.javaparser.GeneratedJavaParser.SimpleName(GeneratedJavaParser.java:2127)
  com.github.javaparser.GeneratedJavaParser.MethodDeclaration(GeneratedJavaParser.java:1224)
  com.github.javaparser.GeneratedJavaParser.MethodDeclarationParseStart(GeneratedJavaParser.java:6020)
  com.github.javaparser.JavaParser.parse(JavaParser.java:123)
  com.github.javaparser.JavaParser.parseMethodDeclaration(JavaParser.java:540)
  com.github.javaparser.StaticJavaParser.parseMethodDeclaration(StaticJavaParser.java:480)
  Method.main(Method.java:110)

        at com.github.javaparser.StaticJavaParser.handleResult(StaticJavaParser.java:260)
        at com.github.javaparser.StaticJavaParser.parseMethodDeclaration(StaticJavaParser.java:480)
        at Method.main(Method.java:110)
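
One possible way to approach the constructor exercise, and also the parse error reported in the comments for a bare method, is to wrap each snippet in a class before parsing it as a whole source file. The sketch below only builds the wrapped text and uses no JavaParser calls; the class SnippetWrapper and its methods are hypothetical names, and the name detection is a heuristic that assumes the first identifier directly followed by "(" is the member name (an annotation with arguments in front of the signature would defeat it). Naming the wrapper class after the member keeps a constructor well-formed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SnippetWrapper {

    // Heuristic (assumption): the first identifier immediately followed by '('
    // is the method or constructor name.
    private static final Pattern NAME_BEFORE_PAREN =
        Pattern.compile("([A-Za-z_$][A-Za-z0-9_$]*)\\s*\\(");

    /** Returns the detected member name, or null if no "name(" is found. */
    static String memberName(String memberSource) {
        Matcher m = NAME_BEFORE_PAREN.matcher(memberSource);
        return m.find() ? m.group(1) : null;
    }

    /** Wraps a bare method or constructor in a class named after that member. */
    static String wrapInClass(String memberSource) {
        String name = memberName(memberSource);
        if (name == null)
            throw new IllegalArgumentException("No method or constructor name found");
        return "class " + name + " {\n" + memberSource + "\n}\n";
    }

    public static void main(String[] args) {
        String ctor = "public PeriodicData (int seconds ) {\n"
                    + "    this.interval = seconds;\n"
                    + "}";
        System.out.println(wrapInClass(ctor));
    }
}
```

The wrapped text could then be passed to StaticJavaParser.parse, as getMethods does in the first answer. Methods would be found by the existing walk over MethodDeclaration; constructors would presumably need a second walk over ConstructorDeclaration nodes, which is not tested here.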
