1

I'm trying to create a regular expression that detects a new class or interface declaration, for example:

public interface IGame {

or

private class Game {

This is what I have so far, but it doesn't match either line:

(line.matches("(public|protected|private|static|\\s)"+"(class|interface|\\s)"+"(\\w+)"))

Can anyone give me some pointers please?

  • What about final or abstract? What about generics? Commented Oct 22, 2014 at 12:31
  • Are you trying to match both types of string format? Commented Oct 22, 2014 at 12:34
  • As I understood from your other question stackoverflow.com/questions/26504009/… you want to reverse-engineer and analyze/visualize the structure of Java classes. Is the code you want to analyze syntactically correct, so that it can be compiled into Java bytecode class files (reverse-engineering class dependencies from class files can be a simpler task)? Or do you need to work with some hot code snippets? Commented Oct 23, 2014 at 16:14
  • I'm trying to hand my program a collection of source code files, and my program will create a class diagram from them. I've decided to use Jparser to read in the files, though, instead of using a regex. Commented Oct 24, 2014 at 10:31

2 Answers

6

The regex below is an incomplete specification of Java class and interface declarations. However, it can match declarations like these:

abstract class X<B extends Integer,D extends java.io.InputStream,R extends Comparator<? super D>>extends java.util.ArrayList<Integer>implements java.util.Queue<Integer>,Serializable{}

public@Deprecated interface K<D> extends Comparable<Integer>{}

So it should be sufficient for most purposes.

Please read the code that generates the regex, at the end of this answer, to see exactly what was skipped. In summary, ReferenceType and Annotation are not fully incorporated into this regex, since parsing them properly would require a recursive regex.

The Java regex to match Java class declaration in all its g(l)ory details:

(?:((?:public|protected|private|abstract|static|final|strictfp|@[ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)(?:[ \t\f\r\n]*+(?:(?<!\p{javaJavaIdentifierPart})|(?!\p{javaJavaIdentifierPart}))(?:public|protected|private|abstract|static|final|strictfp|@[ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+))*+)[ \t\f\r\n]*+)?+(?:(?<!\p{javaJavaIdentifierPart})|(?!\p{javaJavaIdentifierPart}))class[ \t\f\r\n]++(\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)[ \t\f\r\n]*+(?:(<[ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]++extends[ \t\f\r\n]++(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+|\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+))?+(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]++extends[ \t\f\r\n]++(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+[.][ 
\t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+|\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+))?+)*+[ \t\f\r\n]*+>)[ \t\f\r\n]*+)?+(?:(?<!\p{javaJavaIdentifierPart})|(?!\p{javaJavaIdentifierPart}))(?:(extends[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ 
\t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+)[ \t\f\r\n]*+)?+(?:(?<!\p{javaJavaIdentifierPart})|(?!\p{javaJavaIdentifierPart}))(?:(implements[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ 
\t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+)*+[ \t\f\r\n]*+))?+[{]

And for Java interface declaration:

(?:((?:public|protected|private|abstract|static|strictfp|@[ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)(?:[ \t\f\r\n]*+(?:(?<!\p{javaJavaIdentifierPart})|(?!\p{javaJavaIdentifierPart}))(?:public|protected|private|abstract|static|strictfp|@[ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+))*+)[ \t\f\r\n]*+)?+(?:(?<!\p{javaJavaIdentifierPart})|(?!\p{javaJavaIdentifierPart}))interface[ \t\f\r\n]++(\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)[ \t\f\r\n]*+(?:(<[ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]++extends[ \t\f\r\n]++(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+|\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+))?+(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]++extends[ \t\f\r\n]++(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+[.][ 
\t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+|\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+))?+)*+[ \t\f\r\n]*+>)[ \t\f\r\n]*+)?+(?:(?<!\p{javaJavaIdentifierPart})|(?!\p{javaJavaIdentifierPart}))(?:(extends[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ 
\t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+)*+)[ \t\f\r\n]*+)?+[{]

Both are generated by building up the pattern piece by piece. I used the Java Language Specification for Java SE 7 as the reference for the grammar of NormalClassDeclaration and NormalInterfaceDeclaration.

Here is the code to generate the monster above, including comments on which pattern is skipped:

// No rule ever ends with WhiteSpace. This makes it easier to check for errors.

final static String Identifier = "\\p{javaJavaIdentifierStart}\\p{javaJavaIdentifierPart}*+";
// Heuristic to exclude any position where adding space would split an Identifier or a keyword apart
final static String TokenBoundary = "(?:(?<!\\p{javaJavaIdentifierPart})|(?!\\p{javaJavaIdentifierPart}))";

// Optional whitespace
final static String WhiteSpaceOpt = "[ \\t\\f\\r\\n]*+";
// Compulsory whitespace
final static String WhiteSpaceCom = "[ \\t\\f\\r\\n]++";

final static String MarkerAnnotation = "@" + WhiteSpaceOpt + Identifier;
// Skipped: SingleElementAnnotation, NormalAnnotation
// Can't be included due to middle recursion
final static String Annotation = MarkerAnnotation;

final static String ClassModifier = "(?:public|protected|private|abstract|static|final|strictfp|" + Annotation + ")";
// Since the declaration
//     public@Deprecated ...
// is allowed, the space between modifiers is optional.
// Since allowing the space to be optional recognizes this invalid declaration
//     publicstatic ...
// we need to assert boundary between 2 class modifiers
final static String ClassModifiers = ClassModifier + "(?:" + WhiteSpaceOpt + TokenBoundary + ClassModifier + ")*+";

final static String TypeVariable = Identifier;

// Skipped: ArrayType, ClassOrInterfaceType
// Can't be included, since the grammar is context-free, and this is where middle recursion occurs
final static String ReferenceType = TypeVariable;
final static String WildcardBounds = "(?:extends|super)" + WhiteSpaceCom + ReferenceType;
final static String Wildcard = "[?]" + WhiteSpaceOpt + "(?:" + WildcardBounds + ")?+";
final static String TypeArgument = "(?:" + ReferenceType + "|" + Wildcard + ")";
final static String TypeArgumentList = TypeArgument + "(?:" + WhiteSpaceOpt + "," + WhiteSpaceOpt + TypeArgument + ")*+";
final static String TypeArguments = "<" + WhiteSpaceOpt + TypeArgumentList + WhiteSpaceOpt + ">";

final static String TypeName = Identifier + "(?:" + WhiteSpaceOpt + "[.]" + WhiteSpaceOpt + Identifier + ")*+";
// Expanded definition of ClassOrInterfaceType = ClassType = InterfaceType
//     ClassOrInterfaceType -> TypeName  TypeArguments<opt>
//     ClassOrInterfaceType -> ClassOrInterfaceType . Identifier TypeArguments<opt>
final static String ClassType = TypeName + "(?:" + WhiteSpaceOpt + TypeArguments + ")?+" + 
    "(?:" + WhiteSpaceOpt + "[.]" + WhiteSpaceOpt + Identifier + "(?:" + WhiteSpaceOpt + TypeArguments + ")?+" + ")*+";
// Definition of ClassType and InterfaceType are identical
final static String InterfaceType = ClassType;
final static String ClassOrInterfaceType = ClassType;

final static String TypeBound = "extends" + WhiteSpaceCom + "(?:" + ClassOrInterfaceType + "|" + TypeVariable + ")";
final static String TypeParameter = TypeVariable + "(?:" + WhiteSpaceCom + TypeBound + ")?+";
final static String TypeParameterList = TypeParameter + "(?:" + WhiteSpaceOpt + "," + WhiteSpaceOpt + TypeParameter + ")*+";
final static String TypeParameters = "<" + WhiteSpaceOpt + TypeParameterList + WhiteSpaceOpt + ">";

final static String Super = "extends" + WhiteSpaceCom + ClassType;
    
final static String InterfaceTypeList = InterfaceType + "(?:" + WhiteSpaceOpt  + "," + WhiteSpaceOpt + InterfaceType + ")*+";
final static String Interfaces = "implements" + WhiteSpaceCom + InterfaceTypeList;

final static String NormalClassDeclaration =
    // An annotation in its fullest form can end in ')', so WhiteSpaceOpt is used here to be pedantic.
    // It can be changed to WhiteSpaceCom to save the TokenBoundary check, since the current definition always ends with an Identifier character.
    "(?:" + "(" + ClassModifiers + ")" + WhiteSpaceOpt + ")?+" +
    TokenBoundary + "class" + WhiteSpaceCom +
    // WhiteSpaceOpt is used here, since TypeParameters starts with < and no whitespace is needed to delimit
    "(" + Identifier + ")" + WhiteSpaceOpt + 
    "(?:" + "(" + TypeParameters + ")" + WhiteSpaceOpt + ")?+" +
    // As a result, we need to check for a boundary before "extends" and "implements"
    TokenBoundary + "(?:" + "(" + Super + ")"+ WhiteSpaceOpt + ")?+" +
    TokenBoundary + "(?:" + "(" + Interfaces + WhiteSpaceOpt + ")" + ")?+[{]";
    // ClassBody is skipped, and only opening bracket { is matched

final static String InterfaceModifier = "(?:public|protected|private|abstract|static|strictfp|" + Annotation + ")";
// Same as ClassModifiers, except that "final" is no longer a valid modifier
final static String InterfaceModifiers = InterfaceModifier + "(?:" + WhiteSpaceOpt + TokenBoundary + InterfaceModifier + ")*+";

final static String ExtendsInterfaces = "extends" + WhiteSpaceCom + InterfaceTypeList;

final static String NormalInterfaceDeclaration = 
    "(?:" + "(" + InterfaceModifiers + ")" + WhiteSpaceOpt + ")?+" +
    TokenBoundary + "interface" + WhiteSpaceCom +
    // WhiteSpaceOpt is used here, since TypeParameters starts with < and no whitespace is needed to delimit
    "(" + Identifier + ")" + WhiteSpaceOpt + 
    "(?:" + "(" + TypeParameters + ")" + WhiteSpaceOpt + ")?+" +
    // As a result, we need to check for a boundary before "extends" here
    TokenBoundary + "(?:" + "(" + ExtendsInterfaces + ")" + WhiteSpaceOpt + ")?+[{]";

I am well aware that CamelCase names for constants, and constants that differ only by pluralization, are inappropriate, but they provide a better mapping to the grammar defined in the JLS.

Here is the demo on ideone.
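As a rough illustration of how the capturing groups can be read back, here is a minimal, self-contained sketch. It is not the full generator: it copies Identifier, TokenBoundary, WhiteSpaceOpt, WhiteSpaceCom and TypeName from the code above, but Modifier, Modifiers, SimpleClassDeclaration and parse are hypothetical, cut-down names introduced only for this example. It assumes keyword modifiers only (no annotations), no generics and no implements clause, and it captures the superclass name without the extends keyword.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClassDeclarationDemo {
    // Copied from the generator above (names kept so they map onto it)
    static final String Identifier = "\\p{javaJavaIdentifierStart}\\p{javaJavaIdentifierPart}*+";
    static final String TokenBoundary = "(?:(?<!\\p{javaJavaIdentifierPart})|(?!\\p{javaJavaIdentifierPart}))";
    static final String WhiteSpaceOpt = "[ \\t\\f\\r\\n]*+";
    static final String WhiteSpaceCom = "[ \\t\\f\\r\\n]++";
    static final String TypeName = Identifier + "(?:" + WhiteSpaceOpt + "[.]" + WhiteSpaceOpt + Identifier + ")*+";

    // Simplified for this sketch (assumption): keyword modifiers only, separated by compulsory whitespace
    static final String Modifier = "(?:public|protected|private|abstract|static|final|strictfp)";
    static final String Modifiers = Modifier + "(?:" + WhiteSpaceCom + Modifier + ")*+";

    // Group 1: modifiers, group 2: class name, group 3: superclass name without "extends"
    static final Pattern SimpleClassDeclaration = Pattern.compile(
        "(?:(" + Modifiers + ")" + WhiteSpaceCom + ")?+" +
        TokenBoundary + "class" + WhiteSpaceCom +
        "(" + Identifier + ")" +
        "(?:" + WhiteSpaceCom + "extends" + WhiteSpaceCom + "(" + TypeName + "))?+" +
        WhiteSpaceOpt + "[{]");

    // Returns {modifiers, name, superclass}; absent parts are null.
    // Returns null if no declaration is found anywhere in the input.
    static String[] parse(String source) {
        Matcher m = SimpleClassDeclaration.matcher(source);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("private class SetupMouseListener extends MouseAdapter {}")));
        System.out.println(Arrays.toString(parse("class Foo{")));
        // TokenBoundary stops "class" from being found inside a longer identifier
        System.out.println(Arrays.toString(parse("subclass Foo {")));
    }
}
```

Matcher.find() is used rather than matches() so that the declaration can sit anywhere inside a larger chunk of source. The same approach should carry over to the full NormalClassDeclaration pattern, with the group numbers adjusted to its own capturing groups.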


4 Comments

How is this regex supposed to be used for detection that OP asked about, e.g. how does the regex detect "SetupMouseListener extends MouseAdapter"? It can determine if given sentence is grammatically valid or not valid but how does it detect the meaning? (BTW: yes, it is interesting to see how you can derive regex from grammar but it is not enough to make the answer actually useful)
@xmojmr: It has captured the extends clause and the class name. Just throw in the source code for crunching.
If you "just throw in some source code for crunching" into your answer showing how it spits out "SetupMouseListener", "MouseAdapter" and "extends" when it sees "class SetupMouseListener extends MouseAdapter {}" then I'll give you my +1
@xmojmr: Done. My code will return extends MouseAdapter, but you can always capture only MouseAdapter by modifying Super.
2

Change your regex as shown below to match both string formats.

line.matches("(?:public|protected|private|static)\\s+(?:class|interface)\\s+\\w+\\s*\\{");

Example:

String s1 = "public interface IGame {";
String s2 = "private class Game {";
System.out.println(s1.matches("(?:public|protected|private|static)\\s+(?:class|interface)\\s+\\w+\\s*\\{"));
System.out.println(s2.matches("(?:public|protected|private|static)\\s+(?:class|interface)\\s+\\w+\\s*\\{"));

Output:

true
true
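One likely reason the pattern in the question never matched is that String.matches only succeeds when the whole string matches, and the original pattern has nothing to consume the text after the first identifier. The sketch below compares the two patterns; the class name MatchesVsFind and the helpers wholeLine and anywhere are made up for this illustration. It also shows that the fixed pattern still rejects an indented line unless Matcher.find() is used instead.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    // Pattern from the question, unchanged
    static final String ORIGINAL =
        "(public|protected|private|static|\\s)" + "(class|interface|\\s)" + "(\\w+)";
    // Pattern from this answer
    static final String FIXED =
        "(?:public|protected|private|static)\\s+(?:class|interface)\\s+\\w+\\s*\\{";

    // True only if the regex matches the entire line (same as String.matches)
    static boolean wholeLine(String regex, String line) {
        return line.matches(regex);
    }

    // True if the regex matches some substring of the line
    static boolean anywhere(String regex, String line) {
        return Pattern.compile(regex).matcher(line).find();
    }

    public static void main(String[] args) {
        String line = "public interface IGame {";
        String indented = "    private class Game {";

        System.out.println(wholeLine(ORIGINAL, line));    // original pattern, whole line
        System.out.println(wholeLine(FIXED, line));       // fixed pattern, whole line
        System.out.println(wholeLine(FIXED, indented));   // leading spaces are not covered
        System.out.println(anywhere(FIXED, indented));    // find() ignores the indentation
    }
}
```

If the lines being scanned may be indented, either switch to find() as above or add \\s* at the start of the pattern.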

3 Comments

This is almost perfect, but it won't see cases such as private class SetupMouseListener extends MouseAdapter {. Sorry, I should have specified that in the original example. I will try adding it myself.
But you didn't put that in your regex; I just modified your regex, that's all. I didn't know about terms like Listener, Adapter, etc. You can modify the above regex according to your needs.
That’s far away from detecting all class declarations, but the question leaves room for interpretation whether really all declarations ought to be matched. You know, there’s enum and @interface as well. Further, a class declaration doesn’t need to have any of public, protected, private, or static, e.g. class Foo is a valid class declaration, but a class also may have more than one modifier like in private static final class Bar, and we’re not talking about private @Deprecated class ClassName yet…
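Some of the cases raised in these comments (no modifier at all, several modifiers, an extends clause, enum) can be covered by a slightly looser pattern. The following is only a sketch under stated assumptions: LooseDeclarationMatcher, DECLARATION and describe are hypothetical names, annotations such as @Deprecated and @interface are not handled, and anything between the name (or superclass) and the opening brace, including generics and implements clauses, is skipped rather than parsed. A generic class such as class Box<T> extends Base<T> { would therefore be reported without its superclass.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LooseDeclarationMatcher {
    // Group 1: zero or more keyword modifiers, group 2: class/interface/enum,
    // group 3: type name, group 4: optional superclass (possibly qualified)
    static final Pattern DECLARATION = Pattern.compile(
        "^\\s*((?:(?:public|protected|private|abstract|static|final|strictfp)\\s+)*)"
        + "(class|interface|enum)\\s+(\\w+)"
        + "(?:\\s+extends\\s+([\\w.]+))?"
        + "[^{]*\\{");

    // Returns e.g. "class Bar" or "class A extends B", or null if the line is not a declaration
    static String describe(String line) {
        Matcher m = DECLARATION.matcher(line);
        if (!m.find()) {
            return null;
        }
        String superType = m.group(4);
        return m.group(2) + " " + m.group(3)
            + (superType == null ? "" : " extends " + superType);
    }

    public static void main(String[] args) {
        System.out.println(describe("class Foo {"));
        System.out.println(describe("private static final class Bar {"));
        System.out.println(describe("private class SetupMouseListener extends MouseAdapter {"));
        System.out.println(describe("int classCount = 0;"));
    }
}
```

For anything beyond quick line-by-line scanning, a real Java parser is likely to be more reliable than extending this pattern further.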
