20

I need a regex that will match a Java method declaration. I have come up with one that matches a method declaration, but it requires the opening brace of the method to be on the same line as the declaration. If you have any suggestions to improve my regex, or simply have a better one, please submit an answer.

Here is my regex: "\w+ +\w+ *\(.*\) *\{"

For those who do not know what a Java method looks like, I'll provide a basic one:

int foo()
{

}

There are several optional parts that may be added to Java methods as well, but those are the only parts that a method is guaranteed to have.

Update: My current Regex is "\w+ +\w+ *\([^\)]*\) *\{" so as to prevent the situation that Mike and adkom described.
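One way to lift the same-line restriction is to replace the space-only quantifier before the brace with \s*, which also matches line breaks. The following is a minimal sketch, not a complete solution; the class name BraceOnNextLine and the helper matchesDeclaration are made up for illustration and do not come from the question.

```java
import java.util.regex.Pattern;

class BraceOnNextLine {
    // The updated regex from the question, with " *" before "{" replaced by "\\s*"
    // so that the opening brace may sit on the following line.
    static final Pattern METHOD = Pattern.compile("\\w+ +\\w+ *\\([^\\)]*\\)\\s*\\{");

    // Hypothetical helper: true if the source contains something shaped like a declaration.
    static boolean matchesDeclaration(String source) {
        return METHOD.matcher(source).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesDeclaration("int foo()\n{\n\n}")); // brace on the next line
        System.out.println(matchesDeclaration("int foo() {\n}"));    // brace on the same line
        System.out.println(matchesDeclaration("int foo();"));        // no body at all
    }
}
```

Like the original, this still matches statements such as "else if (x) {", so it is only a starting point.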

1
  • I don't know what a Java method declaration looks like. Mind providing an example? Commented Sep 16, 2008 at 1:45

16 Answers

24
(public|protected|private|static|\s) +[\w\<\>\[\]]+\s+(\w+) *\([^\)]*\) *(\{?|[^;])

I think the above regexp can match almost all possible combinations of Java method declarations, even those that use generics and arrays as return types, which the regexp provided by the original author did not match.
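As a rough sketch of how this could be used from Java, the snippet below collects the second capture group (the method name) from every match. The class name MethodNameFinder and the method methodNames are illustrative assumptions, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MethodNameFinder {
    // The regex from this answer, escaped for a Java string literal.
    static final Pattern DECL = Pattern.compile(
        "(public|protected|private|static|\\s) +[\\w\\<\\>\\[\\]]+\\s+(\\w+) *\\([^\\)]*\\) *(\\{?|[^;])");

    // Hypothetical helper: returns the method names found in the given source text.
    static List<String> methodNames(String source) {
        List<String> names = new ArrayList<>();
        Matcher m = DECL.matcher(source);
        while (m.find()) {
            names.add(m.group(2)); // group 2 captures the method name
        }
        return names;
    }

    public static void main(String[] args) {
        String source =
              "class A {\n"
            + "    public int[] foo(int a) {\n"
            + "        return null;\n"
            + "    }\n"
            + "    private static List<String> bar() {\n"
            + "        return null;\n"
            + "    }\n"
            + "}\n";
        System.out.println(methodNames(source)); // prints [foo, bar]
    }
}
```

Note that the final group can match the empty string, so an indented statement such as "return foo(1);" would also be reported; some filtering of the results is probably needed in practice.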

7 Comments

I think that would do it. It will also exclude constructors too, which is nice :-)
What's the \s at the beginning for? Why would something start with two spaces? Prevents it from matching package scope unless the indentation is just right.
@seba229's answer includes some keywords that yours does not
You're missing native from that list
@Shane your regex fails to match the last character of the method name. (public|protected|private|static|\s) +[\w\<\>\[\],\s]+\s+(\w+) *\([^\)]*\) *(\{?|[^;]) works for me.
7

I also needed such a regular expression and came up with this solution:

(?:(?:public|private|protected|static|final|native|synchronized|abstract|transient)+\s+)+[$_\w<>\[\]\s]*\s+[\$_\w]+\([^\)]*\)?\s*\{?[^\}]*\}?

This grammar and Georgios Gousios answer have been useful to build the regex.

EDIT: Considered tharindu_DG's feedback, made groups non-capturing, improved formatting.

1 Comment

Why is threadsafe in the regex? It isn't a modifier in Java. Tried looking at the spec docs.oracle.com/javase/specs/jls/se7/html/jls-18.html and/or docs.oracle.com/javase/tutorial/java/nutsandbolts/…
5

After looking through the other answers, here is what I came up with:

#permission
   ^[ \t]*(?:(?:public|protected|private)\s+)?
#keywords
   (?:(static|final|native|synchronized|abstract|threadsafe|transient|{#insert zJRgx123GenericsNotInGroup})\s+){0,}
#return type
   #If return type is "return" then it's actually a 'return funcName();' line. Ignore.
   (?!return)
   \b([\w.]+)\b(?:|{#insert zJRgx123GenericsNotInGroup})((?:\[\]){0,})\s+
#function name
   \b\w+\b\s*
#parameters
   \(
      #one
         \s*(?:\b([\w.]+)\b(?:|{#insert zJRgx123GenericsNotInGroup})((?:\[\]){0,})(\.\.\.)?\s+(\w+)\b(?![>\[])
      #two and up
         \s*(?:,\s+\b([\w.]+)\b(?:|{#insert zJRgx123GenericsNotInGroup})((?:\[\]){0,})(\.\.\.)?\s+(\w+)\b(?![>\[])\s*){0,})?\s*
   \)
#post parameters
   (?:\s*throws [\w.]+(\s*,\s*[\w.]+))?
#close-curly (concrete) or semi-colon (abstract)
   \s*(?:\{|;)[ \t]*$

Where {#insert zJRgx123GenericsNotInGroup} equals

`(?:<[?\w\[\] ,.&]+>)|(?:<[^<]*<[?\w\[\] ,.&]+>[^>]*>)|(?:<[^<]*<[^<]*<[?\w\[\] ,.&]+>[^>]*>[^>]*>)`

Limitations:

  • ANY parameter can have an ellipsis: "..." (Java allows only last)
  • Three levels of nested generics at most: (<...<...<...>...>...> okay, <...<...<...<...>...>...>...> bad). The syntax inside generics can be very bogus, and still seem okay to this regex.
  • Requires no spaces between types and their (optional) opening generics '<'
  • Recognizes inner classes, but doesn't prevent two dots next to each other, such as Class....InnerClass

Below is the raw PhraseExpress code (auto-text and description on line 1, body on line 2). Call {#insert zJRgxJavaFuncSigThrSemicOrOpnCrly}, and you get this:

^[ \t]*(?:(?:public|protected|private)\s+)?(?:(static|final|native|synchronized|abstract|threadsafe|transient|(?:<[?\w\[\] ,&]+>)|(?:<[^<]*<[?\w\[\] ,&]+>[^>]*>)|(?:<[^<]*<[^<]*<[?\w\[\] ,&]+>[^>]*>[^>]*>))\s+){0,}(?!return)\b([\w.]+)\b(?:|(?:<[?\w\[\] ,&]+>)|(?:<[^<]*<[?\w\[\] ,&]+>[^>]*>)|(?:<[^<]*<[^<]*<[?\w\[\] ,&]+>[^>]*>[^>]*>))((?:\[\]){0,})\s+\b\w+\b\s*\(\s*(?:\b([\w.]+)\b(?:|(?:<[?\w\[\] ,&]+>)|(?:<[^<]*<[?\w\[\] ,&]+>[^>]*>)|(?:<[^<]*<[^<]*<[?\w\[\] ,&]+>[^>]*>[^>]*>))((?:\[\]){0,})(\.\.\.)?\s+(\w+)\b(?![>\[])\s*(?:,\s+\b([\w.]+)\b(?:|(?:<[?\w\[\] ,&]+>)|(?:<[^<]*<[?\w\[\] ,&]+>[^>]*>)|(?:<[^<]*<[^<]*<[?\w\[\] ,&]+>[^>]*>[^>]*>))((?:\[\]){0,})(\.\.\.)?\s+(\w+)\b(?![>\[])\s*){0,})?\s*\)(?:\s*throws [\w.]+(\s*,\s*[\w.]+))?\s*(?:\{|;)[ \t]*$

Raw code:

zJRgx123GenericsNotInGroup -- To precede return-type    (?:<[?\w\[\] ,.&]+>)|(?:<[^<]*<[?\w\[\] ,.&]+>[^>]*>)|(?:<[^<]*<[^<]*<[?\w\[\] ,.&]+>[^>]*>[^>]*>)  zJRgx123GenericsNotInGroup
zJRgx0OrMoreParams  \s*(?:{#insert zJRgxParamTypeName}\s*(?:,\s+{#insert zJRgxParamTypeName}\s*){0,})?\s*   zJRgx0OrMoreParams
zJRgxJavaFuncNmThrClsPrn_M_fnm -- Needs zvFOBJ_NAME (?<=\s)\b{#insert zvFOBJ_NAME}{#insert zzJRgxPostFuncNmThrClsPrn}   zJRgxJavaFuncNmThrClsPrn_M_fnm
zJRgxJavaFuncSigThrSemicOrOpnCrly -(**)-    {#insert zzJRgxJavaFuncSigPreFuncName}\w+{#insert zzJRgxJavaFuncSigPostFuncName}    zJRgxJavaFuncSigThrSemicOrOpnCrly
zJRgxJavaFuncSigThrSemicOrOpnCrly_M_fnm -- Needs zvFOBJ_NAME    {#insert zzJRgxJavaFuncSigPreFuncName}{#insert zvFOBJ_NAME}{#insert zzJRgxJavaFuncSigPostFuncName}  zJRgxJavaFuncSigThrSemicOrOpnCrly_M_fnm
zJRgxOptKeywordsBtwScopeAndRetType  (?:(static|final|native|synchronized|abstract|threadsafe|transient|{#insert zJRgx123GenericsNotInGroup})\s+){0,}    zJRgxOptKeywordsBtwScopeAndRetType
zJRgxOptionalPubProtPriv    (?:(?:public|protected|private)\s+)?    zJRgxOptionalPubProtPriv
zJRgxParamTypeName -(**)- Ends w/ '\b(?![>\[])' to NOT find <? 'extends XClass'> or ...[]>  (*Original: zJRgxParamTypeName, Needed by: zJRgxParamTypeName[4FQPTV,ForDel[NmsOnly,Types]]*){#insert zJRgxTypeW0123GenericsArry}(\.\.\.)?\s+(\w+)\b(?![>\[])   zJRgxParamTypeName
zJRgxTypeW0123GenericsArry -- Grp1=Type, Grp2='[]', if any  \b([\w.]+)\b(?:|{#insert zJRgx123GenericsNotInGroup})((?:\[\]){0,}) zJRgxTypeW0123GenericsArry
zvTTL_PRMS_stL1c    {#insert zCutL1c}{#SETPHRASE -description zvTTL_PRMS -content {#INSERTCLIPBOARD} -autotext zvTTL_PRMS -folder ctvv_folder}  zvTTL_PRMS_stL1c
zvTTL_PRMS_stL1cSvRstrCB    {#insert zvCB_CONTENTS_stCB}{#insert zvTTL_PRMS_stL1c}{#insert zSetCBToCB_CONTENTS} zvTTL_PRMS_stL1cSvRstrCB
zvTTL_PRMS_stPrompt {#SETPHRASE -description zvTTL_PRMS -content {#INPUT -head How many parameters? -single} -autotext zvTTL_PRMS -folder ctvv_folder}  zvTTL_PRMS_stPrompt
zzJRgxJavaFuncNmThrClsPrn_M_fnmTtlp -- Needs zvFOBJ_NAME, zvTTL_PRMS    (?<=[ \t])\b{#insert zvFOBJ_NAME}\b\s*\(\s*{#insert {#COND -if {#insert zvTTL_PRMS} = 0 -then z1slp -else zzParamsGT0_M_ttlp}}\)    zzJRgxJavaFuncNmThrClsPrn_M_fnmTtlp
zzJRgxJavaFuncSigPostFuncName   {#insert zzJRgxPostFuncNmThrClsPrn}(?:\s*throws \b(?:[\w.]+)\b(\s*,\s*\b(?:[\w.]+)\b))?\s*(?:\{|;)[ \t]*$   zzJRgxJavaFuncSigPostFuncName
zzJRgxJavaFuncSigPreFuncName    (*If a type has generics, there may be no spaces between it and the first open '<', also requires generics with three nestings at the most (<...<...<...>...>...> okay, <...<...<...<...>...>...>...> not)*)^[ \t]*{#insert zJRgxOptionalPubProtPriv}{#insert zJRgxOptKeywordsBtwScopeAndRetType}(*To prevent 'return funcName();' from being recognized:*)(?!return){#insert zJRgxTypeW0123GenericsArry}\s+\b  zzJRgxJavaFuncSigPreFuncName
zzJRgxPostFuncNmThrClsPrn   \b\s*\({#insert zJRgx0OrMoreParams}\)   zzJRgxPostFuncNmThrClsPrn
zzParamsGT0_M_ttlp -- Needs zvTTL_PRMS  {#insert zJRgxParamTypeName}\s*{#insert {#COND -if {#insert zvTTL_PRMS} = 1 -then z1slp -else zzParamsGT1_M_ttlp}}  zzParamsGT0_M_ttlp
zzParamsGT1_M_ttlp  {#LOOP ,\s+{#insert zJRgxParamTypeName}\s* -count {#CALC {#insert zvTTL_PRMS} - 1 -round 0 -thousands none}}    zzParamsGT1_M_ttlp

3 Comments

Have to give props to RegexBuddy for helping me work through this: regexbuddy.com
Impressively thorough answer! I tried it in my .gitconfig, and got a "fatal: bad config line" error. I tried escaping the backslashes, and moved up to a "fatal: Invalid regexp to look for hunk header" error. Hmm. Maybe I need to learn what needs to be escaped in .gitconfig.
After skimming the git-config docs about what needs to be escaped, I surrounded the whole thing with double quotes, so that the semicolon wouldn't terminate the value. Still got the "fatal: Invalid regexp to look for hunk header" error.
3

Have you considered matching the actual possible keywords? such as:

(?:(?:public|private|static|protected)\s+)*

It might be a bit more likely to match correctly, though it might also make the regex harder to read...

4 Comments

That regex ended up matching the signature of all of the methods I had such as System.out.println() instead of just the declarations of methods.
Matches static variables and doesn't match package scope methods.
Here's a fuller regex built off the above. IDE replace "$1 $2 $3(" and regex (unquoted) (public|private|static|protected) ([A-Za-z]+) ([A-Za-z0-9]+)\(
What about no-modifier methods?
3

(public|private|static|protected|abstract|native|synchronized) +([a-zA-Z0-9<>._?, ]+) +([a-zA-Z0-9_]+) *\\([a-zA-Z0-9<>\\[\\]._?, \n]*\\) *([a-zA-Z0-9_ ,\n]*) *\\{

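The regex above detects most Java method definitions that start with one of the listed modifiers; it is written with the doubled backslashes of a Java string literal. Tested on lots of source code files. To include constructors as well, use the regex below: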

(public|private|static|protected|abstract|native|synchronized) +([a-zA-Z0-9<>._?, ]*) +([a-zA-Z0-9_]+) *\\([a-zA-Z0-9<>\\[\\]._?, \n]*\\) *([a-zA-Z0-9_ ,\n]*) *\\{

Comments

2

I'm pretty sure Java's regex engine is greedy by default, meaning that "\w+ +\w+ *\(.*\) *\{" will never match since the .* within the parentheses will eat everything after the opening paren. I recommend you replace the .* with [^)]*; this way it will match only non-closing-paren characters.

NOTE: Mike Stone corrected me in the comments, and since most people don't really open the comments (I know I frequently don't notice them):

Greedy doesn't mean it will never match... but it will eat parens if there are more parens after to satisfy the rest of the regex... so for example "public void foo(int arg) { if (test) { System.exit(0); } }" will not match properly...
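Mike Stone's example can be checked directly in Java. The sketch below runs the greedy pattern and the negated-class pattern against his sample line; the class name GreedyParens and the helper firstMatch are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyParens {
    // The question's original pattern, with a greedy ".*" between the parentheses.
    static final Pattern GREEDY = Pattern.compile("\\w+ +\\w+ *\\(.*\\) *\\{");
    // The same pattern with ".*" replaced by a negated character class.
    static final Pattern NEGATED = Pattern.compile("\\w+ +\\w+ *\\([^\\)]*\\) *\\{");

    // Hypothetical helper: returns the first substring matched by the pattern, or null.
    static String firstMatch(Pattern p, String source) {
        Matcher m = p.matcher(source);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "public void foo(int arg) { if (test) { System.exit(0); } }";
        System.out.println(firstMatch(GREEDY, line));  // void foo(int arg) { if (test) {
        System.out.println(firstMatch(NEGATED, line)); // void foo(int arg) {
    }
}
```

The greedy version does match, but it runs past the parameter list to the last ") {" on the line, whereas the negated class stops at the first closing paren.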

1 Comment

Greedy doesn't mean it will never match... but it will eat parens if there are more parens after to satisfy the rest of the regex... so for example "public void foo(int arg) { if (test) { System.exit(0); } }" will not match properly...
2

I came up with this:

\b\w*\s*\w*\(.*?\)\s*\{[\x21-\x7E\s]*\}

I tested it against a PHP function, but it should work just the same for Java. This is the snippet of code I used:

function getProfilePic($url)
 {
    if(@open_image($url) !== FALSE)
     {
        @imagepng($image, 'images/profiles/' . $_SESSION['id'] . '.png');
        @imagedestroy($image);
        return TRUE;
     }
    else 
     {
        return FALSE;
     }
 }

MORE INFO:

Options: case insensitive

Assert position at a word boundary «\b»
Match a single character that is a “word character” (letters, digits, etc.) «\w*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.) «\s*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match a single character that is a “word character” (letters, digits, etc.) «\w*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character “(” literally «\(»
Match any single character that is not a line break character «.*?»
   Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “)” literally «\)»
Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.) «\s*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character “{” literally «\{»
Match a single character present in the list below «[\x21-\x7E\s]*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
   A character in the range between ASCII character 0x21 (33 decimal) and ASCII character 0x7E (126 decimal) «\x21-\x7E»
   A whitespace character (spaces, tabs, line breaks, etc.) «\s»
Match the character “}” literally «\}»


Created with RegexBuddy

Comments

2

This will pick out the name of the method, not the whole line.

(?<=public static void )\w+|(?<=private static void )\w+|(?<=protected static void )\w+|(?<=public void )\w+|(?<=private void )\w+|(?<=protected void )\w+|(?<=public final void )\w+|(?<=private final void )\w+|(?<=protected final void )\w+|(?<=public static final void )\w+|(?<=private static final void )\w+|(?<=public final static void )\w+|(?<=private final static void )\w+|(?<=protected final static void )\w+|(?<=void )\w+|(?<=private static )\w+

Comments

1

A tip:

If you are going to write the regex in Perl, please use the "xms" options so that you can leave spaces and document the regex. For example you can write a regex like:

 m{\w+ \s+      #return type
   \w+ \s*      #function name
   [(] [^)]* [)] #params
   \s* [{]       #open brace
  }xms

One of the options (I think x) allows # comments inside a regex. Also use \s instead of a " ". \s stands for any whitespace character, so tabs would also match, which is what you want. In Perl you don't need to use / /; you can use { } or < > or | |.

Not sure if other languages have this ability. If they do, then please use them.
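For what it's worth, Java has a comparable facility: the Pattern.COMMENTS flag ignores whitespace in the pattern and treats # to end of line as a comment. Below is a sketch of the same regex in that style; the class name CommentedRegex and the method looksLikeMethod are invented for the example.

```java
import java.util.regex.Pattern;

class CommentedRegex {
    // With Pattern.COMMENTS, whitespace in the pattern is ignored and
    // everything from "#" to the end of the line is a comment.
    static final Pattern METHOD = Pattern.compile(
          "\\w+ \\s+       # return type\n"
        + "\\w+ \\s*       # function name\n"
        + "[(] [^)]* [)]   # params\n"
        + "\\s* [{]        # open brace\n",
        Pattern.COMMENTS);

    // Hypothetical helper: true if the source contains a declaration-shaped match.
    static boolean looksLikeMethod(String source) {
        return METHOD.matcher(source).find();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeMethod("int foo()\n{\n}")); // true
        System.out.println(looksLikeMethod("int foo;"));        // false
    }
}
```

Pattern.DOTALL and Pattern.MULTILINE play the role of Perl's s and m options; they are not needed here because the pattern uses neither . nor ^ and $.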

1 Comment

Is there a similar option in Java?
1

This is for a more specific use case, but it's so much simpler that I believe it's worth sharing. I did this to find 'public static void' methods, i.e. Play controller actions, and I did it from the Windows/Cygwin command line using grep; see: https://stackoverflow.com/a/7167115/34806

cat Foobar.java | grep -Pzo '(?s)public static void.*?\)\s+{'

The last two entries from my output are as follows:

public static void activeWorkEventStations (String type,
            String symbol,
            String section,
            String day,
            String priority,
            @As("yyyy-MM-dd") Date scheduleDepartureDate) {
public static void getActiveScheduleChangeLogs(String type,
            String symbol,
            String section,
            String day,
            String priority,
            @As("yyyy-MM-dd") Date scheduleDepartureDate) {

Comments

1

As of git 2.19.0, the built-in regexp for Java now seems to work well, so supplying your own may not be necessary.

"!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
"^[ \t]*(([A-Za-z_][A-Za-z_0-9]*[ \t]+)+[A-Za-z_][A-Za-z_0-9]*[ \t]*\\([^;]*)$"

(The first line seems to be for filtering out lines that resemble method declarations but aren't.)
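To illustrate how the two lines work together, here is an approximate port to java.util.regex: a line is rejected if the first (negated) pattern matches, and otherwise accepted if the second one does. Git evaluates these as POSIX extended regexes, so this is only a sketch of the idea; the class name GitStyleFuncname and the method isMethodHeader are assumptions made for the example.

```java
import java.util.regex.Pattern;

class GitStyleFuncname {
    // Lines matching this pattern are rejected first (the "!" line above).
    static final Pattern NEGATE = Pattern.compile(
        "^[ \\t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)");
    // Lines that survive the rejection step are accepted if they match this pattern.
    static final Pattern FUNCNAME = Pattern.compile(
        "^[ \\t]*(([A-Za-z_][A-Za-z_0-9]*[ \\t]+)+[A-Za-z_][A-Za-z_0-9]*[ \\t]*\\([^;]*)$");

    // Hypothetical helper: applies the reject-then-accept logic to a single line.
    static boolean isMethodHeader(String line) {
        if (NEGATE.matcher(line).find()) {
            return false;
        }
        return FUNCNAME.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(isMethodHeader("    public void foo(int a) {")); // true
        System.out.println(isMethodHeader("    return foo(1);"));           // false
        System.out.println(isMethodHeader("    if (x) {"));                 // false
    }
}
```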

Comments

1

This splits a simple Java function into its parts using named groups:

(?<access>public|private|protected)?\s*(?<static>static|non-static)?\s*(?<final>final|non-final)?\s*(?<type>void|\w+)\s+(?<name>[a-zA-Z0-9_]+)\((?<parameter>.*?)\)\s*\{(?s)(?<body>.*?)\}

Java function

Compatible with:

  • PCRE2 (PHP >= 7.3)
  • Java 8

Groups:

  • access: public
  • static: static
  • final: non-final
  • type: void
  • name: main
  • parameter: String[] args
  • body: // This is a Java function
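A sketch of reading these named groups from Java with Matcher.group(String) follows. The class name NamedGroupsDemo, the method describe, and the sample input are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupsDemo {
    // The regex from this answer, split across lines and escaped for a Java string literal.
    static final Pattern FUNCTION = Pattern.compile(
          "(?<access>public|private|protected)?\\s*(?<static>static|non-static)?\\s*"
        + "(?<final>final|non-final)?\\s*(?<type>void|\\w+)\\s+(?<name>[a-zA-Z0-9_]+)"
        + "\\((?<parameter>.*?)\\)\\s*\\{(?s)(?<body>.*?)\\}");

    // Hypothetical helper: "access type name(parameter)" for the first function found, or null.
    static String describe(String source) {
        Matcher m = FUNCTION.matcher(source);
        if (!m.find()) {
            return null;
        }
        return m.group("access") + " " + m.group("type") + " " + m.group("name")
            + "(" + m.group("parameter") + ")";
    }

    public static void main(String[] args) {
        String source = "public static void main(String[] args) {\n    // This is a Java function\n}";
        System.out.println(describe(source)); // public void main(String[] args)
    }
}
```

Optional groups that did not take part in the match (here, final) come back as null from group(...). Because the body group is lazy, it ends at the first closing brace, so a method containing nested blocks is cut short.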

1 Comment

doesn't work with try catch {} in the method.
1

This regex not only detects most kinds of Java method signatures but also supports throws clauses (zero or more exception classes), data types containing dots, and generics.

NOTE: When using this regex, the method must start with an access modifier (public, private, or protected):

\b(public|private|protected)\s*(<[^>]*>)?\s*(abstract|default|static|synchronized|final|native|transient)?\s*(abstract|default|static|synchronized|final|native|transient)?\s*(abstract|default|static|synchronized|final|native|transient)?\s+[^*](\w+(\.\w+)*)*(\s*<[^>]*>)?(\s*\[[^\]]*\])*(\s+\w+\s*\([^)]*\)(\s+throws\s+\w+(\s*,\s*\w+)*)?)+

Comments

0

I built a vim regex to do this for ctrlp/funky based on Georgios Gousios's answer.

    let regex = '\v^\s+'                " preamble
    let regex .= '%(<\w+>\s+){0,3}'     " visibility, static, final
    let regex .= '%(\w|[<>[\]])+\s+'    " return type
    let regex .= '\w+\s*'               " method name
    let regex .= '\([^\)]*\)'           " method parameters
    let regex .= '%(\w|\s|\{)+$'        " postamble

I'd guess that looks like this in Java (under \v, vim's < and > are word boundaries, so they become \b):

^\s+(?:\b\w+\b\s+){0,3}(?:[\w<>\[\]])+\s+\w+\s*\([^\)]*\)(?:\w|\s|\{)+$

Comments

0

I found seba229's answer useful; it captures most scenarios, but not the following:

public <T> T name(final Class<T> x, final T y)

This regex will capture that also.

((public|private|protected|static|final|native|synchronized|abstract|transient)+\s)+[\$_\w\<\>\w\s\[\]]*\s+[\$_\w]+\([^\)]*\)?\s*

Hope this helps.

Comments

0
(public|private|static|protected) ([A-Za-z0-9<>.]+) ([A-Za-z0-9]+)\(

Also, here's a replace sequence you can use in IntelliJ

$1 $2 $3(

I use it like this:

$1 $2 aaa$3(

when converting Java files to Kotlin to prevent functions that start with "get" from automatically turning into variables. Doesn't work with "default" access level, but I don't use that much myself.
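The same search and replace can be sketched in plain Java with Matcher.replaceAll, which uses the same $1, $2, $3 group references. The class name GetterRenamer and the method prefixMethodNames are made-up names for this example.

```java
import java.util.regex.Pattern;

class GetterRenamer {
    // The regex from this answer, escaped for a Java string literal.
    static final Pattern DECL = Pattern.compile(
        "(public|private|static|protected) ([A-Za-z0-9<>.]+) ([A-Za-z0-9]+)\\(");

    // Hypothetical helper: prefixes every matched method name with "aaa".
    static String prefixMethodNames(String source) {
        return DECL.matcher(source).replaceAll("$1 $2 aaa$3(");
    }

    public static void main(String[] args) {
        System.out.println(prefixMethodNames("public String getName() {"));
        // public String aaagetName() {
        System.out.println(prefixMethodNames("private static int getCount() {"));
        // private static int aaagetCount() {
    }
}
```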

Comments
