1

I am writing a Java program to read other Java source files and pull out their import statements:

package com.me.myapp;

import blah.example.dog.client.Fizz;
import blah.example.cat.whiskers.client.Buzz;
import blah.example.shared.Foo;
import blah.example.server.Bar;
...etc.

I want the regex to match any import that starts with import blah.example. and has client somewhere later in the package name. So the regex should pick up Fizz and Buzz in the example above, but not Foo or Bar.

My best attempt is:

String regex = "import blah.example*client*";
if(someString.matches(regex))
    // Do something

This regex isn't throwing an exception, but it isn't working. Where am I going wrong with it? Thanks in advance!

2
  • 3
    You can't reliably parse source code with regular expressions. You'd be better off using an actual Java parser. Commented Aug 16, 2013 at 3:10
  • @BradMace Good point. It would be impossible to, say, reliably skip imports that were commented out in a multiline comment block. Commented Aug 16, 2013 at 3:16

5 Answers

2

A dot in a regex is a special character that means "any character". You have to escape a literal dot, and you want a dot before your * (meaning any number of occurrences of any character):

"import blah\\.example.*client.*"

The expression as you had it:

"import blah.example*client*"

Meant "import blah", followed by a single wildcard character, followed by "exampl", then 0 or more e's, then "clien", then 0 or more t's. It would match, say, "import blahxexampleeeeeclientttt" or "import blah examplclien".
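To see this concretely, here is a small sketch that runs the original pattern through String.matches, the same way the question does. The class and method names are made up for illustration.

```java
class OriginalRegexDemo {
    // The pattern exactly as written in the question.
    static final String ORIGINAL = "import blah.example*client*";

    static boolean matchesOriginal(String line) {
        // String.matches only succeeds if the whole string matches the pattern
        return line.matches(ORIGINAL);
    }

    public static void main(String[] args) {
        String[] samples = {
            "import blahxexampleeeeeclientttt",
            "import blah examplclien",
            "import blah.example.dog.client.Fizz;"
        };
        for (String s : samples) {
            System.out.println(matchesOriginal(s) + "  " + s);
        }
    }
}
```

The two nonsense strings match, while the real import does not: after "exampl" and any run of e's, the pattern demands "clien" immediately, and the real import has ".dog." in the way.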

Also, the (fixed) regex will still match things like "import blah.example2.notclient" and, if you search with find() rather than matches(), "/* import blah.example.client; */". So you still want to enforce the location of literal dots around client and anchor the match to the start of the line, e.g. (unescaped for clarity, remember to escape backslashes in string constants):

^import blah\.example(\.[^.]+)*\.client(\.[^.]+)*;

Where the sequence (unescaped for clarity):

(\.[^.]+)*

Matches any number of individual ".xxx" path components.
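As a rough check, the sketch below plugs that regex into String.matches with the backslashes doubled for a Java string literal. The class name, method name, and sample lines are illustrative assumptions, not part of the original question.

```java
class PathComponentRegexDemo {
    // Same regex as above, with each backslash doubled for a Java string literal.
    static final String REGEX =
        "^import blah\\.example(\\.[^.]+)*\\.client(\\.[^.]+)*;";

    static boolean isClientImport(String line) {
        // matches() must consume the whole line, so the leading ^ is harmless but redundant here
        return line.matches(REGEX);
    }

    public static void main(String[] args) {
        String[] lines = {
            "import blah.example.dog.client.Fizz;",
            "import blah.example.cat.whiskers.client.Buzz;",
            "import blah.example.shared.Foo;",
            "import blah.example.server.Bar;",
            "import blah.example2.notclient;"
        };
        for (String line : lines) {
            System.out.println(isClientImport(line) + "  " + line);
        }
    }
}
```

Only the Fizz and Buzz lines are accepted. Note that [^.]+ accepts any character other than a dot, including spaces, so this is still a loose check rather than a validator.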

Note, however, as Brad Mace points out in the comments, regular expressions alone still aren't reliable for this. You don't have a good way to skip, e.g., a bunch of import statements commented out by a /* */ multiline comment.


3 Comments

You can use [^.]. Dot loses its special meaning in there.
Thanks @Jason C (+1) - however String regex = "^import blah\.example(\.[^.]+)*\.client(\.[^.]+)*;"; gives me a compiler error...
That's why I said "remember to escape slashes in string constants"! Replace each \ with \\ :)
2

You can try ^import blah[.]example[.](\\w+[.])*client[.]\\w+;$ (shown as it appears in a Java string literal, with doubled backslashes) together with the MULTILINE flag, which makes ^ and $ also match at the start and end of each line.

Here is a demo:

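import java.util.regex.Matcher;
import java.util.regex.Pattern;

String data = "package com.me.myapp;\n\nimport blah.example.dog.client.Fizz;\nimport blah.example.cat.whiskers.client.Buzz;\nimport blah.example.shared.Foo;\nimport blah.example.server.Bar;";

// MULTILINE lets ^ and $ match at every line boundary, not just at the ends of the input
Pattern p = Pattern.compile(
        "^import blah[.]example[.](\\w+[.])*client[.]\\w+;$",
        Pattern.MULTILINE);
Matcher m = p.matcher(data);
// find() walks through the input and stops at each matching line
while (m.find()) {
    System.out.println(m.group());
}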

Output

import blah.example.dog.client.Fizz;
import blah.example.cat.whiskers.client.Buzz;

You can also use a similar regex with String.matches to test each line on its own (no anchors or flags needed, since matches must consume the whole line):

import java.util.Scanner;

String data = "package com.me.myapp;\n\nimport blah.example.dog.client.Fizz;\nimport blah.example.cat.whiskers.client.Buzz;\nimport blah.example.shared.Foo;\nimport blah.example.server.Bar;";

Scanner scanner = new Scanner(data);
while (scanner.hasNextLine()) {
    String line = scanner.nextLine();
    // matches() succeeds only if the entire line is a client import
    if (line.matches("import blah[.]example[.](\\w+[.])*client[.]\\w+;")) {
        System.out.println(line);
    }
}

4 Comments

In this answer I assume that the OP has formatted code, without spaces at the start of a line or after the ;, and without import statements in comments. If the code is not that clean, the regex will need to change a little :)
You actually wouldn't be able to detect if an import statement was in a multiline comment without doing some basic amount of additional parsing.
@JasonC With look-around I could try, but yes, it would not be an easy task and should be done with a parser rather than a regex.
@Pshemo With disclaimer! We all can add: Ctrl + Shift + F required in Eclipse :)
1

Assuming that someString is one of the lines from the Java source code:

Java String

"import\\s+blah\\.example(?:\\.\\w+)*\\.client(?:\\.\\*|(?:\\.\\w+)*);"

Regex

import\s+blah\.example(?:\.\w+)*\.client(?:\.\*|(?:\.\w+)*);
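A quick way to see what this pattern accepts is to run it over a few sample lines. The sketch below is only illustrative: the class name, method name, and sample inputs are assumptions, not taken from the question.

```java
class WildcardImportRegexDemo {
    // The Java string form of the regex given above.
    static final String REGEX =
        "import\\s+blah\\.example(?:\\.\\w+)*\\.client(?:\\.\\*|(?:\\.\\w+)*);";

    static boolean isClientImport(String line) {
        return line.matches(REGEX);
    }

    public static void main(String[] args) {
        String[] lines = {
            "import blah.example.dog.client.Fizz;",   // ordinary import
            "import   blah.example.client.*;",        // extra spaces, wildcard directly under client
            "import blah.example.shared.Foo;",        // no client component
            "import blah.example.dog.client.sub.*;"   // wildcard under a sub-package of client
        };
        for (String line : lines) {
            System.out.println(isClientImport(line) + "  " + line);
        }
    }
}
```

The first two lines are accepted and the third is rejected. The fourth is also rejected, because the alternation allows either a wildcard right after client or plain name components, but not a mix of both; whether that matters depends on the imports you expect to see.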

3 Comments

You can use repeating patterns of (\.[^\.]+)* to match path components.
@JasonC I actually wanted to just fix OP's regex so that it doesn't fail to match valid imports. But, this soon became find the regex validator for Java import statements! :)
Yes, that happened to me too.
1

Treating source files as plain text can be problematic.

I would try one of the following approaches instead:
* using the javac processor framework to integrate your matcher into the compiler
* using the ASM library
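For comparison, here is a minimal sketch of letting the compiler's own parser find the imports. Note that this swaps in a different technique from the two listed: it uses the JDK's Compiler Tree API (com.sun.source) through javax.tools, not an annotation processor or ASM. It assumes the program runs on a full JDK rather than a bare JRE; the class names, the clientImports helper, and the filtering regex are hypothetical.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.ImportTree;
import com.sun.source.util.JavacTask;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class ImportParserSketch {
    // Wraps a source string so javac can read it without touching the file system.
    static class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String fileName, String code) {
            super(URI.create("string:///" + fileName), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Hypothetical helper: qualified names of imports under blah.example with a "client" component.
    static List<String> clientImports(String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; a JDK is required");
        }
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null,
                List.of(new StringSource("Example.java", source)));
        List<String> result = new ArrayList<>();
        try {
            // parse() only builds syntax trees; nothing is resolved or compiled
            for (CompilationUnitTree unit : task.parse()) {
                for (ImportTree imp : unit.getImports()) {
                    String name = imp.getQualifiedIdentifier().toString();
                    if (name.matches("blah\\.example(\\.\\w+)*\\.client(\\.\\w+)*(\\.\\*)?")) {
                        result.add(name);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String source = "package com.me.myapp;\n"
                + "/*\nimport blah.example.dog.client.Fizz;\n*/\n"
                + "import blah.example.cat.whiskers.client.Buzz; import blah.example.shared.Foo;\n";
        System.out.println(clientImports(source));
    }
}
```

Because the parser handles comments and layout itself, the commented-out Fizz import is skipped and the two imports sharing one line are seen separately. A regex is still used here, but only to filter already-parsed qualified names.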


0

A regex may parse the source incorrectly, e.g. by picking up commented-out imports:

/*
import blah.example.dog.client.Fizz;
import blah.example.cat.whiskers.client.Buzz;
*/

or code that is not formatted one statement per line:

import blah.example.dog.client.Fizz; import blah.example.cat.whiskers.client.Buzz;
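Both cases can be reproduced with a short sketch. It assumes an unanchored pattern used with Matcher.find(), which is one plausible way to cope with several imports on a line; the class name, method name, and pattern are illustrative, not taken from the other answers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindImportsDemo {
    // Unanchored, so it can match several import statements on one line.
    private static final Pattern CLIENT_IMPORT = Pattern.compile(
            "import\\s+(blah\\.example(?:\\.\\w+)*\\.client(?:\\.\\w+)*)\\s*;");

    static List<String> findClientImports(String source) {
        List<String> names = new ArrayList<>();
        Matcher m = CLIENT_IMPORT.matcher(source);
        while (m.find()) {
            names.add(m.group(1)); // group 1 is the qualified name without "import" or ";"
        }
        return names;
    }

    public static void main(String[] args) {
        String oneLine = "import blah.example.dog.client.Fizz; "
                + "import blah.example.cat.whiskers.client.Buzz;";
        String commentedOut = "/*\nimport blah.example.dog.client.Fizz;\n"
                + "import blah.example.cat.whiskers.client.Buzz;\n*/";

        System.out.println(findClientImports(oneLine));
        System.out.println(findClientImports(commentedOut));
    }
}
```

Dropping the line anchors fixes the one-line case, since both imports are found. The commented-out block produces the same two names, though, which are false positives: the pattern has no idea it is inside a comment.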
