
I am writing a simple Java source file parser in Python. The main objective is to extract a list of method declarations. A method declaration starts with public, private or protected (I assume there are no "friendly", i.e. package-private, methods without an access modifier, which is acceptable in my code base) and ends with a {, but can't contain a ; (and it may span multiple lines).

So my current regex pattern looks like:

((public|private|protected).*\n*.*?({|;))

I am not sure how to say that the entire match can't contain a ;, so instead I tried to match something that ends with either { or ;, whichever comes first, using a non-greedy quantifier. However, that doesn't work. Here is a chunk where it fails:

private static final AtomicInteger refCount = new AtomicInteger(0);

protected int getSomeVar() {

You can see that there is a variable declaration before the method declaration; it starts with private but has no {. My pattern returns both lines as one match, whereas I wanted two matches so that I could discard the variable declaration in separate, non-regex logic. Alternatively, if you know how to exclude a ; that appears before the {, that would work too.
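A minimal sketch that reproduces the failure, assuming the two Java lines above are separated by one blank line (`ORIGINAL`, `SAMPLE` and `matches` are hypothetical names). Without `re.DOTALL` the greedy `.*` runs to the end of the first line (past the `;`), `\n*` swallows the blank line, and the lazy `.*?` then reaches the `{` on the next line, so both declarations end up in one match.

```python
import re

# The asker's original pattern, unchanged.
ORIGINAL = r"((public|private|protected).*\n*.*?({|;))"

# Hypothetical input reconstructed from the two Java lines in the question.
SAMPLE = (
    "private static final AtomicInteger refCount = new AtomicInteger(0);\n"
    "\n"
    "protected int getSomeVar() {\n"
)

# With several groups, findall returns one tuple per match;
# element 0 of each tuple is the outermost group (the whole match here).
matches = [m[0] for m in re.findall(ORIGINAL, SAMPLE)]
for text in matches:
    print(repr(text))
```

Only one match is printed, and it starts at private and ends at the { of the method.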

Essentially, how do I express in a Python regex that a certain character (or subpattern) must not occur anywhere within the match?

  • The easiest way that comes to mind is to use another regular expression to eliminate the matches that contain those characters: if re.search(";", your_stuff) is None, then you know the string doesn't contain any semicolon. Commented Nov 18, 2013 at 18:49
  • I was hoping for something slightly more elegant. Commented Nov 18, 2013 at 18:50
  • I don't see why you're complicating things; isn't (?s)(public|private|protected).*?[{;] enough, or am I missing something? Commented Nov 18, 2013 at 18:51
  • I need \n for multi-line declarations. Commented Nov 18, 2013 at 18:54
  • @amphibient (?s) sets the s (DOTALL) flag, which makes . (dot) match newlines as well, so .*? will match across lines when the flag is set. Commented Nov 18, 2013 at 18:55
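The two comment suggestions can be combined in a short sketch: the (?s) pattern yields one candidate per declaration, and a plain substring test then drops the field declarations. One assumption differs from the comment: the alternation is wrapped in a non-capturing group (?:...) so that findall returns the whole match instead of just the access modifier. `PATTERN`, `SAMPLE`, `candidates` and `methods` are hypothetical names.

```python
import re

# Pattern from the comments, with the group made non-capturing so that
# findall returns full matches. (?s) lets "." cross newlines, and the lazy
# .*? stops at whichever of '{' or ';' comes first.
PATTERN = re.compile(r"(?s)(?:public|private|protected).*?[{;]")

SAMPLE = (
    "private static final AtomicInteger refCount = new AtomicInteger(0);\n"
    "\n"
    "protected int getSomeVar() {\n"
)

candidates = PATTERN.findall(SAMPLE)

# Second step, as in the first comment: keep only candidates without ';'.
# Every candidate ends at its first '{' or ';', so this drops the fields.
methods = [c for c in candidates if ";" not in c]
print(methods)
```

This gives the two separate matches the question asked for, with the non-regex filtering done afterwards.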

2 Answers


You can use a negated character class, [^;{], to say "any character except a left brace or semicolon". Unlike ., a negated character class also matches newlines, so it can span multi-line declarations.

((public|private|protected)[^;{]*\n*[^;{]*?({|;))
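A quick check of the pattern above, assuming a hypothetical multi-line method signature in the sample input. Because [^;{] already matches \n, the \n* in the middle is redundant, but it does no harm; each declaration stops at its own first ; or {.

```python
import re

# The answer's (updated) pattern, unchanged.
ANSWER = re.compile(r"((public|private|protected)[^;{]*\n*[^;{]*?({|;))")

# Hypothetical input with a signature spread over two lines.
SAMPLE = (
    "private static final AtomicInteger refCount = new AtomicInteger(0);\n"
    "\n"
    "protected int getSomeVar(int a,\n"
    "                         int b) {\n"
)

# finditer + group(1) returns the outer group without findall's tuples.
found = [m.group(1) for m in ANSWER.finditer(SAMPLE)]
for text in found:
    print(repr(text))

# Unlike ".", a negated character class matches "\n" even without re.DOTALL.
newline_ok = re.fullmatch(r"[^;{]", "\n") is not None
```

The field and the method now come back as two matches, and the caller can discard the one ending in ;.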

2 Comments

I am having problems with that. It does exclude the ; sign, but it was getting me everything up to the first ;, even across multiple {s.
Updated: I added the left brace to the negated character class, too.

This finally worked:

((public|private|protected)[^;{]*?{)

Notice that I had to exclude both ; and { before the first {.
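A minimal sketch of the working pattern on a slightly larger, hypothetical Java snippet (`METHOD_DECL`, `SAMPLE` and `declarations` are illustrative names). A candidate starting at private fails as soon as it meets the ; because [^;{] cannot step over it, so the field is skipped, while the multi-line method signature is matched in full.

```python
import re

# The pattern that worked: an access modifier, then any run of characters
# that are neither ';' nor '{', then the first '{'.
METHOD_DECL = re.compile(r"((public|private|protected)[^;{]*?{)")

# Hypothetical Java source used only for illustration.
SAMPLE = """\
private static final AtomicInteger refCount = new AtomicInteger(0);

protected int getSomeVar(int a,
                         int b) {
    return a + b;
}

public void reset() {
    refCount.set(0);
}
"""

declarations = [m.group(1) for m in METHOD_DECL.finditer(SAMPLE)]
for decl in declarations:
    # Collapse the whitespace of multi-line declarations for display.
    print(" ".join(decl.split()))
# prints:
# protected int getSomeVar(int a, int b) {
# public void reset() {
```

Since [^;{] can never consume a {, the lazy *? here gives the same result as a greedy *; either form stops at the first {.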

1 Comment

If you aren't using dots (.), then setting (?s) has no effect. See the Python re module documentation.
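A small sketch of that point, using a hypothetical input string: re.DOTALL only changes what . matches, so the final pattern (which has no dots) returns identical results with or without it, whereas a pattern built on . does not.

```python
import re

PATTERN = r"((public|private|protected)[^;{]*?{)"
SAMPLE = "private int x = 1;\nprotected int f(int a,\n    int b) {\n"

without_flag = re.findall(PATTERN, SAMPLE)
with_flag = re.findall(PATTERN, SAMPLE, re.DOTALL)

# No "." in the pattern, so the flag changes nothing.
print(without_flag == with_flag)  # True

# A pattern that does use "." behaves differently with the flag.
print(re.findall(r"f\(.*?\{", SAMPLE))             # [] ("." stops at \n)
print(re.findall(r"f\(.*?\{", SAMPLE, re.DOTALL))  # ['f(int a,\n    int b) {']
```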
