
Suppose that *a* is a Java identifier. I would like a regex to match things like this:

#a, #a.a.a (with .a repeated any number of times)

but not this:

#a. (ending with a dot)

So in a phrase like this: "#a.a less than #a." it would match only the first #a.a (because it doesn't end with a dot).

This regex:

#[a-zA-Z_$][\\w$]*(\\.[a-zA-Z_$][\\w$]*)*

almost does the job, but it matches the last case too.
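
The over-matching can be reproduced with a short Java sketch. The class and method names here are illustrative and not part of the original post; the pattern is the one quoted above, written as a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OriginalRegexDemo {
    // The pattern from the question: nothing checks what follows the match.
    static final Pattern ORIGINAL = Pattern.compile(
            "#[a-zA-Z_$][\\w$]*(\\.[a-zA-Z_$][\\w$]*)*");

    // Collects every substring the pattern finds in the input.
    static List<String> findAll(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = ORIGINAL.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Prints [#a.a, #a]; the second hit is the unwanted one.
        System.out.println(findAll("#a.a less than #a."));
    }
}
```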

Thank you.

Marcos

  • Possible duplicate: stackoverflow.com/questions/5205339/… Commented May 2, 2016 at 10:39
  • Although most Java identifiers use ASCII, Unicode letters are allowed as well, so it's better to use \p{L} instead of a-zA-Z. Commented Jul 9, 2021 at 6:53
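
One way to act on the Unicode comment, sketched under assumptions: instead of \p{L}, this variant uses Java's `\p{javaJavaIdentifierStart}` and `\p{javaJavaIdentifierPart}` classes, which follow `Character.isJavaIdentifierStart` and `Character.isJavaIdentifierPart`. The class name, method name and sample input are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeIdentifierDemo {
    // One identifier: a start character followed by any number of part characters.
    static final String ID =
            "\\p{javaJavaIdentifierStart}\\p{javaJavaIdentifierPart}*";

    // '#', a dotted chain of identifiers, and a negative lookahead that
    // rejects the match when more identifier characters and a dot follow.
    static final Pattern UNICODE = Pattern.compile(
            "#" + ID + "(?:\\." + ID + ")*(?!\\p{javaJavaIdentifierPart}*\\.)");

    static List<String> findAll(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = UNICODE.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        // The first token is kept; the second one ends with a dot and is skipped.
        System.out.println(findAll("#caf\u00e9.na\u00efve and #\u65e5\u672c."));
    }
}
```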

2 Answers


You almost got it right but some minor adjustments are needed. Consider this regex:

#[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*(?!\w*\.)

Live Demo: http://www.rubular.com/r/kJbSJKHhtv

Translated to Java:

(?i)#[a-z_$][\\w$]*(?:\\.[a-z_$][\\w$]*)*(?!\\w*\\.)
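
A minimal sketch of how the Java form might be used with `Pattern` and `Matcher`. The class and method names are illustrative; the two inputs are the sample phrases from the question and from the comment below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // The regex from this answer; (?!\w*\.) rejects a match that is
    // followed by optional word characters and then a dot.
    static final Pattern FIXED = Pattern.compile(
            "(?i)#[a-z_$][\\w$]*(?:\\.[a-z_$][\\w$]*)*(?!\\w*\\.)");

    // Collects every substring the pattern finds in the input.
    static List<String> findAll(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = FIXED.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Prints [#a.a]
        System.out.println(findAll("#a.a less than #a."));
        // Prints [#a.a, #var]
        System.out.println(findAll("#a.a less than #var and #a.aaa."));
    }
}
```

The `\w*` inside the lookahead matters: without it, the engine can backtrack by one character and still report a shortened match such as #a.aa.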

1 Comment

With this statement "#a.a less than #var and #a.aaa." it still matches #a.aa. I would like it to ignore the last #a.aaa. completely.

This can be accomplished with a negative lookahead. The pattern first looks for "#text_$". It then looks for ".text_$" zero or more times. The match is rejected if it is followed by zero or more "text_$" characters and then a period. This assumes the i (case-insensitive) modifier is on.

At first I only checked that the match was not followed by a period, but then backtracking would just drop the last character from the match.

#([a-z_$][a-z_$\d]*)(\.[a-z_$][a-z_$\d]*)*(?![a-z_$\d]*\.)

Results

#abc           => YES
#abc.abc       => YES
#abc.a23.abc   => YES
#abc.abc.abc.  => NO
#abc.2bc.abc   => NO
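
The table can be checked with a small Java sketch. It assumes that YES means the pattern finds the entire input and NO means it does not; the class and method names are illustrative, and the i modifier is supplied through `Pattern.CASE_INSENSITIVE`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResultsTableCheck {
    // The regex from this answer as a Java string, compiled case-insensitively.
    static final Pattern P = Pattern.compile(
            "#([a-z_$][a-z_$\\d]*)(\\.[a-z_$][a-z_$\\d]*)*(?![a-z_$\\d]*\\.)",
            Pattern.CASE_INSENSITIVE);

    // True when the pattern finds a match that covers the whole input.
    static boolean accepts(String input) {
        Matcher m = P.matcher(input);
        return m.find() && m.group().equals(input);
    }

    public static void main(String[] args) {
        String[] samples = {
            "#abc", "#abc.abc", "#abc.a23.abc", "#abc.abc.abc.", "#abc.2bc.abc"
        };
        for (String s : samples) {
            // Prints each sample with YES or NO, in the order of the table.
            System.out.println(s + " => " + (accepts(s) ? "YES" : "NO"));
        }
    }
}
```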


5 Comments

@Marcos: added digits. The accepted answer also did not work for digits.
The complete regex that I'm using is this: (?i)#[a-zA-Z_$][\\w$]*(?:\\.[a-zA-Z_$][\\w$]*)*(?!\\w*\\.) So it works with digits.
@DanielGimenez: My answer definitely works with digits, you can try it yourself.
@anubhava you're right. I suppose our answers are redundant, because in the end I reached the same answer you had, just without the \w. I will delete mine shortly, once I know you have read this comment.
@DanielGimenez: Yes, at this point I would think both answers look the same (after you start using \w). But I would say just leave it like this, why delete?
