1

My program works how I want it to but I stumbled upon something that I don't understand.

String problem = "4 - 2";
problem = problem.replaceAll("[^-?+?0-9]+", " ");
System.out.println(Arrays.asList(problem.trim().split(" ")));

prints [4, -, 2]

but

String problem = "4 - 2";
problem = problem.replaceAll("[^+?-?0-9]+", " ");
System.out.println(Arrays.asList(problem.trim().split(" ")));

drops the minus sign entirely and prints [4, 2]

Why does it do that? It seems like both should work.

1
  • I've found regex debugging webapps like debuggex.com very useful. Commented Dec 26, 2016 at 20:14

3 Answers

3

The hyphen has a special meaning inside a character class: it is used to define a character range (like a-z or 0-9), except when:

  • it is at the start of the character class or immediately after the negation character ^
  • it is escaped with a backslash
  • it is at the end of the character class
  • with some regex engines, it immediately follows a shorthand character class like \w, \s, \d, \p{thing},... (in these cases the situation isn't ambiguous, since a shorthand class can't be the start of a range)

In the first example, it is seen as a literal hyphen (since it is at the beginning).

In your second example, I assume that ?-? defines a range from ? to ? (which is nothing more than the single character ?), so the class never contains a literal hyphen.

Note: ? doesn't have a special meaning inside a character class (it is no longer a quantifier there, just a literal character)
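To see the effect of the hyphen's position, here is a minimal sketch that runs the question's input through three character classes. The class name HyphenPosition and the helper tokens are made up for this illustration; only replaceAll, trim and split come from the question.

```java
import java.util.Arrays;
import java.util.List;

public class HyphenPosition {
    // Hypothetical helper: same steps as in the question, with the regex as a parameter.
    static List<String> tokens(String input, String regex) {
        return Arrays.asList(input.replaceAll(regex, " ").trim().split(" "));
    }

    public static void main(String[] args) {
        String problem = "4 - 2";
        // Hyphen right after ^ : literal, so "-" survives.
        System.out.println(tokens(problem, "[^-?+?0-9]+")); // [4, -, 2]
        // Hyphen between ? and ? : parsed as the range ?-?, so "-" is replaced.
        System.out.println(tokens(problem, "[^+?-?0-9]+")); // [4, 2]
        // Hyphen at the very end of the class: literal again.
        System.out.println(tokens(problem, "[^+?0-9-]+"));  // [4, -, 2]
    }
}
```

Since ? is just a literal inside the brackets, it could probably be dropped from the class altogether if the input never contains question marks.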


0

If you are trying to match a literal - inside of a [ and ], the safest option is to escape it as \-. In the first case, the ^ right after [ negates the class, and a - that immediately follows it is treated as a literal, so there is nothing to escape. In the second case, ?-? is parsed as a range from ? to ?, so the class no longer contains a literal - and the regular expression behaves in a way you did not expect.

PS: Inside a Java string literal the backslash itself has to be escaped, so you write \\- instead of \-.
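As a rough illustration of the escaping, the sketch below keeps the hyphen in the middle of the class but escapes it. The class name EscapedHyphen and the helper tokens are invented for the example.

```java
import java.util.Arrays;
import java.util.List;

public class EscapedHyphen {
    // The Java literal "\\-" reaches the regex engine as the two characters \-
    static final String REGEX = "[^+?\\-?0-9]+";

    // Hypothetical helper mirroring the steps from the question.
    static List<String> tokens(String input) {
        return Arrays.asList(input.replaceAll(REGEX, " ").trim().split(" "));
    }

    public static void main(String[] args) {
        System.out.println(REGEX);           // [^+?\-?0-9]+
        System.out.println(tokens("4 - 2")); // [4, -, 2]
    }
}
```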


0

In the second example, +?-? means "a plus sign, or any char between ? and ?, inclusive". Of course, that range is just ?, so the whole regex is equivalent to [^+?0-9]+.

The only time within a character class (between the square brackets) that - doesn't mean "between, inclusive" is at the start of the character class, or immediately following a ^ that starts it, or at the end of the character class, or when it's escaped (\-).
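One way to check the claimed equivalence is to compare which ASCII characters each class accepts. This is only a sketch: the class name ClassEquivalence and the method sameAsciiSet are hypothetical, and it looks at ASCII only.

```java
import java.util.regex.Pattern;

public class ClassEquivalence {
    // Returns true if both single-character classes accept exactly the same ASCII characters.
    static boolean sameAsciiSet(String classA, String classB) {
        Pattern a = Pattern.compile(classA);
        Pattern b = Pattern.compile(classB);
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (a.matcher(s).matches() != b.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The range ?-? adds nothing beyond the literal ? that is already there.
        System.out.println(sameAsciiSet("[^+?-?0-9]", "[^+?0-9]")); // true
        // With the hyphen right after ^ it is a literal, so the sets differ at '-'.
        System.out.println(sameAsciiSet("[^-?+?0-9]", "[^+?0-9]")); // false
    }
}
```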
