5

I need some help. I'm getting:

Caused by: java.util.regex.PatternSyntaxException: Unclosed character class near index 24
^[a-zA-Z└- 0-9£µ /.'-\]*$
                        ^
        at java.util.regex.Pattern.error(Pattern.java:1713)
        at java.util.regex.Pattern.clazz(Pattern.java:2254)
        at java.util.regex.Pattern.sequence(Pattern.java:1818)
        at java.util.regex.Pattern.expr(Pattern.java:1752)
        at java.util.regex.Pattern.compile(Pattern.java:1460)
        at java.util.regex.Pattern.<init>(Pattern.java:1133)
        at java.util.regex.Pattern.compile(Pattern.java:823)

Here is my code:

String testString = value.toString();

Pattern pattern = Pattern.compile("^[a-zA-Z\300-\3770-9\u0153\346 \u002F.'-\\]*$");
Matcher m = pattern.matcher(testString);

I have to use Unicode escapes for some of the characters because I'm working with XHTML.

Any help would be great!

2 Answers

23

Assuming that you want to match \ and - and not ]:

Pattern pattern = Pattern.compile("^[a-zA-Z\300-\3770-9\u0153\346 \u002F.'\\\\-]*$");

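You need to escape your backslashes twice, because \ is an escape character in regex as well as in Java string literals. In your pattern, \\] escapes the backslash for Java only: the regex engine receives \], which is an escaped ] and leaves the character class unclosed. To put a literal backslash in the class, add a second Java-escaped \ so that the regex engine sees an escaped backslash.

So \\\\ in Java source becomes \\ after Java's string escaping, which the regex engine then reads as a single literal \.

Moving - to the end of the character class means that it is treated as a literal character rather than as a range operator, as pointed out by Pshemo.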
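
The two layers of escaping can be checked directly. This is a minimal sketch rather than part of the original answer: the class name BackslashLayers and the helpers isAllowed and matchesBackslashRegex are made up for illustration, and the pattern is the corrected one shown above.

```java
import java.util.regex.Pattern;

class BackslashLayers {
    // Four backslash characters in source are two characters in the actual string.
    static final String REGEX_BACKSLASH = "\\\\";

    // Corrected pattern from the answer: escaped backslash, then a literal hyphen last.
    static final Pattern ALLOWED =
            Pattern.compile("^[a-zA-Z\300-\3770-9\u0153\346 \u002F.'\\\\-]*$");

    // Hypothetical helper: true if every character of s belongs to the class.
    static boolean isAllowed(String s) {
        return ALLOWED.matcher(s).matches();
    }

    // Hypothetical helper: true if the two-character regex matches the whole input.
    static boolean matchesBackslashRegex(String input) {
        return Pattern.matches(REGEX_BACKSLASH, input);
    }

    public static void main(String[] args) {
        System.out.println(REGEX_BACKSLASH.length());     // 2
        System.out.println(matchesBackslashRegex("\\"));  // true: one literal backslash
        System.out.println(isAllowed("a\\b-c"));          // true: backslash and hyphen are allowed
        System.out.println(isAllowed("a]b"));             // false: ] is not in the class
    }
}
```

The length check shows what Java hands to the regex engine, and the last two calls show that the class now accepts \ and - but not ].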

1 Comment

I think everyone hit on it. I didn't realize I would need to quadruple the '\' to get a single backslash in my regex. I appreciate everyone's input.
2

It is hard to say what you are trying to achieve, but I can see a few strange things in your regex:

  1. You opened a character class but never closed it. Instead you used \\], which makes ] an ordinary character.
    • If you want to include ] in your character class, then you need an additional ] at the end, like "^[a-zA-Z\300-\3770-9\u0153\346 \u002F.'-\\]]*$"
    • If you want to include \ in your character class, then you need to write \\\\, because its special meaning has to be escaped twice: once for the regex engine and once for Java's string literal.
  2. You used - between ' and \\] ('-\\]). Inside a character class, - specifies a range of characters, like a-z or A-Z. To remove its special meaning you need to write \\-
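
The points above can be compared side by side. This is a rough sketch with assumed names (CharClassVariants, compiles, accepts): it checks which variant compiles, and shows that the unescaped - forms a range from ' to ] that admits characters such as ; which were probably never intended.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CharClassVariants {
    // Pattern from the question: the regex engine sees an escaped ] and the class never ends.
    static final String ORIGINAL = "^[a-zA-Z\300-\3770-9\u0153\346 \u002F.'-\\]*$";
    // An extra ] closes the class, but the hyphen still forms a range from ' to ].
    static final String EXTRA_BRACKET = "^[a-zA-Z\300-\3770-9\u0153\346 \u002F.'-\\]]*$";
    // Hyphen and backslash are both escaped, so each one is a literal.
    static final String BOTH_ESCAPED = "^[a-zA-Z\300-\3770-9\u0153\346 \u002F.'\\-\\\\]*$";

    // Hypothetical helper: reports whether the regex compiles at all.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Hypothetical helper: true if the regex matches the whole input.
    static boolean accepts(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(compiles(ORIGINAL));               // false: unclosed character class
        System.out.println(compiles(EXTRA_BRACKET));          // true
        System.out.println(accepts(EXTRA_BRACKET, ";"));      // true: ; lies between ' and ]
        System.out.println(accepts(BOTH_ESCAPED, ";"));       // false
        System.out.println(accepts(BOTH_ESCAPED, "a-b\\c"));  // true: literal - and \
    }
}
```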

6 Comments

Could this work: Pattern pattern = Pattern.compile("^[a-zA-Z\300-\3770-9\u0153\346 \u002F.'\\-\\\\]*$"); ? Backslashes need to be quadrupled when written in Java string literals.
@JasonSperske Good point about \\\\] in case the OP wants to include \ in the character class.
@JasonSperske If you want to include - as a character, you can just put it at the end. If there's nothing after it in the character class it will be assumed not to be acting as an operator.
@Jeff From what I have heard, placing - at the start or end of a character class works in Java, but it might not be good practice, since not all languages and regex engines accept it. So it is better to escape its special meaning explicitly.
@Jeff correct. I wanted to include that so I kept it at the end.
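
The hyphen placements discussed in these comments can be tried out in Java. This small sketch uses made-up names (HyphenPlacement, accepts) and short toy character classes rather than the full pattern from the question; it says nothing about how other regex engines behave.

```java
import java.util.regex.Pattern;

class HyphenPlacement {
    // Hypothetical helper: true if the regex matches the whole input.
    static boolean accepts(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(accepts("[a-z]", "m"));    // true: the hyphen forms the range a..z
        System.out.println(accepts("[a-z]", "-"));    // false: the hyphen itself is not a member
        System.out.println(accepts("[az-]", "-"));    // true: trailing hyphen is a literal
        System.out.println(accepts("[-az]", "-"));    // true: leading hyphen is a literal
        System.out.println(accepts("[a\\-z]", "-"));  // true: escaped hyphen is a literal
        System.out.println(accepts("[a\\-z]", "m"));  // false: no range is formed
    }
}
```

The escaped form does not depend on where the hyphen sits, which is the argument made above for preferring it.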