I have a String that I want to split into tokens of different types as part of a larger parser.

String input = "45 + 31.05 * 110 @ 54";

I use Java's regex classes Pattern and Matcher to interpret my regexes and find matches.

String floatRegex = "[0-9]+(\\.([0-9])+)?";
String additionRegex = "[+]";
String multiplicationRegex = "[*]";
String integerRegex = "[0-9]+";

All my regexes get merged into a single master regex, with pipe symbols between the individual regexes.

String masterOfRegexes = "[0-9]+(\\.([0-9])+)?|[+]|[*]|[0-9]+";

I pass this pattern to Pattern.compile() and get the matcher. As I step through the input from left to right calling matcher.find(), I expect to get the structure below, up to the "@" symbol, where an InvalidInputException should be thrown.

[
  ["Integer": "45"],
  ["addition": "+"],
  ["Float": "31.05"],
  ["multiplication": "*"],
  ["Integer": "110"]
  Exception should be thrown...
]

The problem is that matcher.find() skips the "@" symbol completely and instead finds the next match past "@", which is the integer "54".
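A minimal reproduction of this behaviour, as a sketch (the class name FindSkipsDemo and the helper findAll are made up for illustration; the regex is the master regex from above):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindSkipsDemo {
    // Collects every substring that find() reports for the master regex.
    static List<String> findAll(String input) {
        Pattern pattern = Pattern.compile("[0-9]+(\\.([0-9])+)?|[+]|[*]|[0-9]+");
        Matcher matcher = pattern.matcher(input);
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Prints [45, +, 31.05, *, 110, 54]: the "@" never shows up.
        System.out.println(findAll("45 + 31.05 * 110 @ 54"));
    }
}
```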

Why does it skip the "@" symbol, and how can I make it throw the exception on a character that my pattern doesn't recognize?

4
  • It does not "skip" it, the @ never gets matched. See the matches here regex101.com/r/gEeiNv/1 Commented Sep 24, 2021 at 10:32
  • So how can I get matcher to throw the exception when a character doesn't get matched? Commented Sep 24, 2021 at 10:35
  • You might use a pattern like ([0-9]+(?:\.[0-9]+)?|[+]|[*]|[0-9]+)|\S+ and check for group 1. If group 1 is null, then you can throw your exception. See regex101.com/r/0RmvB1/1 and see ideone.com/hzWmuF Commented Sep 24, 2021 at 10:39
  • Yes, like in ideone.com/SPwkRQ Commented Sep 24, 2021 at 10:41

2 Answers

2

A regex matches or it does not match. In your example data, it does not skip over the @, it just does not match it.

What you could do is put all the valid alternatives in a single capture group, add a catch-all alternative outside it, and when looping through the matches check whether group 1 is null.

If group 1 is not null, the match is a valid token; otherwise the catch-all alternative matched and you can throw your exception.

See a regex demo and a Java demo.

String regex = "([0-9]+(?:\\.[0-9]+)?|[+]|[*]|[0-9]+)|\\S+";
String string = "45 + 31.05 * 110 @ 54";

Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);

while (matcher.find()) {
    if (matcher.group(1) == null) {
        // your Exception here
        // throw new Exception("No match!");
        System.out.println(matcher.group() + " -> no match");
    } else {
        System.out.println(matcher.group(1) + " -> match");
    }
}

Output

45 -> match
+ -> match
31.05 -> match
* -> match
110 -> match
@ -> no match
54 -> match
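
To actually stop at the first unrecognized input instead of printing it, the same loop could be wrapped roughly like this. This is only a sketch: the class name CatchAllTokenizer and the method tokenize are hypothetical, and IllegalArgumentException stands in for your own InvalidInputException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CatchAllTokenizer {
    // Group 1 holds valid tokens; the trailing \S+ alternative catches everything else.
    private static final Pattern TOKEN =
            Pattern.compile("([0-9]+(?:\\.[0-9]+)?|[+]|[*]|[0-9]+)|\\S+");

    // Returns the tokens in order, or throws on the first unrecognized run of characters.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            if (matcher.group(1) == null) {
                // Only the catch-all matched: this is not a valid token.
                throw new IllegalArgumentException(
                        "Unexpected input '" + matcher.group() + "' at index " + matcher.start());
            }
            tokens.add(matcher.group(1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("45 + 31.05 * 110"));
        try {
            tokenize("45 + 31.05 * 110 @ 54");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whitespace is still skipped silently by find(), since neither alternative matches it; only non-whitespace that is not a valid token triggers the exception.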

1 Comment

Thanks man, that did help me. My implementation worked as I imagined after adding "\\S" at the end of my master regex. Passes all my tests :)
0

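Matcher offers three ways to match:

  • matches: the pattern must match the entire input
  • find: searches for the next match anywhere in the remaining input
  • lookingAt: the pattern must match from the start of the input (or region), but not necessarily to the end

Because find searches ahead for the next match, your loop passed over the "@" without reporting it. Use the rarely used lookingAt instead, or check the start/end positions of each find result for gaps.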
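
A minimal sketch of the lookingAt variant, assuming whitespace between tokens is allowed and skipped by hand. The class name LookingAtTokenizer and the method tokenize are hypothetical, and IllegalArgumentException stands in for the question's InvalidInputException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookingAtTokenizer {
    // Valid tokens only; there is no catch-all alternative here.
    private static final Pattern TOKEN = Pattern.compile("[0-9]+(?:\\.[0-9]+)?|[+]|[*]");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Whitespace between tokens is skipped explicitly.
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++;
                continue;
            }
            // Restrict the matcher to [pos, end); lookingAt() must match at pos
            // and, unlike find(), does not search ahead.
            matcher.region(pos, input.length());
            if (!matcher.lookingAt()) {
                throw new IllegalArgumentException(
                        "Unexpected character '" + input.charAt(pos) + "' at index " + pos);
            }
            tokens.add(matcher.group());
            pos = matcher.end(); // end() is an index into the whole input
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("45 + 31.05 * 110"));
        try {
            tokenize("45 + 31.05 * 110 @ 54");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Every alternative in the pattern consumes at least one character, so pos always advances and the loop terminates.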
