I'm quite sure that this has a simple solution, but I've been searching for three hours and haven't managed to find anything that helps me.
I'm writing a parser in Java using regex, and it is supposed to match a fixed set of keywords, numbers from 1 to 10000, and hex color codes. Matching the keywords works fine, but the reader isn't reading the numbers and color codes as a whole. For example, it reads the input:
DOWN. COLOR #0AF234.
as:
Reading: DOWN Returning: Down
Reading: . Returning: Dot
Reading: Returning: Whitespace
Reading: COLOR Returning: Color
Reading: Returning: Whitespace
Reading: # Returning: nothing
Reading: 0 Returning: Number
Reading: A Returning: nothing
Reading: F Returning: nothing
Reading: 2 Returning: Number
Reading: 3 Returning: Number
Reading: 4 Returning: Number
Reading: . Returning: Dot
So it reads the words COLOR and DOWN as a whole, as I want, but not the color code #0AF234. Ideally, I would want those seven lines to be:
Reading: #0AF234 Returning: Colorcode
I have:
    String stringTokens = "DOWN|COLOR|(\\s|\\t)+|\\n|\b[1-9][0-9]{0,3}\b|10000|^(#)([a-fA-F0-9]{6})$";
    Pattern stringPattern = Pattern.compile(stringTokens, Pattern.CASE_INSENSITIVE);
    Matcher m = stringPattern.matcher(input);
Then:
    while (m.find()) {
        if (m.start() != inputPos) {
            tokens.add(new Token(lineNo, TokenType.Invalid));
        }
        if (m.group().matches("^(#)([a-fA-F0-9]{6})$"))
            tokens.add(new Token(lineNo, TokenType.ColorCode));
        else if (m.group().equals("."))
            tokens.add(new Token(lineNo, TokenType.Dot));
        else if (m.group().matches("DOWN"))
            tokens.add(new Token(lineNo, TokenType.Down));
        else if (m.group().matches("COLOR"))
            tokens.add(new Token(lineNo, TokenType.Color));
        else if (Character.isDigit(m.group().charAt(0)))
            tokens.add(new Token(lineNo, TokenType.Number, Integer.parseInt(m.group())));
        else if (m.group().matches("\\n")) {
            tokens.add(new Token(lineNo, TokenType.Whitespace));
            lineNo++;
        }
        else if (m.group().matches("(\\s|\\t)+"))
            tokens.add(new Token(lineNo, TokenType.Whitespace));
        inputPos = m.end();
    }
So my question is basically:
How do I get the color codes and the numbers to be read as single tokens? When I print m.group() for each match now, it only returns single digits. I have seen other code that reads digits in the same way but uses simply [0-9]+ in the regex (which is too permissive for my case), and there each match was the whole number.
I have tried something along the lines of m.group(1) and m.group(2), word boundaries (which I don't completely understand), and the ^...$ anchors, but nothing seems to make the matcher read the token as a whole.
I hope I managed to keep the code I copied simple without missing anything important, and that someone can help me figure this simple (it must be?!) thing out. Thank you! :)
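For illustration, here is a minimal sketch of one way the pattern could be written so that numbers and color codes come back as whole tokens. It rests on assumptions rather than on the original parser: the class name TokenSketch, the tokenize helper, the named groups, and the plain string labels (standing in for Token and TokenType) are all hypothetical. The two points it tries to show are that a word boundary must be written "\\b" in a Java string literal (a bare "\b" is a backspace character), and that ^ and $ only match at the start and end of the whole input unless MULTILINE is set, so they are left out for tokens that can appear mid-line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenSketch {
    // One named group per token kind (group names are hypothetical).
    // "\\b" is a regex word boundary; "\b" would be a backspace character.
    // No ^ or $ anchors: a token may appear anywhere in the input.
    private static final Pattern TOKENS = Pattern.compile(
            "(?<color>#[0-9a-f]{6}\\b)"
          + "|(?<number>\\b(?:10000|[1-9][0-9]{0,3})\\b)"
          + "|(?<down>\\bDOWN\\b)"
          + "|(?<colorkw>\\bCOLOR\\b)"
          + "|(?<dot>\\.)"
          + "|(?<newline>\\n)"
          + "|(?<space>[ \\t]+)",
            Pattern.CASE_INSENSITIVE);

    // Returns string labels instead of Token objects to stay self-contained.
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKENS.matcher(input);
        int pos = 0;
        while (m.find()) {
            if (m.start() != pos) {
                out.add("Invalid"); // find() skipped text that matched no token
            }
            if (m.group("color") != null) {
                out.add("ColorCode:" + m.group());
            } else if (m.group("number") != null) {
                out.add("Number:" + Integer.parseInt(m.group()));
            } else if (m.group("down") != null) {
                out.add("Down");
            } else if (m.group("colorkw") != null) {
                out.add("Color");
            } else if (m.group("dot") != null) {
                out.add("Dot");
            } else {
                out.add("Whitespace"); // newline or a run of spaces/tabs
            }
            pos = m.end();
        }
        if (pos != input.length()) {
            out.add("Invalid"); // trailing text that matched no token
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("DOWN. COLOR #0AF234."));
    }
}
```

With the example input, this sketch prints [Down, Dot, Whitespace, Color, Whitespace, ColorCode:#0AF234, Dot]. Checking which named group matched also avoids re-running matches() on every token, though that is a design preference rather than a requirement.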