I'm trying to capture assignment operations from a text file using 'java.util.regex.Pattern'. I've been very frustrated trying to fix my regular expression to actually recognize what I am looking for. I've simplified the problem as much as I can and found an issue with picking up white space.

This post proved helpful, and sheds light on issues dealing with the whitespace character set, but does not answer the question of why the following is not working:

Pattern p = Pattern.compile("adfa =");
Scanner sc = new Scanner("adfa =");

if(sc.hasNext(p))
{
    String s = sc.next(p);
    System.out.println(">" + s + "<");
}
else
    System.out.println(":(");

If I try this:

Pattern p = Pattern.compile("\\w+ *=");

The following string is picked up:

"adfa="

But not:

"adfa ="

Simply by making the following change:

Pattern p = Pattern.compile("adfa=");
Scanner sc = new Scanner("adfa=");

All works as intended! Can anyone shed any light on what is going wrong?

2 Answers

From the documentation for Scanner#hasNext(Pattern):

Returns true if the next complete token matches the specified pattern. A complete token is prefixed and postfixed by input that matches the delimiter pattern.

The default delimiter pattern for Scanner is \p{javaWhitespace}+. You can confirm this with the Scanner#delimiter() method:

Scanner sc = new Scanner("abdc =");
System.out.println(sc.delimiter());  // Prints \p{javaWhitespace}+

So when your Scanner encounters whitespace in your string, it assumes that the token has ended. It stops there and tries to match the token it has read ("adfa") against your pattern. That match fails, so sc.hasNext(p) returns false. This is the problem.
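To make the tokenization visible, here is a minimal sketch. The class name TokenDemo and its helper methods are made up for illustration; only the Scanner and Pattern calls come from the JDK. It lists the tokens the default delimiter produces and checks the first token against the pattern from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
class TokenDemo {
    // Collects the tokens a Scanner yields with its default whitespace delimiter.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNext()) {
                result.add(sc.next());
            }
        }
        return result;
    }

    // Reports whether the next complete token matches the regex,
    // which is what hasNext(Pattern) tests.
    static boolean nextTokenMatches(String input, String regex) {
        try (Scanner sc = new Scanner(input)) {
            return sc.hasNext(Pattern.compile(regex));
        }
    }

    public static void main(String[] args) {
        System.out.println(tokens("adfa ="));                       // [adfa, =]
        System.out.println(nextTokenMatches("adfa =", "\\w+ *="));  // false
        System.out.println(nextTokenMatches("adfa=", "\\w+ *="));   // true
    }
}
```

The space splits "adfa =" into two tokens, so the pattern is only ever tested against "adfa", which contains no "=".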


2 Comments

EDIT: Just read that the default is indeed any whitespace. Thanks! Not sure how I managed to not read that and assume the delimiter was '\n' or EOF.
@Daeden: Try printing the value of sc.delimiter(). You would get \p{javaWhitespace}+. I hope that makes it clear.

From Scanner.hasNext(Pattern) javadoc: Returns true if the next complete token matches the specified pattern. A complete token is prefixed and postfixed by input that matches the delimiter pattern.

In Scanner, whitespace is the default delimiter, so in your example the Scanner tries to match the token "adfa" against the regex, and that fails. If you change the delimiter to something else, like a line feed:

sc.useDelimiter("\n");

Your regex should work.
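As a rough sketch of that suggestion (the class LineDelimiterDemo and the helper firstMatch are hypothetical names, not from the question), the following sets the delimiter to a line feed and then applies the same hasNext/next calls:

```java
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical demo class illustrating a line-feed delimiter.
class LineDelimiterDemo {
    // Returns the next token if it matches the regex, or null otherwise.
    static String firstMatch(String input, String regex) {
        try (Scanner sc = new Scanner(input)) {
            sc.useDelimiter("\n"); // tokens are now whole lines, spaces included
            Pattern p = Pattern.compile(regex);
            return sc.hasNext(p) ? sc.next(p) : null;
        }
    }

    public static void main(String[] args) {
        System.out.println(">" + firstMatch("adfa =", "\\w+ *=") + "<");   // >adfa =<
        System.out.println(">" + firstMatch("adfa = 5", "\\w+ *=") + "<"); // >null<
    }
}
```

Note that the pattern still has to match the entire token. With a line-feed delimiter the token is the whole line, so a line such as "adfa = 5" would need a pattern that also covers the right-hand side.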

EDIT: My answer is a bit late!

1 Comment

I appreciate your response nonetheless!
