
I am writing a lexical analyzer from scratch and have reached the part that matches string literals, i.e. tokens of the form (")[\\w]+("). I have this regular expression, ^(\")[\\w]+(\")$, but it won't match the string.

SSCCE:

Map<String, String> lexicalMap = new HashMap<>();
// add all regex to `lexicalMap` via `lexicalMap.put([regex], [tokentype])`

// Tokenize the source text into `List<String> tokens`
// For the input string data = "test", tokens contains ["string", "data", "=", "test"]
for(String element : tokens) {
    for(String regex : lexicalMap.keySet()) {
        if(element.matches(regex))
            System.out.print(lexicalMap.get(regex) + " ");
    }
}
System.out.println();

Regexes:

identifier = ^[\\w]+$
operator = ^(\\=)$
string = ^(\")[\\w]+(\")$ // THE PROBLEM
keyword = ^(string)$
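As a quick check of the string pattern on its own, here is a minimal sketch (the class name StringPatternCheck is made up for illustration) that writes the four patterns above as Java string literals and tests the string pattern against a quoted token, the bare word, and the empty literal. String.matches() already requires the whole input to match, so the ^ and $ anchors are redundant but harmless.

```java
public class StringPatternCheck {
    // The question's four patterns, as Java string literals.
    static final String IDENTIFIER = "^[\\w]+$";
    static final String OPERATOR = "^(\\=)$";
    static final String STRING = "^(\")[\\w]+(\")$";
    static final String KEYWORD = "^(string)$";

    public static void main(String[] args) {
        // A token that still carries its quotes matches the string pattern.
        System.out.println("\"test\"".matches(STRING)); // true
        // With the quotes stripped it does not, but it does match the identifier pattern.
        System.out.println("test".matches(STRING));     // false
        System.out.println("test".matches(IDENTIFIER)); // true
        // The empty literal "" fails because [\w]+ needs at least one word character.
        System.out.println("\"\"".matches(STRING));     // false
        // The other two patterns behave as expected on their own tokens.
        System.out.println("=".matches(OPERATOR));      // true
        System.out.println("string".matches(KEYWORD));  // true
    }
}
```

So the pattern does accept "test" with its quotes; it fails in the SSCCE when the token has already lost its quotes, and it can never accept "".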

Here are the input/output cases I am following:

INPUT:

"test"
""
test
string data = "test"

OUTPUT:

string
string
identifier
keyword identifier operator string

UPDATED: 02/22/2013

  • Added SSCCE segment.
  • Show us your code. How are you doing the match? Commented Feb 22, 2013 at 14:24
  • I have added my SSCCE on how I am doing the match. Commented Feb 22, 2013 at 14:35
  • I guess your tokens should contain `["string", "data", "=", "\"test\""]`. Note how I stored "test". Commented Feb 22, 2013 at 14:37
  • Wait, wait. \"data\" is not a string. If it were written as "\"data\"", then it would be a string, and it would be stored as \"\\\"data\\\"\". You first need to be sure what kinds of input you are getting. Commented Feb 22, 2013 at 14:42
  • You would have to do much more work to make this program work completely. The problem is that a keyword is also a valid identifier, so it will match two regexes in the Map. I would not match keywords with a regex at all: since keywords in Java are fixed, it is better to keep a Set of all the keywords and match against that set. Commented Feb 22, 2013 at 14:52
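The two suggestions above (keep the quotes inside the token, and look keywords up in a Set instead of matching them with a regex) can be combined into a small sketch. This assumes Java 9+ for Set.of and List.of; the class name TinyLexer, its token pattern, and the first-match-wins ordering are my own assumptions for illustration, not the asker's code. A LinkedHashMap replaces the HashMap so the checking order is predictable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TinyLexer {
    // Hypothetical token pattern: a double-quoted run without embedded quotes,
    // a run of word characters, or '='. find() skips anything else, such as spaces.
    private static final Pattern TOKEN = Pattern.compile("\"[^\"]*\"|\\w+|=");

    // Keywords are looked up exactly; only the question's one keyword is listed.
    private static final Set<String> KEYWORDS = Set.of("string");

    // Remaining token types, checked in insertion order.
    private static final Map<String, String> TYPES = new LinkedHashMap<>();
    static {
        TYPES.put("\"\\w*\"", "string"); // both quotes required, word characters only
        TYPES.put("=", "operator");
        TYPES.put("\\w+", "identifier");
    }

    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            tokens.add(m.group()); // quotes stay part of a string token
        }
        return tokens;
    }

    static String classify(String token) {
        if (KEYWORDS.contains(token)) {
            return "keyword"; // checked first, so "string" is never also an identifier
        }
        for (Map.Entry<String, String> e : TYPES.entrySet()) {
            // String.matches() must match the whole token, so ^ and $ are not needed.
            if (token.matches(e.getKey())) {
                return e.getValue(); // first match wins: one type per token
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        for (String line : List.of("\"test\"", "\"\"", "test", "string data = \"test\"")) {
            List<String> types = new ArrayList<>();
            for (String token : tokenize(line)) {
                types.add(classify(token));
            }
            System.out.println(String.join(" ", types));
        }
    }
}
```

Run on the four inputs from the question, this prints string, string, identifier, and keyword identifier operator string, one per line, which is the expected output listed above.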

1 Answer


I don't know what happened, but after changing the regular expression from ^(\")[\\w]+(\")$ to ^(\")[\\w]*(\")?$ it worked correctly.


3 Comments

It is going to break badly with a simple string such as "@" or "Enter a number: "
Adding ? to (\") makes the " optional, are you sure that is what you want? \\w+ means one or more word characters, \\w* means zero or more. The brackets in your regex seem pointless, so "^\"\\w*\"$", though it only allows word characters between the quotes.
@MikeM True, making the second " optional can break the string grammar. I just need an additional procedure to validate the grammar of the string tokens.
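To see the trade-offs raised in these comments, here is a minimal sketch comparing the answer's pattern, the pattern suggested in the comments, and a third, escape-aware pattern. That third pattern is a swapped-in technique of my own (an assumption, not from the thread): it allows any character except a quote or backslash, plus backslash escapes such as \" inside the literal.

```java
public class StringRegexComparison {
    // Pattern from the answer: the closing quote is optional.
    static final String ANSWER = "^(\")[\\w]*(\")?$";
    // Pattern suggested in the comments: both quotes required, word characters only.
    static final String COMMENT = "^\"\\w*\"$";
    // Assumed alternative: any char except " or \, or a backslash escape like \".
    static final String ESCAPED = "^\"(?:[^\"\\\\]|\\\\.)*\"$";

    public static void main(String[] args) {
        String[] inputs = {"\"test\"", "\"\"", "\"test", "\"@\"", "\"Enter a number: \""};
        for (String in : inputs) {
            // Print each input followed by whether each pattern accepts it.
            System.out.println(in + "  answer=" + in.matches(ANSWER)
                    + " comment=" + in.matches(COMMENT)
                    + " escaped=" + in.matches(ESCAPED));
        }
    }
}
```

All three accept "test" and "". The unterminated "test is accepted only by the answer's pattern, which is the grammar problem noted above, while "@" and "Enter a number: " are rejected by both the answer's and the comment's patterns and accepted only by the escape-aware one.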
