I need a Java regex pattern that handles the following cases:

Case 1:

Input string:

"a"

Matches:

a

Case 2:

Input string:

"a b"

Matches:

a b

Case 3:

Input string:

"aA Bb" cCc 123 4 5 6 7xy "\"z9" "\"z9$^"

Matches:

aA Bb
cCc
123
4
5
6
7xy
"z9
"z9$^

Case 4:

Input string:

"a b c

Matches:

None - since the quotes are unbalanced, hence pattern match fails.

Case 5:

Input string:

"a b" "c

Matches:

None - since the quotes are unbalanced, hence pattern match fails.

Case 6:

Input string:

"a b" p q r "x y z"

Matches:

a b
p 
q 
r
x y z

Case 7:

Input string:

"a b" p q r "x y \"z\""

Matches:

a b
p 
q
r
x y "z"

Case 8:

Input string:

"a b" p q r "x \"y \"z\""

Matches:

a b
p 
q 
r
x "y "z"

And of course, the simplest one:

Case 9:

Input string:

a b

Matches:

a
b

I tried the following pattern, but it doesn't match all of the cases above.

public List<String> parseArgs(String argStr) {
    List<String> params = new ArrayList<String>();
    String pattern = "\\s*(\"[^\"]+\"|[^\\s\"]+)";
    Pattern quotedParamPattern = Pattern.compile(pattern);
    Matcher matcher = quotedParamPattern.matcher(argStr);
    while (matcher.find()) {
        String param = matcher.group();
        System.out.println(param);
        params.add(param);
    }
    return params;
}

public void test(String argStr) {
    String[] testStrings = new String[]{"a", "a b", "a b \"c\"", "a b \"c"};
    for(String s: testStrings){
        parseArgs(s);
    }
}
  • It can be solved but you need to show some effort of solving it. At least place all the example input strings in an array of String (String[]) and place that Java code here. Commented Apr 22, 2014 at 17:37
  • @anubhava Added the Java code and a few input strings. Commented Apr 22, 2014 at 17:54
  • I'm guessing your actual strings will be more than just single lowercase letters. Do they need to contain capital letters, numbers, special characters, etc? Is there a limit to how long they should be? Commented Apr 22, 2014 at 18:01
  • @CAustin There may be capital letters, numbers, special characters and so on, but there's no limit as such. Commented Apr 22, 2014 at 18:03
  • I'm not sure what you want with this but I'm sure the answer is way before you arrive to this. Commented Apr 22, 2014 at 18:04

4 Answers

I have written a class, CLIParser, that gives you the result.

//instantiate the CLIParser 

CLIParser parser = new CLIParser("\"a b\" p q r \"x y z\"");

//call the method getTokens which gives you the result.

ArrayList<String> resultTokens = parser.getTokens();


###################CLI Parser Class definition#################################

class CLIParser {
    private String cmdString;

    public CLIParser(String cmdString) {
        this.cmdString = cmdString;
    }

    public ArrayList<String> getTokens() throws Exception {
        ArrayList<String> finalTokens = new ArrayList<String>();
        ArrayList<StringBuffer> tokens = new ArrayList<StringBuffer>();
    char inArray[] = this.cmdString.toCharArray();
    StringBuffer token = new StringBuffer();
    int valid = checkIfTheStringIsValid(inArray);
    if (valid == -1) {
        for (int i = 0; i <= inArray.length; i++) {

            if (i != inArray.length) {
                if ((inArray[i] != ' ') && (inArray[i] != '"')) {
                    token.append(inArray[i]);
                }

                if ((inArray[i] == '"') && (i == 0 || inArray[i - 1] != '\\')) {
                    i = i + 1;
                    while (checkIfLastQuote(inArray, i)) {
                        token.append(inArray[i]);
                        i++;
                    }
                }
            }
            if (i == inArray.length) {
                tokens.add(token);
                token = new StringBuffer();
            } else if (inArray[i] == ' ' && inArray[i] != '"') {
                tokens.add(token);
                token = new StringBuffer();
            }
        }
    } else {
        throw new InvalidCommandException(
                "Invalid command. Couldn't identify sequence at position "
                        + valid);
    }
    for(StringBuffer tok:tokens){
        finalTokens.add(tok.toString());
    }
    return finalTokens;
}

private static int checkIfTheStringIsValid(char[] inArray) {
    Stack<Character> myStack = new Stack<Character>();
    int pos = 0;
    for (int i = 0; i < inArray.length; i++) {
        if (inArray[i] == '"' && (i == 0 || inArray[i - 1] != '\\')) {
            pos = i;
            if (myStack.isEmpty())
                myStack.push(inArray[i]);
            else
                myStack.pop();
        }
    }
    if (myStack.isEmpty())
        return -1;
    else
        return pos;
}

private static boolean checkIfLastQuote(char inArray[], int i) {
    if (inArray[i] == '"') {
        if (inArray[i - 1] == '\\') {
            return true;
        } else
            return false;
    } else
        return true;
}
}
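
The class above needs java.util.ArrayList and java.util.Stack imported, and it throws InvalidCommandException, which the answer does not define. A minimal sketch of what that exception might look like follows; the definition is hypothetical (only the class name and the single-String constructor call come from the answer), and it extends Exception on the assumption that getTokens() is meant to throw a checked exception, since it is declared with "throws Exception".

```java
// Hypothetical definition: the answer uses this class but never shows it.
// It extends Exception so that it fits the "throws Exception" clause on getTokens().
class InvalidCommandException extends Exception {
    InvalidCommandException(String message) {
        super(message);
    }
}
```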

I don't know a straightforward way to solve this with a regex alone.

But you can first replace the escaped quotes inside the strings with a unique placeholder, and then match the strings with a regex.

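String[] testStrings = new String[]{
         "a", "a b", "a b \"c\"", "a b \"c", "\"a b\" p q r \"x y z\""};
// Group 1: the text inside a pair of quotes; group 2: an unquoted token.
Pattern parsingPattern = Pattern.compile("\"(.*?)\"|([^ \"]+)");
for (int i = 0; i < testStrings.length; i++) {
    // Hide escaped quotes (\") behind a placeholder. String.replace works on
    // literal text, not regex, and the result has to be stored back in the array.
    testStrings[i] = testStrings[i].replace("\\\"", "@@@");
}
for (String s : testStrings) {
    List<String> params = null;
    // StringUtils comes from Apache Commons Lang.
    int count = StringUtils.countMatches(s, "\"");
    if (count % 2 == 0) { // an even number of quotes means they are balanced
        params = new ArrayList<String>();
        Matcher matcher = parsingPattern.matcher(s);
        while (matcher.find())
            params.add(matcher.group(1) != null ? matcher.group(1) : matcher.group(2));
    }
}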

Once you have the matches, replace the placeholder (@@@) in each one with a literal quote.

I haven't tested the code snippet, but I hope you can do some minor tweaks to make it work.
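
The whole round trip described in this answer (mask the escaped quotes, check the quote count, match, then restore) might look like the sketch below. It uses only the standard library, so the quote count is done with a plain loop instead of StringUtils. The class name PlaceholderArgParser and the method name parse are made up for illustration, and the sketch assumes the placeholder @@@ never occurs in real input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; names are hypothetical, not from the answer.
final class PlaceholderArgParser {
    // Assumption: this marker never appears in real input.
    private static final String PLACEHOLDER = "@@@";
    // Group 1: text inside quotes; group 2: an unquoted token.
    private static final Pattern TOKEN = Pattern.compile("\"(.*?)\"|([^ \"]+)");

    // Returns the parsed arguments, or null when the quotes are unbalanced.
    static List<String> parse(String input) {
        // Step 1: hide every escaped quote (\") behind the placeholder.
        String masked = input.replace("\\\"", PLACEHOLDER);

        // Step 2: the remaining quotes must come in pairs.
        int quotes = 0;
        for (char c : masked.toCharArray()) {
            if (c == '"') {
                quotes++;
            }
        }
        if (quotes % 2 != 0) {
            return null;
        }

        // Step 3: match tokens, then turn each placeholder back into a quote.
        List<String> params = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(masked);
        while (matcher.find()) {
            String param = matcher.group(1) != null ? matcher.group(1) : matcher.group(2);
            params.add(param.replace(PLACEHOLDER, "\""));
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(parse("\"a b\" p q r \"x y \\\"z\\\"\""));
    }
}
```

This only treats backslash-quote as an escape; an escaped backslash directly before a closing quote would be misread, so it is a starting point rather than a complete parser.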

1 Comment

Seems like a fair solution. Thanks!

Give this a try:

("\S+?(?: \S+?)*"|\S+?)

See it in action: http://regex101.com/r/fA5hN0

Just run a global match and return \1. Each capture group that gets returned should contain what you want.
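
In Java, the pattern has to be written as a string literal with the backslashes and quotes escaped, and a "global match" is a loop over Matcher.find(). The sketch below shows that translation only; the class name GlobalMatchDemo and the method name allMatches are made up for illustration. Note that group 1 keeps the surrounding quotes, and that the unquoted alternative \S+? is lazy and sits at the end of the pattern, so it matches one character at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; names are hypothetical, not from the answer.
final class GlobalMatchDemo {
    // The regex ("\S+?(?: \S+?)*"|\S+?) written as a Java string literal.
    private static final Pattern PATTERN =
            Pattern.compile("(\"\\S+?(?: \\S+?)*\"|\\S+?)");

    // Collects group 1 of every match, which is what a global match returning \1 yields.
    static List<String> allMatches(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group(1)); // quoted matches still include their quotes
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(allMatches("\"a b\" p q r \"x y z\""));
    }
}
```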

1 Comment

Is there a Java regex string available? The above pattern needs its slashes and quotes escaped, and even when escaped, it doesn't seem to work for all cases. Tried for this input string: "a b" p q r "x y \"z\""

To get you started you can use this Java regex based code:

public List<String> parseArgs(String argStr, Pattern validPattern, Pattern parsePattern) {
    List<String> params = null;
    if (validPattern.matcher(argStr).matches()) {
        params = new ArrayList<String>();
        Matcher matcher = parsePattern.matcher(argStr); 
        while (matcher.find())
            params.add( matcher.group(1) != null ? matcher.group(1) : matcher.group(2));
    }
    return params;
}

public void parseIt() {
    Pattern validatePattern = Pattern.compile("^(?=(?:(?:[^\"]*\"){2})*[^\"]*$).*$");
    Pattern parsingPattern = Pattern.compile("\"([^\"]*)\"|(\\w+)");

    String[] testStrings = new String[]{
             "a", "a b", "a b \"c\"", "a b \"c", "\"a b\" p q r \"x y z\""};
    for(String s: testStrings) {
        List<String> parsedList = parseArgs(s, validatePattern, parsingPattern);
        System.out.printf("input: %-30s :: parsed: %s%n", s, parsedList);
    }
}

OUTPUT:

input: a                              :: parsed: [a]
input: a b                            :: parsed: [a, b]
input: a b "c"                        :: parsed: [a, b, c]
input: a b "c                         :: parsed: null
input: "a b" p q r "x y z"            :: parsed: [a b, p, q, r, x y z]

PS: I have noticed your latest edits, which add escaped (nested) quotes as well; this answer needs to be enhanced to handle those.
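
One possible way to extend this to escaped quotes is sketched below. It is not the answer author's code: it swaps the separate validation pattern for a single token pattern anchored with \G, so each match must start exactly where the previous one ended, and anything left over at the end is treated as an unbalanced quote. The class name EscapedArgParser and the method name parse are made up, and the sketch assumes that a backslash inside quotes escapes the next character and that unquoted tokens never contain a quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; names are hypothetical, not from the answer.
final class EscapedArgParser {
    // \G anchors each match to the end of the previous one.
    // Group 1: quoted body, where a backslash escapes the following character.
    // Group 2: an unquoted run without whitespace or quotes.
    private static final Pattern TOKEN = Pattern.compile(
            "\\G\\s*(?:\"((?:[^\"\\\\]|\\\\.)*)\"|([^\\s\"]+))");

    // Returns the parsed arguments, or null when the input cannot be fully tokenized.
    static List<String> parse(String input) {
        List<String> params = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        int end = 0;
        while (matcher.find()) {
            end = matcher.end();
            String quoted = matcher.group(1);
            // Inside quotes, drop the backslash from each escape: \" becomes ".
            params.add(quoted != null
                    ? quoted.replaceAll("\\\\(.)", "$1")
                    : matcher.group(2));
        }
        // Leftover non-whitespace text means a quote was never closed.
        return input.substring(end).trim().isEmpty() ? params : null;
    }

    public static void main(String[] args) {
        System.out.println(parse("\"a b\" p q r \"x \\\"y \\\"z\\\"\""));
        System.out.println(parse("\"a b\" \"c"));
    }
}
```

Because the two alternatives inside the quoted body can never match the same character, the pattern does not depend on counting quotes up front, which is what makes escaped backslashes workable as well.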

8 Comments

This seems to work for most cases. But I'm trying to extract the strings in which there are escaped quotes as well. Tried with this: "aA Bb" cCc 123 4 5 6 7xy "\"z9" "\"z9$^" This is supposed to give me "z9 as 8th argument and "z9$^ as the 9th. Thanks.
Yes, I made a note of it since I didn't notice escaped/nested quotes in your question earlier.
What regular expression would match all the above cases?
There is no end to escaping actually. Even backslash can be escaped. How are you getting this input?
Yes, that's correct. That's the way it's supposed to work for the app I'm building. Even this is valid "aA Bb" cCc 123 4 5 6 7xy "\"\\z9" "\"z9$^". And the strings are being read from stdin.