0

I'm making a JS "command line" emulator.

I have this regular expression: /([^\s"]+)|"([^\s"]+)"/g. I want it to match single words, like echo, wyświetl, or jd923h90asd8. I also want it to match "string literals", such as "this is a string" or "f82h3 23fhn aj293 dgja3 xcn32".

I'm using the match method on the input string to get an array of all matches. The problem is that when the regex matches a "string literal" and adds it to the array, the string INCLUDES the double quotes. I don't want the double quotes, so my question is: why does the regex include them? In the regex, the quotes are outside the () group. Why does the match include everything?

EDIT:

var re = /([^\s"]+)|"([^\s"]+)"/g;

var process = function (text) {
    return execute(text.match(re));
}

var execute = function (arr) {
    console.log(arr);
    try {
        //... apply a function with arguments...
    } catch (e) {
        error(arr[0]+": wrong function");
        return "";
    }
}

For the input echo abc "abc def" "ghi", the regex returns the array ["echo", "abc", "abc", "def", ""ghi""]. I want a regex that returns ["echo", "abc", "abc def", "ghi"] for that input.

3
  • Can you demonstrate this in action, perhaps a live demo? Or, at the very least, show the code you're using in your question. Your description isn't as clear as it could be, I'm afraid. That, or it's still too early for my brain... Commented Aug 27, 2013 at 9:12
  • Ok, I will add some code. Commented Aug 27, 2013 at 9:13
  • It may not be such a bad idea to keep the quotation marks. Simply strip the double quotes when you need the string contents. Aside from allowing parameters with spaces, they also act as a type indicator. You may one day decide that parameters without double quotation marks could be variables, in which case it would be necessary to distinguish a string from a possible variable name (in other words, you may want sort varname to have a different meaning than sort "varname"). Commented Aug 27, 2013 at 9:46

3 Answers 3

4

The quoted part of your regex ("([^\s"]+)") doesn't allow spaces within the quotes. Try removing the \s from it. You could also consider using * instead of + if you need to match empty strings (""):

/([^\s"]+)|"([^"]*)"/g 
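A minimal sketch of how the capturing groups of this regex might be read, assuming a re.exec loop is used in place of text.match. The name tokenize is hypothetical and not part of the question's code:

```javascript
// Hypothetical helper: splits a command line into tokens, dropping the quotes.
function tokenize(text) {
  // A fresh regex per call, so lastIndex always starts at 0.
  const re = /([^\s"]+)|"([^"]*)"/g;
  const tokens = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    // m[2] is defined only when the quoted alternative matched;
    // otherwise the unquoted word is in m[1].
    tokens.push(m[2] !== undefined ? m[2] : m[1]);
  }
  return tokens;
}

console.log(tokenize('echo abc "abc def" "ghi"'));
// ["echo", "abc", "abc def", "ghi"]
```

Checking m[2] against undefined rather than for truthiness keeps an empty quoted string ("") as an empty token instead of falling back to m[1].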

4 Comments

Sorry, are you saying you want the quotes to be included in the captured result?
No no, I want the quotes to be EXCLUDED. With the current code, quotes are INCLUDED and I don't want them there.
As MarioRossi pointed out, you need to be looking at the second capturing group rather than the whole match to do this. In all cases where the second capturing group is not null, it should be used in preference to the whole match.
I found a way to do it - I will use re.exec & while loop instead of text.match.
3

This is the only possible explanation. Even without looking at any code.

Use group(1) or group(2) (in JavaScript, the elements at index 1 and 2 of the array returned by exec), not group() or group(0). The latter two (which are fully equivalent) always return the whole matched string, which in your case includes the quotes. I hope this explains what's going on.

PS: Because your regex is an alternation (an "or" regex), group(1) and group(2) will never both have content at the same time. One, the other, or both will be null or empty; both are empty only when there is no match.

I just realized you are using the match method to retrieve all matches as an array. In this case, this method always returns the whole matched string for each match (the equivalent of group(0) above). There is no way of telling it to retrieve other groups (like 1 or 2). As a consequence, you have 3 alternatives:

  1. Remove the quotes from the strings that have them in the resulting array with some "post-processing".
  2. Do not use JavaScript's match method, but create your own equivalent (and use group(1) or group(2) according to the case in it).
  3. Change your regular expression to match the quotes as zero-width positive lookaheads and lookbehinds. Not sure if JavaScript supports this, but it should be /([^\s"]+)|(?<=")([^\s"]+)(?=")/g
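Alternative 1 could look roughly like the sketch below. It assumes the quoted part of the pattern is changed to "[^"]*" so that spaces are allowed inside the quotes (with the original pattern, "abc def" would still be split). The name tokenizeWithMatch is hypothetical:

```javascript
// Hypothetical helper: uses match, then strips the surrounding quotes afterwards.
function tokenizeWithMatch(text) {
  // No capturing groups are needed, because match with the g flag
  // returns whole matches only.
  const re = /[^\s"]+|"[^"]*"/g;
  // match returns null (not an empty array) when nothing matches.
  const matches = text.match(re) || [];
  // Only the quoted alternative can start with a quote, and it always
  // ends with one too, so slicing off both ends is safe here.
  return matches.map(function (s) {
    return s.startsWith('"') ? s.slice(1, -1) : s;
  });
}

console.log(tokenizeWithMatch('echo abc "abc def" "ghi"'));
// ["echo", "abc", "abc def", "ghi"]
```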

6 Comments

Where I should use group(1) and group(2)?
@AreWojciechowski Updated my answer. I didn't register you were using the match method.
@AreWojciechowski Alternative 3 is the most elegant, but I'm not sure it is supported by JavaScript. Alternative 1 is dirtier and requires a little extra coding, but will work for sure. I'd say try them in this order.
@MarioRossi Re: 3, I was also wondering about this but apparently Javascript doesn't support lookbehind.
I didn't remember using them, and I've not been able to test them yet either... :( I think I'm posting a question so I don't forget :)
0

To match JavaScript string literals, here's what you're looking for:

/(\w+|("|')(.*?)\2)/g

To explain this: you're looking for either a run of unquoted word characters OR a pair of matching quotes with anything in between. The backreference \2 makes sure the closing quote is the same kind as the opening one, so "it's his dog" is matched correctly.

Be aware that this simplified version does not handle a string with escaped quotes, like:

"my \"complex\" string"

It didn't look like you were worried about that last scenario.
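A short sketch of how the groups of this regex might be read, assuming an exec loop; tokenizeQuoted is a hypothetical name. Note that \w in JavaScript only covers ASCII letters, digits, and underscore, so a word such as wyświetl would be split by this pattern:

```javascript
// Hypothetical helper: words or single/double-quoted strings, quotes removed.
function tokenizeQuoted(text) {
  // Group 1: whole token, group 2: opening quote, group 3: quoted content.
  const re = /(\w+|("|')(.*?)\2)/g;
  const tokens = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    // m[3] is defined only when the quoted alternative matched.
    tokens.push(m[3] !== undefined ? m[3] : m[1]);
  }
  return tokens;
}

console.log(tokenizeQuoted("echo \"it's his dog\" 'x y'"));
// ["echo", "it's his dog", "x y"]
```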

http://regexr.com/3bdbi
