5

I need to split a string on the space character (' '), but without splitting on any spaces that fall between two specific characters (say, single quotes).

Here is an example string:

This-is-first-token This-is-second-token 'This is third token'

The output array should look like this:

[0] = This-is-first-token
[1] = This-is-second-token
[2] = 'This is third token'

Question: Can this be done elegantly with a regular expression?

4
  • I am not sure that this is elegant but /[a-zA-Z-]+|['"][\sa-zA-Z-]+['"]/g Commented Nov 21, 2013 at 6:33
  • @Deck. Wow, can you please explain what this does? I'm looking at a cheat sheet (from Regexlib dot com) and I still can't figure it out. Commented Nov 21, 2013 at 6:36
  • FYI, I don't find a complicated regex that is hard to understand exactly what it does without being a regex expert to be an "elegant" solution. Not sure what you meant when you said "elegant" in the question as that's a bit in the eye of the beholder, but keep in mind that a complicated single line of code is not always the best way to solve your problem. Commented Nov 21, 2013 at 6:37
  • Well, a regex being a standard thing, in my opinion it is elegant (even if I personally can't read it). In addition, any one-liner is likely more elegant than a function that is comparatively less efficient. Commented Nov 21, 2013 at 6:39

3 Answers

12

Short Answer:

A simple regex for this purpose would be:

/'[^']+'|[^\s]+/g

Sample code:

var data = "This-is-first-token This-is-second-token 'This is third token'";
data.match(/'[^']+'|[^\s]+/g);

Result:

["This-is-first-token", "This-is-second-token", "'This is third token'"]

Explanation:

I think this is as simple as you can make it in just a regex.

The g at the end makes it a global match, so you get all three matches. Without it, you get only the first string.

\s matches any whitespace character (in this instance, essentially spaces and tabs). So it would work even if there were a tab between This-is-first-token and This-is-second-token.
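As a rough sketch of both points (the variable names `input`, `allTokens` and `firstOnly` are only illustrative, and the sample string is the one from the question with a tab swapped in between the first two tokens):

```javascript
// Same sample as in the question, but with a tab between the first two tokens.
const input = "This-is-first-token\tThis-is-second-token 'This is third token'";

// With the g flag, match() returns every match as a flat array.
const allTokens = input.match(/'[^']+'|[^\s]+/g);

// Without g, match() stops at the first match (index 0 holds the matched text).
const firstOnly = input.match(/'[^']+'|[^\s]+/);

console.log(allTokens);    // three tokens, the tab is treated like a space
console.log(firstOnly[0]); // "This-is-first-token"
```

The quoted alternative is listed first so that a token starting with a single quote is tried as a whole quoted run before the plain non-whitespace alternative gets a chance.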

To match content in braces, use this:

data.match(/\{[^\}]+\}|[^\s]+/g);

Braces or single quotes:

data.match(/\{[^\}]+\}|'[^']+'|[^\s]+/g);
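A quick way to try the combined pattern is sketched below. The input string `mixed` is made up for illustration and is not from the question; it assumes braces and quotes are not nested inside each other.

```javascript
// Hypothetical input mixing plain, braced and single-quoted tokens.
const mixed = "plain-token {braced token here} 'quoted token here' last";

// Braced runs first, then quoted runs, then any other non-whitespace run.
const mixedTokens = mixed.match(/\{[^\}]+\}|'[^']+'|[^\s]+/g);

console.log(mixedTokens);
// ["plain-token", "{braced token here}", "'quoted token here'", "last"]
```

Note that the delimiters are kept in the returned tokens, just as the single quotes are in the original example.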

9 Comments

Don't think you need the parens.
I would use [^\s]+ instead of [^ ]+
I am trying to test this in regexlib's retester.aspx.
If I use this '[^']+'|[^\s]+ then it appears to work. So what is the /g part of the regex for?
I'm using regexlib's regex tester. When I add /g, I only get the third token string. If I exclude it, I get all 3.
3

You can use this split:

var string = "This-is-first-token This-is-second-token 'This is third token'";
var arr = string.split(/(?=(?:(?:[^']*'){2})*[^']*$)\s+/);
//=> ["This-is-first-token", "This-is-second-token", "'This is third token'"]

This assumes quotes are all balanced.
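To see why the balanced-quotes assumption matters, here is a small sketch. The lookahead only allows a split where an even number of single quotes remains between that position and the end of the string. The names `splitter`, `balanced` and `unbalanced`, and the short unbalanced input, are made up for illustration.

```javascript
// Split on whitespace only when an even number of quotes follows it.
const splitter = /(?=(?:(?:[^']*'){2})*[^']*$)\s+/;

const balanced = "This-is-first-token This-is-second-token 'This is third token'";
const balancedTokens = balanced.split(splitter);
console.log(balancedTokens);
// ["This-is-first-token", "This-is-second-token", "'This is third token'"]

// One stray quote flips the parity for everything before it.
const unbalanced = "a b 'c d";
const unbalancedTokens = unbalanced.split(splitter);
console.log(unbalancedTokens);
// ["a b 'c", "d"]
```

With the stray quote, the spaces before it are no longer split on, while the space after it is, so it may be worth validating the input first if unbalanced quotes are possible.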

4 Comments

@jfriend00: I never claimed elegance, but the requirement warrants this kind of regex.
OP asked for an elegant solution.
Once again, elegance is a pretty subjective thing. Different programmers can consider different things elegant.
Don't mistake elegance for readability. Although to me this regex is somewhat cryptic, it is still far more elegant than a function full of string manipulations.
1

I came up with the following:

"This-is-first-token This-is-second-token 'This is third token'".match(/('[A-Za-z\s^-]+'|[A-Za-z\-]+)/g)
["This-is-first-token", "This-is-second-token", "'This is third token'"]
