Javascript Regex - ignoring certain characters between 2 chars

Question

I need to split a string on the space character (' '), but exclude any spaces that fall between two specific characters (say, single quotes).

Here is an example string:

This-is-first-token This-is-second-token 'This is third token'

The output array should look like this:

[0] = This-is-first-token
[1] = This-is-second-token
[2] = 'This is third token'

Question: Can this be done elegantly with a regular expression?

I am not sure that this is elegant, but: /[a-zA-Z-]+|['"][\sa-zA-Z-]+['"]/g
– Deck, Commented Nov 21, 2013 at 6:33
@Deck. Wow, can you please explain what this does? I'm looking at a cheat sheet (from Regexlib dot com) and I still can't figure it out.
– AlvinfromDiaspar, Commented Nov 21, 2013 at 6:36
FYI, I don't consider a complicated regex, one that is hard to understand without being a regex expert, to be an "elegant" solution. I'm not sure what you meant by "elegant" in the question, as that's a bit in the eye of the beholder, but keep in mind that a complicated single line of code is not always the best way to solve your problem.
– jfriend00, Commented Nov 21, 2013 at 6:37
Well, a regex being a standard thing, in my opinion it is elegant (even if I personally can't read it). In addition, any one-liner is likely more elegant than a function that is comparatively less efficient.
– AlvinfromDiaspar, Commented Nov 21, 2013 at 6:39

elixenide · Accepted Answer · 2015-10-30 18:52:43Z

12

Short Answer:

A simple regex for this purpose would be:

/'[^']+'|[^\s]+/g

Sample code:

var data = "This-is-first-token This-is-second-token 'This is third token'";
data.match(/'[^']+'|[^\s]+/g);

Result:

["This-is-first-token", "This-is-second-token", "'This is third token'"]

Explanation:

I think this is as simple as you can make it in just a regex.

The g at the end makes it a global match, so you get all three matches. Without it, you get only the first string.

\s matches any whitespace character (here, basically spaces and tabs). So it would work even if there were a tab between This-is-first-token and This-is-second-token.
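A minimal sketch of the two points above: what the g flag changes, and that \s also covers a tab. The variable names (tabbed, all, first) are made up for illustration and are not part of the original answer.

```javascript
// Illustration only: variable names are hypothetical.
// Same sample string, but with a tab between the first two tokens.
var tabbed = "This-is-first-token\tThis-is-second-token 'This is third token'";

// With the g flag, match() returns every match as a plain array.
var all = tabbed.match(/'[^']+'|[^\s]+/g);
console.log(all);
// ["This-is-first-token", "This-is-second-token", "'This is third token'"]

// Without g, match() stops at the first match. The result is still an
// array, but it holds only that one match (plus properties such as index).
var first = tabbed.match(/'[^']+'|[^\s]+/);
console.log(first[0]);     // "This-is-first-token"
console.log(first.length); // 1
```

Note that the g is a flag on the JavaScript regex literal, not part of the pattern itself, so tools that ask for a bare pattern may expect it to be entered separately.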

To match content in braces, use this:

data.match(/\{[^\}]+\}|[^\s]+/g);

Braces or single quotes:

data.match(/\{[^\}]+\}|'[^']+'|[^\s]+/g);
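As a rough check of the combined pattern, here is a small sketch on a made-up input that mixes both delimiters. The input strings and variable names are assumptions for illustration only. It also shows what I would expect when a brace is never closed: the first alternative cannot match, so the text falls through to [^\s]+.

```javascript
// Illustration only: inputs and variable names are hypothetical.
var pattern = /\{[^\}]+\}|'[^']+'|[^\s]+/g;

var mixed = "plain {in braces here} 'in quotes here' last";
console.log(mixed.match(pattern));
// ["plain", "{in braces here}", "'in quotes here'", "last"]

// An opening brace with no closing brace cannot satisfy the first
// alternative, so it is matched by [^\s]+ and split on whitespace as usual.
var unclosed = "plain {never closed";
console.log(unclosed.match(pattern));
// ["plain", "{never", "closed"]
```

The order of the alternatives matters: the delimited forms come first so they get a chance to match before the catch-all [^\s]+ does.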

edited Oct 30, 2015 at 18:52

answered Nov 21, 2013 at 6:37

elixenide

9 Comments

jfriend00 Over a year ago

Don't think you need the parens.

hwnd Over a year ago

I would use [^\s]+ instead of [^ ]+

AlvinfromDiaspar Over a year ago

I am trying to test this in regexlib's retester.aspx.

AlvinfromDiaspar Over a year ago

If I use this '[^']+'|[^\s]+ then it appears to work. So what is the /g part of the regex for?

AlvinfromDiaspar Over a year ago

I'm using regexlib's regex tester. When I add /g, I only get the third token string. If I exclude it, I get all 3.

anubhava · Accepted Answer · 2013-11-21 06:34:53Z

3

You can use this split:

var string = "This-is-first-token This-is-second-token 'This is third token'";
var arr = string.split(/(?=(?:(?:[^']*'){2})*[^']*$)\s+/);
//=> ["This-is-first-token", "This-is-second-token", "'This is third token'"]

This assumes quotes are all balanced.
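To make the balanced-quotes assumption concrete, here is a small sketch. As I read the pattern, the lookahead only allows a split where the rest of the string contains an even number of single quotes, that is, where we are outside a quoted section. The variable names and the unbalanced input are made up for illustration.

```javascript
// Illustration only: variable names and the second input are hypothetical.
// Split on whitespace only where an even number of ' remains to the right.
var splitter = /(?=(?:(?:[^']*'){2})*[^']*$)\s+/;

var balanced = "This-is-first-token This-is-second-token 'This is third token'";
console.log(balanced.split(splitter));
// ["This-is-first-token", "This-is-second-token", "'This is third token'"]

// With one stray quote the parity is off: the space BEFORE the quote has an
// odd number of quotes to its right (no split), while the space after it
// has zero (split).
var unbalanced = "one 'two three";
console.log(unbalanced.split(splitter));
// ["one 'two", "three"]
```

Because the lookahead rescans the remainder of the string at each whitespace candidate, this approach may get slow on very long inputs.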

answered Nov 21, 2013 at 6:34

anubhava

4 Comments

anubhava Over a year ago

@jfriend00: Never claimed elegance but requirement warrants this kind of regex.

jfriend00 Over a year ago

OP asked for an elegant solution.

anubhava Over a year ago

Once again, elegance is a pretty subjective thing. Different programmers can call different things elegant.

AlvinfromDiaspar Over a year ago

Don't mistake elegance for readability. Although to me this regex is somewhat cryptic, it is still far more elegant than a function full of string manipulations.

Rob M. · Accepted Answer · 2013-11-21 06:36:53Z

1

I came up with the following:

"This-is-first-token This-is-second-token 'This is third token'".match(/('[A-Za-z\s^-]+'|[A-Za-z\-]+)/g)
["This-is-first-token", "This-is-second-token", "'This is third token'"]
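One thing to watch with this pattern: the character classes only list letters, hyphens and (inside quotes) whitespace, so it works for the sample string but may drop other characters. A quick sketch on a made-up input containing a digit, compared with the negated-class pattern from the other answer; variable names are hypothetical.

```javascript
// Illustration only: the input and variable names are hypothetical.
var input = "token-1 'third token'";

// Letter/hyphen whitelist: the digit is not in the class, so the first
// token is cut short and the "1" is silently skipped.
var whitelist = input.match(/('[A-Za-z\s^-]+'|[A-Za-z\-]+)/g);
console.log(whitelist);
// ["token-", "'third token'"]

// Negated classes ("anything but a quote" / "anything but whitespace")
// keep the digit.
var negated = input.match(/'[^']+'|[^\s]+/g);
console.log(negated);
// ["token-1", "'third token'"]
```

If the tokens are known to contain only letters and hyphens, the whitelist version is fine as it stands.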

answered Nov 21, 2013 at 6:36

Rob M.
