25

I have a string like

 "asdf a  b c2 "

And I want to split it into an array like this:

["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]

Using string.split(" ") removes the spaces, resulting in this:

["asdf", "a", "", "b", "c2", ""]

I thought of inserting extra delimiters, e.g.

string.replace(/ /g, "| |").replace(/||/g, "|").split("|");

But this gives an unexpected result.

5 Answers

23

Instead of splitting, it might be easier to think of this as extracting strings comprising either the delimiter or consecutive characters that are not the delimiter:

'asdf a  b c2 '.match(/\S+|\s/g)
// result: ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]
'asdf a  b. . c2% * '.match(/\S+|\s/g)
// result: ["asdf", " ", "a", " ", " ", "b.", " ", ".", " ", "c2%", " ", "*", " "]

A more Shakespearean definition of the matches would be:

'asdf a  b c2 '.match(/ |[^ ]+/g)

To or (not to )+.
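
One caveat that may be worth guarding against: with the g flag, match() returns null rather than an empty array when nothing matches, for example on an empty string. A minimal sketch of a wrapper follows; the helper name splitKeepingSpaces is hypothetical and not part of the answer above.

```javascript
// Hypothetical helper (the name is an assumption, not from the answer).
// Splits on single spaces, keeping each space as its own element.
function splitKeepingSpaces(str) {
  // With the g flag, match() returns null when there are no matches,
  // so fall back to an empty array.
  return str.match(/ |[^ ]+/g) || [];
}

console.log(splitKeepingSpaces("asdf a  b c2 "));
// ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]
console.log(splitKeepingSpaces(""));
// []
```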

3 Comments

@Jack I hadn't, but that seems to work! Clearly, I need to learn regular expressions.. What does \S+ mean?
@gandalf3 \S is the opposite of \s .. it could also be written as [^\s].
+1 but note: wrapping it in a non-capturing group ((?: )) is not necessary. 'asdf a  b c2 '.match(/\S+|\s/g) would be the same
10

Use positive lookahead:

"asdf a  b c2 ".split(/(?= )/)
// => ["asdf", " a", " ", " b", " c2", " "]

Post-edit EDIT: As I said in the comments, the lack of lookbehind makes this a bit trickier. If the words consist only of word characters (letters, digits and underscores), you can fake lookbehind using the \b word boundary matcher:

"asdf a  b c2 ".split(/(?= )|\b/)
// => ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]

but as soon as you get some punctuation in, it breaks down, since \b matches at every boundary between word and non-word characters, so the split no longer happens only around spaces:

"asdf-eif.b".split(/(?= )|\b/)
// => ["asdf", "-", "eif", ".", "b"]

If you do have non-word characters you don't want to break on, then I would suggest a postprocessing method instead.

Post-think EDIT: This is based on JamesA's original idea, but refined to not use jQuery, and to correctly split:

function chop(str) {
  var result = [];
  var pastFirst = false;
  str.split(' ').forEach(function(x) {
    if (pastFirst) result.push(' ');
    if (x.length) result.push(x);
    pastFirst = true;
  });
  return result;
}
chop("asdf a  b c2 ")
// => ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]
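
For engines that do support lookbehind assertions (added to the language in ES2018), the split can be written directly as "before or after each space", with no \b workaround. This is only a sketch under that assumption; the function name splitAroundSpaces is hypothetical.

```javascript
// Assumes an engine with lookbehind support (ES2018 or later).
// Hypothetical helper name, used here for illustration only.
function splitAroundSpaces(str) {
  // (?= )  matches the position just before a space,
  // (?<= ) matches the position just after a space.
  return str.split(/(?= )|(?<= )/);
}

console.log(splitAroundSpaces("asdf a  b c2 "));
// ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]
console.log(splitAroundSpaces("asdf-eif.b c"));
// ["asdf-eif.b", " ", "c"]
```

Because both alternatives are zero-width, no characters are consumed, and punctuation inside a word is left alone.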

5 Comments

This works great for what I wrote in my question, but I just realized I made a mistake with the examples.. See my edited question.
@gandalf3 you want them not as strings?
@limelights I want each space to be in a single element. There should never be a space + anything else in one element.
@limelights: Originally the split was before each space; now it is before and after each space. Unfortunately, JavaScript does not have lookbehind, so this is a bit harder...
Thanks! This works great, but I accepted Jack's answer because it's shorter (though that solution splits on any whitespace character, not just spaces, which is fine for my case). I would accept both if I could. (+1 btw)
8

I'm surprised no one has mentioned this yet, but I'll post this here for the sake of completeness. If you have capturing groups in your expression, then .split will include the captured substring as a separate entry in the result array:

"asdf a  b c2 ".split(/( )/)  // or /(\s)/
// ["asdf", " ", "a", " ", "", " ", "b", " ", "c2", " ", ""]

Note, this is not exactly the same as the desired output you specified, as it includes an empty string between the two contiguous spaces and after the last space.

If necessary, you can filter out all empty strings from the result array like this:

"asdf a  b c2 ".split(/( )/).filter(String)
// ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]

However, if this is what you're looking for, I'd probably recommend you go with @Jack's solution.
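
If the delimiter is not known in advance, the capturing-group approach can be wrapped in a small function. The following is a sketch, not a definitive implementation: the name splitKeep is hypothetical, and it assumes the delimiter should be matched literally, so regex metacharacters are escaped first.

```javascript
// Hypothetical helper: split on a literal delimiter and keep each delimiter
// as its own element, dropping the empty strings that split() produces.
function splitKeep(str, delimiter) {
  // Escape regex metacharacters so the delimiter is matched literally.
  const escaped = delimiter.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // The capturing group makes split() include each delimiter in the result.
  return str.split(new RegExp("(" + escaped + ")")).filter(function (s) {
    return s !== "";
  });
}

console.log(splitKeep("asdf a  b c2 ", " "));
// ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]
console.log(splitKeep("a.b", "."));
// ["a", ".", "b"]
```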

2 Comments

Oops, sorry.. The empty string at the end was the typo. I've edited my question.
@gandalf3 Okay, I've included an alternate solution that will get you the desired result in that case.
0

You could use a little jQuery

var toSplit = "asdf a  b c2 ".split(" ");
$.each(toSplit, 
    function(index, value) { 
        if (toSplit[index] == '') { toSplit[index] = ' '} 
    }
);

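This replaces each empty string with a single space, giving ["asdf", "a", " ", "b", "c2", " "]. The other elements carry no leading spaces, but note that the single spaces between words are not restored.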

1 Comment

No need for jQuery in newer environments - jQuery.each is a poor man's [].forEach.
0

Try clean-split:

const cleanSplit = require("clean-split");

cleanSplit("a-b-c", "-");
//=> ["a", "-", "b", "-", "c"]

cleanSplit("a-b-c", "-", { anchor: "before" });
//=> ["a-", "b-", "c"]

cleanSplit("a-b-c", "-", { anchor: "after" });
//=> ["a", "-b", "-c"]

Under the hood, it uses logic adapted from:

In your case, you can do something like this:

const cleanSplit = require("clean-split");

cleanSplit("asdf a  b c2 ", " ");
//=> ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]
