331

I would like to split a very large string (let's say, 10,000 characters) into N-size chunks.

What would be the best way in terms of performance to do this?

For instance: "1234567890" split by 2 would become ["12", "34", "56", "78", "90"].

Would something like this be possible using String.prototype.match and if so, would that be the best way to do it in terms of performance?

23 Answers 23

665

You can do something like this:

"1234567890".match(/.{1,2}/g);
// Results in:
["12", "34", "56", "78", "90"]

The method will still work with strings whose size is not an exact multiple of the chunk-size:

"123456789".match(/.{1,2}/g);
// Results in:
["12", "34", "56", "78", "9"]

In general, for any string out of which you want to extract at-most n-sized substrings, you would do:

str.match(/.{1,n}/g); // Replace n with the size of the substring

If your string can contain newlines or carriage returns, you would do:

str.match(/(.|[\r\n]){1,n}/g); // Replace n with the size of the substring

As far as performance, I tried this out with approximately 10k characters and it took a little over a second on Chrome. YMMV.

This can also be used in a reusable function:

function chunkString(str, length) {
  return str.match(new RegExp('.{1,' + length + '}', 'g'));
}
Sign up to request clarification or add additional context in comments.

10 Comments

As this answer is now nearly 3 years old, I wanted to try the performance test made by @Vivin again. So FYI, splitting 100k characters two by two using the given regex is instantaneous on Chrome v33.
@Fmstrat What do you mean by "if your string contains spaces, it does not count in the length"? Yes, . does not match newline at all. I will update the answer so that it takes \n and \r into account.
Something like var chunks = str.split("").reverse().join().match(/.{1, 4}/).map(function(s) { return s.split("").reverse().join(); });. This does it in chunks of 4. I am not sure what you mean by "less or more". Keep in mind this won't work in general, especially with strings that contain combining characters and can break Unicode strings as well.
According to developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/… you can match any character, including new lines, with [^]. With this your example would result in str.match(/[^]{1,n}/g)
For anyone looking for really fast string chunking with performance benchmarks on jsperf, see my answer. Using a regex is the slowest chunking method of all.
|
79

I created several faster variants which you can see on jsPerf. My favorite one is this:

function chunkSubstr(str, size) {
  const numChunks = Math.ceil(str.length / size)
  const chunks = new Array(numChunks)

  for (let i = 0, o = 0; i < numChunks; ++i, o += size) {
    chunks[i] = str.substr(o, size)
  }

  return chunks
}

3 Comments

so this worked fabulously on long strings (circa 800k - 9m chars) except when I set the size to 20 for some reason the last chunk was not returned... very weird behaviour.
@DavidAnderton Good catch. I fixed it and interestingly it seems to run even faster. It was rounding when it should have been doing Math.ceil() to determine the correct number of chunks.
Thanks! I put his together as an NPM module with optional Unicode support - github.com/vladgolubev/fast-chunk-string
47

Bottom line:

  • match is very inefficient, slice is better, on Firefox substr/substring is better still
  • match is even more inefficient for short strings (even with cached regex - probably due to regex parsing setup time)
  • match is even more inefficient for large chunk size (probably due to inability to "jump")
  • for longer strings with very small chunk size, match outperforms slice on older IE but still loses on all other systems
  • jsperf rocks

1 Comment

jsperf links are broken
31

This is a fast and straightforward solution -

function chunkString (str, len) {
  const size = Math.ceil(str.length/len)
  const r = Array(size)
  let offset = 0
  
  for (let i = 0; i < size; i++) {
    r[i] = str.substr(offset, len)
    offset += len
  }
  
  return r
}

console.log(chunkString("helloworld", 3))
// => [ "hel", "low", "orl", "d" ]

// 10,000 char string
const bigString = "helloworld".repeat(1000)
console.time("perf")
const result = chunkString(bigString, 3)
console.timeEnd("perf")
console.log(result)
// => perf: 0.385 ms
// => [ "hel", "low", "orl", "dhe", "llo", "wor", ... ]

4 Comments

You have to use substr() instead of substring().
I'm curious, why the underscores in the variable names?
@FelipeValdes I assume to not confuse them with global/parameter variables or to denote them as privately-scoped.
@Leif substr() is now deprecated in favor of substring() developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/…
25

Surprise! You can use split to split.

var parts = "1234567890 ".split(/(.{2})/).filter(O=>O)

Results in [ '12', '34', '56', '78', '90', ' ' ]

5 Comments

What is filter (o=>o) for?
current regex creates empty array elements between chunks. filter(x=>x) is used to filter out those empty elements
Short and clever but iterates over the input multiple times. This answer is more than 4x slower than other solutions in this thread.
@BenCarp It's the motorcycle operator. It makes it go faster. ;)
.filter(Boolean) will do the trick
15

You can definitely do something like

let pieces = "1234567890 ".split(/(.{2})/).filter(x => x.length == 2);

to get this:

[ '12', '34', '56', '78', '90' ]

If you want to dynamically input/adjust the chunk size so that the chunks are of size n, you can do this:

n = 2;
let pieces = "1234567890 ".split(new RegExp("(.{"+n.toString()+"})")).filter(x => x.length == n);

To find all possible size n chunks in the original string, try this:

let subs = new Set();
let n = 2;
let str = "1234567890 ";
let regex = new RegExp("(.{"+n.toString()+"})");     //set up regex expression dynamically encoded with n

for (let i = 0; i < n; i++){               //starting from all possible offsets from position 0 in the string
    let pieces = str.split(regex).filter(x => x.length == n);    //divide the string into chunks of size n...
    for (let p of pieces)                 //...and add the chunks to the set
        subs.add(p);
    str = str.substr(1);    //shift the string reading frame
}

You should end up with:

[ '12', '23', '34', '45', '56', '67', '78', '89', '90', '0 ' ]

Comments

8
var str = "123456789";
var chunks = [];
var chunkSize = 2;

while (str) {
    if (str.length < chunkSize) {
        chunks.push(str);
        break;
    }
    else {
        chunks.push(str.substr(0, chunkSize));
        str = str.substr(chunkSize);
    }
}

alert(chunks); // chunks == 12,34,56,78,9

Comments

6

Include both left and right version with pre-allocation. This is as fast as RegExp impl for small chunks but it goes faster as the chunk size grows. And it is memory efficent.

function chunkLeft (str, size = 3) {
  if (typeof str === 'string') {
    const length = str.length
    const chunks = Array(Math.ceil(length / size))
    for (let i = 0, index = 0; index < length; i++) {
      chunks[i] = str.slice(index, index += size)
    }
    return chunks
  }
}

function chunkRight (str, size = 3) {
  if (typeof str === 'string') {
    const length = str.length
    const chunks = Array(Math.ceil(length / size))
    if (length) {
      chunks[0] = str.slice(0, length % size || size)
      for (let i = 1, index = chunks[0].length; index < length; i++) {
        chunks[i] = str.slice(index, index += size)
      }
    }
    return chunks
  }
}

console.log(chunkRight())  // undefined
console.log(chunkRight(''))  // []
console.log(chunkRight('1'))  // ["1"]
console.log(chunkRight('123'))  // ["123"]
console.log(chunkRight('1234'))  // ["1", "234"]
console.log(chunkRight('12345'))  // ["12", "345"]
console.log(chunkRight('123456'))  // ["123", "456"]
console.log(chunkRight('1234567'))  // ["1", "234", "567"]

1 Comment

p.s. I found slice is a little bit faster than substr
5

I have written an extended function, so the chunk length can also be an array of numbers, like [1,3]

String.prototype.chunkString = function(len) {
    var _ret;
    if (this.length < 1) {
        return [];
    }
    if (typeof len === 'number' && len > 0) {
        var _size = Math.ceil(this.length / len), _offset = 0;
        _ret = new Array(_size);
        for (var _i = 0; _i < _size; _i++) {
            _ret[_i] = this.substring(_offset, _offset = _offset + len);
        }
    }
    else if (typeof len === 'object' && len.length) {
        var n = 0, l = this.length, chunk, that = this;
        _ret = [];
        do {
            len.forEach(function(o) {
                chunk = that.substring(n, n + o);
                if (chunk !== '') {
                    _ret.push(chunk);
                    n += chunk.length;
                }
            });
            if (n === 0) {
                return undefined; // prevent an endless loop when len = [0]
            }
        } while (n < l);
    }
    return _ret;
};

The code

"1234567890123".chunkString([1,3])

will return:

[ '1', '234', '5', '678', '9', '012', '3' ]

1 Comment

Why call it chunkString when it's an instance member? What else could it possibly be chunking?
4
const getChunksFromString = (str, chunkSize) => {
    var regexChunk = new RegExp(`.{1,${chunkSize}}`, 'g')   // '.' represents any character
    return str.match(regexChunk)
}

Call it as needed

console.log(getChunksFromString("Hello world", 3))   // ["Hel", "lo ", "wor", "ld"]

4 Comments

Be careful: this solution (and all the variants in this topic) removes \n characters from the string. getChunksFromString('1\n2\n3\n4', 2) does output [ "1", "2", "3", "4" ], without newline characters.
@JosdeJong if you need to match new line simply add an s flag new RegExp("...", "gs")
Maybe good to update your answer accordingly?
I think my answer is too similar to that of @Hari. Did not notice it. Will remove mine. Hari, please consider adding an example for split result. And see comment by Jos de Jong.
3

it Split's large string in to Small strings of given words .

function chunkSubstr(str, words) {
  var parts = str.split(" ") , values = [] , i = 0 , tmpVar = "";
  $.each(parts, function(index, value) {
      if(tmpVar.length < words){
          tmpVar += " " + value;
      }else{
          values[i] = tmpVar.replace(/\s+/g, " ");
          i++;
          tmpVar = value;
      }
  });
  if(values.length < 1 &&  parts.length > 0){
      values[0] = tmpVar;
  }
  return values;
}

Comments

2
var l = str.length, lc = 0, chunks = [], c = 0, chunkSize = 2;
for (; lc < l; c++) {
  chunks[c] = str.slice(lc, lc += chunkSize);
}

Comments

2

I would use a regex...

var chunkStr = function(str, chunkLength) {
    return str.match(new RegExp('[\\s\\S]{1,' + +chunkLength + '}', 'g'));
}

Comments

2

Here's a solution I came up with for template strings after a little experimenting:

Usage:

chunkString(5)`testing123`

function chunkString(nSize) {
    return (strToChunk) => {
        let result = [];
        let chars = String(strToChunk).split('');

        for(let i = 0; i < (String(strToChunk).length / nSize); i++) {
            result = result.concat(chars.slice(i*nSize,(i+1)*nSize).join(''));
        }
        return result
    }
}

document.write(chunkString(5)`testing123`);
// returns: testi,ng123

document.write(chunkString(3)`testing123`);
// returns: tes,tin,g12,3

Comments

1

What about this small piece of code:

function splitME(str, size) {
    let subStr = new RegExp('.{1,' + size + '}', 'g');
    return str.match(subStr);
};

Comments

1

You can use reduce() without any regex:

(str, n) => {
  return str.split('').reduce(
    (acc, rec, index) => {
      return ((index % n) || !(index)) ? acc.concat(rec) : acc.concat(',', rec)
    },
    ''
  ).split(',')
}

1 Comment

I think it'll help a lot if you would provide examples on how to use your reduce method.
0

In the form of a prototype function:

String.prototype.lsplit = function(){
    return this.match(new RegExp('.{1,'+ ((arguments.length==1)?(isFinite(String(arguments[0]).trim())?arguments[0]:false):1) +'}', 'g'));
}

Comments

0

Here is the code that I am using, it uses String.prototype.slice.

Yes it is quite long as an answer goes as it tries to follow current standards as close as possible and of course contains a reasonable amount of JSDOC comments. However, once minified, the code is only 828 bytes and once gzipped for transmission it is only 497 bytes.

The 1 method that this adds to String.prototype (using Object.defineProperty where available) is:

  1. toChunks

A number of tests have been included to check the functionality.

Worried that the length of code will affect the performance? No need to worry, http://jsperf.com/chunk-string/3

Much of the extra code is there to be sure that the code will respond the same across multiple javascript environments.

/*jslint maxlen:80, browser:true, devel:true */

/*
 * Properties used by toChunks.
 */

/*property
    MAX_SAFE_INTEGER, abs, ceil, configurable, defineProperty, enumerable,
    floor, length, max, min, pow, prototype, slice, toChunks, value,
    writable
*/

/*
 * Properties used in the testing of toChunks implimentation.
 */

/*property
    appendChild, createTextNode, floor, fromCharCode, getElementById, length,
    log, pow, push, random, toChunks
*/

(function () {
    'use strict';

    var MAX_SAFE_INTEGER = Number.MAX_SAFE_INTEGER || Math.pow(2, 53) - 1;

    /**
     * Defines a new property directly on an object, or modifies an existing
     * property on an object, and returns the object.
     *
     * @private
     * @function
     * @param {Object} object
     * @param {string} property
     * @param {Object} descriptor
     * @return {Object}
     * @see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/defineProperty
     */
    function $defineProperty(object, property, descriptor) {
        if (Object.defineProperty) {
            Object.defineProperty(object, property, descriptor);
        } else {
            object[property] = descriptor.value;
        }

        return object;
    }

    /**
     * Returns true if the operands are strictly equal with no type conversion.
     *
     * @private
     * @function
     * @param {*} a
     * @param {*} b
     * @return {boolean}
     * @see http://www.ecma-international.org/ecma-262/5.1/#sec-11.9.4
     */
    function $strictEqual(a, b) {
        return a === b;
    }

    /**
     * Returns true if the operand inputArg is undefined.
     *
     * @private
     * @function
     * @param {*} inputArg
     * @return {boolean}
     */
    function $isUndefined(inputArg) {
        return $strictEqual(typeof inputArg, 'undefined');
    }

    /**
     * The abstract operation throws an error if its argument is a value that
     * cannot be converted to an Object, otherwise returns the argument.
     *
     * @private
     * @function
     * @param {*} inputArg The object to be tested.
     * @throws {TypeError} If inputArg is null or undefined.
     * @return {*} The inputArg if coercible.
     * @see https://people.mozilla.org/~jorendorff/es6-draft.html#sec-requireobjectcoercible
     */
    function $requireObjectCoercible(inputArg) {
        var errStr;

        if (inputArg === null || $isUndefined(inputArg)) {
            errStr = 'Cannot convert argument to object: ' + inputArg;
            throw new TypeError(errStr);
        }

        return inputArg;
    }

    /**
     * The abstract operation converts its argument to a value of type string
     *
     * @private
     * @function
     * @param {*} inputArg
     * @return {string}
     * @see https://people.mozilla.org/~jorendorff/es6-draft.html#sec-tostring
     */
    function $toString(inputArg) {
        var type,
            val;

        if (inputArg === null) {
            val = 'null';
        } else {
            type = typeof inputArg;
            if (type === 'string') {
                val = inputArg;
            } else if (type === 'undefined') {
                val = type;
            } else {
                if (type === 'symbol') {
                    throw new TypeError('Cannot convert symbol to string');
                }

                val = String(inputArg);
            }
        }

        return val;
    }

    /**
     * Returns a string only if the arguments is coercible otherwise throws an
     * error.
     *
     * @private
     * @function
     * @param {*} inputArg
     * @throws {TypeError} If inputArg is null or undefined.
     * @return {string}
     */
    function $onlyCoercibleToString(inputArg) {
        return $toString($requireObjectCoercible(inputArg));
    }

    /**
     * The function evaluates the passed value and converts it to an integer.
     *
     * @private
     * @function
     * @param {*} inputArg The object to be converted to an integer.
     * @return {number} If the target value is NaN, null or undefined, 0 is
     *                   returned. If the target value is false, 0 is returned
     *                   and if true, 1 is returned.
     * @see http://www.ecma-international.org/ecma-262/5.1/#sec-9.4
     */
    function $toInteger(inputArg) {
        var number = +inputArg,
            val = 0;

        if ($strictEqual(number, number)) {
            if (!number || number === Infinity || number === -Infinity) {
                val = number;
            } else {
                val = (number > 0 || -1) * Math.floor(Math.abs(number));
            }
        }

        return val;
    }

    /**
     * The abstract operation ToLength converts its argument to an integer
     * suitable for use as the length of an array-like object.
     *
     * @private
     * @function
     * @param {*} inputArg The object to be converted to a length.
     * @return {number} If len <= +0 then +0 else if len is +INFINITY then
     *                   2^53-1 else min(len, 2^53-1).
     * @see https://people.mozilla.org/~jorendorff/es6-draft.html#sec-tolength
     */
    function $toLength(inputArg) {
        return Math.min(Math.max($toInteger(inputArg), 0), MAX_SAFE_INTEGER);
    }

    if (!String.prototype.toChunks) {
        /**
         * This method chunks a string into an array of strings of a specified
         * chunk size.
         *
         * @function
         * @this {string} The string to be chunked.
         * @param {Number} chunkSize The size of the chunks that the string will
         *                           be chunked into.
         * @returns {Array} Returns an array of the chunked string.
         */
        $defineProperty(String.prototype, 'toChunks', {
            enumerable: false,
            configurable: true,
            writable: true,
            value: function (chunkSize) {
                var str = $onlyCoercibleToString(this),
                    chunkLength = $toInteger(chunkSize),
                    chunked = [],
                    numChunks,
                    length,
                    index,
                    start,
                    end;

                if (chunkLength < 1) {
                    return chunked;
                }

                length = $toLength(str.length);
                numChunks = Math.ceil(length / chunkLength);
                index = 0;
                start = 0;
                end = chunkLength;
                chunked.length = numChunks;
                while (index < numChunks) {
                    chunked[index] = str.slice(start, end);
                    start = end;
                    end += chunkLength;
                    index += 1;
                }

                return chunked;
            }
        });
    }
}());

/*
 * Some tests
 */

(function () {
    'use strict';

    var pre = document.getElementById('out'),
        chunkSizes = [],
        maxChunkSize = 512,
        testString = '',
        maxTestString = 100000,
        chunkSize = 0,
        index = 1;

    while (chunkSize < maxChunkSize) {
        chunkSize = Math.pow(2, index);
        chunkSizes.push(chunkSize);
        index += 1;
    }

    index = 0;
    while (index < maxTestString) {
        testString += String.fromCharCode(Math.floor(Math.random() * 95) + 32);
        index += 1;
    }

    function log(result) {
        pre.appendChild(document.createTextNode(result + '\n'));
    }

    function test() {
        var strLength = testString.length,
            czLength = chunkSizes.length,
            czIndex = 0,
            czValue,
            result,
            numChunks,
            pass;

        while (czIndex < czLength) {
            czValue = chunkSizes[czIndex];
            numChunks = Math.ceil(strLength / czValue);
            result = testString.toChunks(czValue);
            czIndex += 1;
            log('chunksize: ' + czValue);
            log(' Number of chunks:');
            log('  Calculated: ' + numChunks);
            log('  Actual:' + result.length);
            pass = result.length === numChunks;
            log(' First chunk size: ' + result[0].length);
            pass = pass && result[0].length === czValue;
            log(' Passed: ' + pass);
            log('');
        }
    }

    test();
    log('');
    log('Simple test result');
    log('abcdefghijklmnopqrstuvwxyz'.toChunks(3));
}());
<pre id="out"></pre>

Comments

0
    window.format = function(b, a) {
        if (!b || isNaN(+a)) return a;
        var a = b.charAt(0) == "-" ? -a : +a,
            j = a < 0 ? a = -a : 0,
            e = b.match(/[^\d\-\+#]/g),
            h = e && e[e.length - 1] || ".",
            e = e && e[1] && e[0] || ",",
            b = b.split(h),
            a = a.toFixed(b[1] && b[1].length),
            a = +a + "",
            d = b[1] && b[1].lastIndexOf("0"),
            c = a.split(".");
        if (!c[1] || c[1] && c[1].length <= d) a = (+a).toFixed(d + 1);
        d = b[0].split(e);
        b[0] = d.join("");
        var f = b[0] && b[0].indexOf("0");
        if (f > -1)
            for (; c[0].length < b[0].length - f;) c[0] = "0" + c[0];
        else +c[0] == 0 && (c[0] = "");
        a = a.split(".");
        a[0] = c[0];
        if (c = d[1] && d[d.length -
                1].length) {
            for (var d = a[0], f = "", k = d.length % c, g = 0, i = d.length; g < i; g++) f += d.charAt(g), !((g - k + 1) % c) && g < i - c && (f += e);
            a[0] = f
        }
        a[1] = b[1] && a[1] ? h + a[1] : "";
        return (j ? "-" : "") + a[0] + a[1]
    };

var str="1234567890";
var formatstr=format( "##,###.", str);
alert(formatstr);


This will split the string in reverse order with comma separated after 3 char's. If you want you can change the position.

Comments

0

Using slice() method:

function returnChunksArray(str, chunkSize) {
  var arr = [];
  while(str !== '') {
    arr.push(str.slice(0, chunkSize));
    str = str.slice(chunkSize);
  }
  return arr;
}

The same can be done using substring() method.

function returnChunksArray(str, chunkSize) {
  var arr = [];
  while(str !== '') {
    arr.push(str.substring(0, chunkSize));
    str = str.substring(chunkSize);
  }
  return arr;
}

1 Comment

this does some relatively expensive array memory read/write due to the use of push(), slice(), substring(). @Justin Warkentin's answer is a bit more efficient while maintaining same level of readability as this solution.
0

My issue with the above solution is that it beark the string into formal size chunks regardless of the position in the sentences.

I think the following a better approach; although it needs some performance tweaking:

 static chunkString(str, length, size,delimiter='\n' ) {
        const result = [];
        for (let i = 0; i < str.length; i++) {
            const lastIndex = _.lastIndexOf(str, delimiter,size + i);
            result.push(str.substr(i, lastIndex - i));
            i = lastIndex;
        }
        return result;
    }

Comments

-1

Use this npm library "chkchars" but remember to make sure the length of the string given is perfectly divided by the "number" parameter.

const phrase = "1110010111010011100101110100010000011100101110100111001011101001011101001110010111010001000001110010111010011100101110100"
const number = 7

chkchars.splitToChunks(phrase, number)

// result => ['1110010', '1110100','1110010', '1110100','0100000', '1110010','1110100', '1110010','1110100', '1011101','0011100', '1011101','0001000','0011100','1011101', '0011100','1011101']

// perf => 0.287ms

Comments

-3
function chunkString(str, length = 10) {
    let result = [],
        offset = 0;
    if (str.length <= length) return result.push(str) && result;
    while (offset < str.length) {
        result.push(str.substr(offset, length));
        offset += length;
    }
    return result;
}

1 Comment

Your answer doesn't add anything new (compared to the other answers) and lacks any description like the other answers.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.