I was investigating a preconception I had that sorting strings in JavaScript would be slower than sorting integers. This is based on something I read (and now can't find), which appears to be erroneous: it claimed that JavaScript stores strings as Array<Array<int>> rather than just Array<int>. The MDN documentation seems to contradict this:
JavaScript's String type is used to represent textual data. It is a set of "elements" of 16-bit unsigned integer values. Each element in the String occupies a position in the String. The first element is at index 0, the next at index 1, and so on. The length of a String is the number of elements in it.
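Indeed, a quick check in Node (my own illustration, separate from the test below) is consistent with that: each element is a single 16-bit UTF-16 code unit, and a character outside the Basic Multilingual Plane simply occupies two elements:

console.log("AB".charCodeAt(0)); // 65: the 16-bit unit holding "A"
console.log("AB".length);        // 2 elements
console.log("\u{1D11E}".length); // 2: one astral character, two 16-bit elements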
If we define the "size" of an element (number or string) as the length of its textual representation (so size = String(x).length for either kind of element), then for two large arrays of same-size elements (one of numbers, one of strings), I expected sorting the strings to be equal to or slightly slower than sorting the numbers. But when I ran a simple test (code below), the strings turned out to be about twice as fast to sort.
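For concreteness (my own example, not part of the test): under this definition, a 16-digit number and a 16-character string have the same size:

console.log(String(1234567890123456).length); // 16
console.log("ABCDEFGHIJKLMNOP".length);       // 16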
I want to know what it is about strings and numbers, and about how JavaScript does its sorting, that makes string sorting faster than numeric sorting here. Perhaps there is something I am misunderstanding.
Results (times in milliseconds):
~/sandbox > node strings-vs-ints.js 10000 16
Sorting 10000 numbers of magnitude 10^16
Sorting 10000 strings of length 16
Numbers: 18
Strings: 9
~/sandbox > node strings-vs-ints.js 1000000 16
Sorting 1000000 numbers of magnitude 10^16
Sorting 1000000 strings of length 16
Numbers: 3418
Strings: 1529
~/sandbox > node strings-vs-ints.js 1000000 32
Sorting 1000000 numbers of magnitude 10^32
Sorting 1000000 strings of length 32
Numbers: 3634
Strings: 1474
Source:
"use strict";
const CHARSET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghjijklmnopqrstuvwxyz0123456789:.";
function generateString(L) {
const chars = [];
while(chars.length < L) {
chars.push(CHARSET[Math.floor(Math.random() * CHARSET.length)]);
}
return chars.join("");
}
function generateNumber(L) {
return Math.floor(Math.random() * Math.pow(10, (L - 1))) + Math.pow(10, L - 1);
}
function generateList(generator, L, N) {
const elements = [];
while(elements.length < N) {
elements.push(generator.call(null, L));
}
return elements;
}
function now() {
return Date.now();
}
function getTime(baseTime) {
return now() - baseTime;
}
function main(count, size) {
console.log(`Sorting ${count} numbers of magnitude 10^${size}`);
const numbers = generateList(generateNumber, size, count);
const numBaseTime = now();
numbers.sort();
const numTime = getTime(numBaseTime);
console.log(`Sorting ${count} strings of length ${size}`);
const strings = generateList(generateString, size, count);
const strBaseTime = now();
strings.sort();
const strTime = getTime(strBaseTime);
console.log(`Numbers: ${numTime}\nStrings: ${strTime}`);
}
main(process.argv[2], process.argv[3]);
"500", "1000"vs.500, 1000. Former is swapped while latter is right. This is just my understanding of why it might be this way. It is may or may not be correct.10^32will be using scientific notation anyway for stringification. There's only 64 bits of entropy, you can't get an arbitrarily long string representation.sortfunction then it proceeds with the default sorting which coerces the values to be compared into strings (if they are not already a string type).