Understanding subsets of array

Question

You can see here (Find all possible subset combos in an array?) , that one solution presented is

var sets = (function(input, size){
    var results = [], result, mask, i, total = Math.pow(2, input.length);
    for(mask = size; mask < total; mask++){
        result = [];
        i = input.length - 1; 
        do{
            if( (mask & (1 << i)) !== 0){
            result.push(input[i]);
        }
        }while(i--);
        if( result.length >= size){
        results.push(result);
        }
    }   

return results; 
})(['a','b','c','d','e','f'], 2);
console.log(sets);

Firstly I would like to know exactly what size, and mask are doing. I.e why do we need a second param for our function. Also I did soe research on bitwise operators but am still not sure what and why we are doing

mask & (1 << i)) !== 0

What is this implying?

Why haven't you asked this in the referred question, say as a comment to the answer? — VisioN
– VisioN, Commented Jul 14, 2016 at 13:17
Its from 2013, and due to large text I will have to write to understand, a new question is much more appropriate. — Motombo
– Motombo, Commented Jul 14, 2016 at 13:21
mask & (1 << i) checks if the i-the bit of mask (interpreted as a signed 32-bit integer) is set to 1. — le_m
– le_m, Commented Jul 14, 2016 at 13:44

trincot · Accepted Answer · 2016-07-14 16:24:44Z

3

The size parameter is needed because the original question was to produce subsets of a minimal size (of 2). The parameter allows you to specify that minimal size. This is nice, because you can then use the same code to get all subsets with a minimal size of 3.

The variable mask takes all integer values that can be made with input.length bits. So, represented in binary, mask takes these values:

Every 1 bit signals that the corresponding element from the input array should be part of the subset. So the above values of mask represent the following subsets:

[]
['a']
['b']
['a','b']
['c']
... etc ...
['c','d','e','f']
['a','c','d','e','f']
['b','c','d','e','f']
['a','b','c','d','e','f']

It is a bit confusing because we write binary representations with the least-significant bit on the right, while arrays are shown with the 0^th element on the left.

From those sub arrays, only those with minimal size elements are retained in the results. So in the case of size = 2, the first 3 are not retained: the first one that stays is ['a','b'].

For that reason the algorithm was a bit improved, and mask does not start from 0, because it already is sure not to find a large enough subset with mask smaller or equal to size. So that is why the for loop starts with that value. But it is not that important. It would also work starting with zero:

for (mask = 0; mask < total; mask++)

Then concerning the if: it tests whether a certain bit is set in the binary representation of mask: for that a single bit (1) is shifted to the left (so it becomes something like 10000 in binary representation) and then it is AND-ed (with &) with mask. For instance, if mask is at a certain moment 101001 and the shifted bit is 10000 then this & operation works like this:

mask: 101001
bit:   10000
------------
AND:  000000

Another example:

mask: 101001
bit:    1000
------------
AND:  001000

So, the if condition tests whether the bit at position i in mask (starting at the right of the binary representation) is set or not. If it is set, the value mask & (1 << i)) will be equal to (1 << i) (like for instance 1000), if it is not set this expression evaluates to 0. So by doing the !== 0 test you effectively test that the bit is set. Only in that case the corresponding element from the input array is added to the subset.

Example run

Let's say we call the function with these arguments:

(['a','b','c'], 2);

Then total = 8, and mask runs from 2 to 7. Put in binary terms, mask takes the following binary values:

Note that every bit ("column") in the above table corresponds to an element of the array, and its value (0 or 1) decides whether that element should be in the sub array.

Other values of mask are not interesting: smaller values (like binary 001 or 000) represent sub-arrays that have too few elements -- we need at least two. Higher values (like 1000, 1001, ...) have too many bits: the left-most bit does not correspond to any element in the input array, since we only have 3 elements in the array. So that means we have all the interesting values for mask in the above table.

The next thing that happens in the code, is finding the 1-bits in a particular value of mask. We use the variable i to denote the 0-based number of the bit (its position starting from the right of the binary representation). This i will start at 2 and decrease until 0 (so 2, 1, and 0: each pointing to one of the three bits in mask):

i = input.length - 1; 
do { ... } while (i--);

For a given bit-number i, the bit's value in mask is extracted with:

mask & (1 << i)

So let's do all of the above with mask = 2 (i.e. 010 in binary). These are the values that are calculated:

result = []
i = 2
(1 << i) == 4 // 100 in binary: a 1-bit shifted twice to the left
mask & (1 << i) == 0 // because mask does not have that bit set

The next iteration for i:

i = 1
(1 << i) == 2 // 010 in binary: a 1-bit shifted once to the left
mask & (1 << i) == 2 // because mask has that bit set 
result = ['b'] // because 'b' is the value at index i.

The last iteration for i:

i = 0
(1 << i) == 1 // 001 in binary: a 1-bit not shifted at all
mask & (1 << i) == 0 // because mask does not have that bit set

Now, the length of result is checked, but for this particular value of mask it is too small:

result.length == 1 // see above, so nothing is added to the *results* (with 's')

All has been done for mask == 2, so now the above is repeated for mask == 3 (binary 011):

result = []
i = 2
(1 << i) == 4 // 100 in binary: a 1-bit shifted twice to the left
mask & (1 << i) == 0 // because mask does not have that bit set

The next iteration for i:

i = 1
(1 << i) == 2 // 010 in binary: a 1-bit shifted once to the left
mask & (1 << i) == 2 // because mask has that bit set 
result = ['b'] // because 'b' is the value at index i.

The last iteration for i (different result here):

i = 0
(1 << i) == 1 // 001 in binary: a 1-bit not shifted at all
mask & (1 << i) == 1 // because mask has that bit set as well (different!)
result = ['b','a'] // because 'a' is the value at index i.

Now, the length of result is checked again, and this time it is big enough:

result.length == 2 // see above
results = [['b','a']]

...etc.

Note that results (plural) is an array of arrays, so eventually it will grow to:

results = [['b','a'], ['c','a'], ['c','b'], ['c','b','a']]

The sub arrays have their elements in reversed order because the loop of i iterates backward. The sub arrays would be ordered normally if i would increment from 0 onwards (which is possible without any issue).

Limitations

Because bit operations like << and & require their arguments to be 32 bit integers, the code at hand will only work for input arrays that have at the most 32 elements.

edited Jul 14, 2016 at 16:24

answered Jul 14, 2016 at 13:46

trincot

357k38 gold badges282 silver badges338 bronze badges

Sign up to request clarification or add additional context in comments.

57 Comments

Motombo Over a year ago

Is there a reason total = 2^(input.length), im guessing this is due to binary as well

Motombo Over a year ago

Perhaps you could run me through a test case. Say we call (['a','b','c'],2). Now total = 2^(3). So total is 8. Now mask = 2, so we keep looping until mask is 8. i becomes 7... er now what after reading your answer im still a tad bit confused

trincot Over a year ago

I extended my answer. I hope this is detailed enough to get the grips on it.

Motombo Over a year ago

Thanks, could you expand on more on mask & (1 << i). Are these kind of like logic gates? I remember taking a class last semester like that. It says "a<<b shifts in a binary representatin b to the left,shifting in zeroes from the right. " What does that mean here?

trincot Over a year ago

1<<5 is the number 1 (binary 000000000001) shifted 5 times to the left, adding zeroes to the right: binary 000000100000. As you see, 5 zeroes were added to the right. This is equal to 2^5. With i: 1<<i is 2^i.

|

Collectives™ on Stack Overflow

Understanding subsets of array

1 Answer 1

Example run

Limitations

57 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Example run

Limitations

57 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related