C language efficient array mask

Question

I have two three-dimensional arrays of BOOL and I want to mask one with the other, i.e. create a third array where third[i][j][k] = first[i][j][k] && second[i][j][k] for each i, j, k.

I use C (assembly would also be fine).
The masking operation needs to be as fast as possible.
You can assume that first and second have the same size.
If it would improve performance, I could rearrange the data from arrays into some other layout.

Edit: each array dimension is 100.

Thank you!

Instead of bool arr[][][], I'd use a bitmask to store the data. Operating on a bitmask would be far more natural and almost certainly more efficient.
– yan, Commented Aug 16, 2011 at 22:50
@yan could you post your comment as an answer with a little more detail?
– Sergey Kucher, Commented Aug 16, 2011 at 22:57
I agree with yan. If instead of three dimensional arrays you packed the values into one dimensional arrays of 64-bit ints with 1 million bits each, your loop would be reduced to 15,625 iterations of: third[i] = first[i] & second[i]. Compare that to 1 million iterations of the boolean logic you used and you should have a vast performance improvement.
– Carey Gregory, Commented Aug 16, 2011 at 23:30
@Sergey, check below, I posted some (hopefully working) sample code.
– yan, Commented Aug 17, 2011 at 16:10

yan · Accepted Answer · 2011-08-17 16:02:57Z

3

I mentioned this in a comment, but here's some sample code. (Hopefully it works: I haven't tested it or run it through a compiler; it's just to convey the idea.) If you want to model a 100x100x100 array as bitmasks, you can do the following:

// Create two bitmasks
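#include <limits.h>  // for CHAR_BIT
#define DIM 100
#define BITS_PER_VALUE (CHAR_BIT * sizeof(unsigned long))
// Round up so a partially used last word is still allocated
#define MASK_SIZE ((DIM * DIM * DIM + BITS_PER_VALUE - 1) / BITS_PER_VALUE)
// Macros rather than const variables: in C, an array sized by a const
// variable is a variable-length array and cannot take an initializer
unsigned long bitmask1[MASK_SIZE] = {0};
unsigned long bitmask2[MASK_SIZE] = {0};
unsigned long bitmask_result[MASK_SIZE];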

// Set the two bitmasks. This is probably sub-optimal, but you
// mention that setting the bitmasks doesn't need to be especially fast

// set bitmask1 (repeat something similar for bitmask2)
for (int i = 0; i < DIM; ++i)
  for (int j = 0; j < DIM; ++j)
    for (int k = 0; k < DIM; ++k) {
      // set bitmask[i][j][k] to 1
      unsigned int offset = DIM*DIM*i + DIM*j + k;
      unsigned int long_offset = offset / BITS_PER_VALUE;
      unsigned int bit_offset  = offset % BITS_PER_VALUE;
      // XXX SET THIS TO WHATEVER VALUE YOU HAVE, 1 FOR true and 0
      // FOR false. I'M SETTING EVERYTHING TO TRUE FOR THE SAKE OF
      // EXAMPLE
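      // OR the bit in (a plain = would wipe the word's other bits);
      // 1UL keeps the shift within unsigned long range
      bitmask1[long_offset] |= 1UL << bit_offset;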
    }

// Now to actually compare:
for (unsigned long i = 0; i < MASK_SIZE; ++i) {
  bitmask_result[i] = bitmask1[i] & bitmask2[i];
}

// and that's it. bitmask_result will now have your answers. decompose
// the bitmask by doing the reverse of the above set loop
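
The "reverse of the above set loop" could look roughly like the following. This is only a sketch and not part of the original answer: the helper names bit_index, set_bit and get_bit are made up for illustration, and it assumes the same layout as above (element [i][j][k] lives at bit DIM*DIM*i + DIM*j + k).

```c
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

#define DIM 100
#define BITS_PER_VALUE (CHAR_BIT * sizeof(unsigned long))
#define MASK_SIZE ((DIM * DIM * DIM + BITS_PER_VALUE - 1) / BITS_PER_VALUE)

/* Hypothetical helper: flat bit position of element [i][j][k]. */
static size_t bit_index(size_t i, size_t j, size_t k)
{
    return (size_t)DIM * DIM * i + (size_t)DIM * j + k;
}

/* Hypothetical helper: store value (zero or non-zero) for element [i][j][k]. */
static void set_bit(unsigned long *mask, size_t i, size_t j, size_t k, int value)
{
    size_t idx = bit_index(i, j, k);
    unsigned long bit = 1UL << (idx % BITS_PER_VALUE);
    if (value)
        mask[idx / BITS_PER_VALUE] |= bit;
    else
        mask[idx / BITS_PER_VALUE] &= ~bit;
}

/* Hypothetical helper: read element [i][j][k] back as 0 or 1. */
static int get_bit(const unsigned long *mask, size_t i, size_t j, size_t k)
{
    size_t idx = bit_index(i, j, k);
    return (int)((mask[idx / BITS_PER_VALUE] >> (idx % BITS_PER_VALUE)) & 1UL);
}
```

With these, get_bit(bitmask_result, i, j, k) would give the value that third[i][j][k] had in the question. If most of the surrounding code can work on the packed form directly, the unpacking step may not be needed at all.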

answered Aug 17, 2011 at 16:02

yan


2 Comments

Sergey Kucher Over a year ago

Thank you for a good answer, but my goal is to optimise the whole process.

phuclv Over a year ago

@SergeyKucher to optimize it you'll need to avoid working on individual bits. Use a 1D array and operate on a bunch of bits at once, preferably with SIMD and multithreading to improve performance
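
A rough sketch of the SIMD part of that suggestion, assuming the bits are already packed into a 1D array of 64-bit words. The function name and_words is made up for illustration. The SSE2 path is only compiled when the compiler defines __SSE2__ (GCC and Clang do on x86-64); otherwise the plain loop handles everything. Multithreading is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical helper: dst[w] = a[w] & b[w] for n 64-bit words. */
static void and_words(uint64_t *dst, const uint64_t *a, const uint64_t *b, size_t n)
{
    size_t w = 0;
#if defined(__SSE2__)
    /* Two 64-bit words (128 bits) per iteration. Unaligned loads and
       stores are used so the arrays need no special alignment. */
    for (; w + 2 <= n; w += 2) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + w));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + w));
        _mm_storeu_si128((__m128i *)(dst + w), _mm_and_si128(va, vb));
    }
#endif
    /* Scalar tail, or the whole array when SSE2 is not available. */
    for (; w < n; ++w)
        dst[w] = a[w] & b[w];
}
```

For the 100x100x100 case this would be called with n = 15625. An optimizing compiler may well vectorize the plain scalar loop on its own, so it is worth measuring before keeping the intrinsics.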

Diego Sevilla · Accepted Answer · 2011-08-16 23:25:23Z

Arranging the data in memory so that all the calculations can be done in one loop (heavily optimized, SSE, etc.) would help. HOWEVER, keep in mind that you're touching a lot of memory to perform a very, very fast operation, so the loop will mostly be limited by memory access and the gain from optimizing it won't be large. AND, if you choose to rearrange the memory, the rearranging itself may well be slower than the calculation.
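
One way to get a single loop without rearranging anything: a C array declared as [100][100][100] is already contiguous, so its bytes can be walked as one flat sequence. The sketch below assumes BOOL is an unsigned char that only ever holds 0 or 1 (the question does not say), in which case & gives the same result as &&. The name mask_flat is made up for illustration.

```c
#include <stddef.h>

#define DIM 100
typedef unsigned char BOOL;   /* assumption: one byte, values 0 or 1 only */

/* Hypothetical helper: byte-wise AND of two equally sized arrays.
   count is the total size in bytes, e.g. sizeof first. */
static void mask_flat(void *third, const void *first, const void *second, size_t count)
{
    unsigned char *dst = third;
    const unsigned char *a = first;
    const unsigned char *b = second;
    /* A simple loop like this is a good candidate for auto-vectorization. */
    for (size_t n = 0; n < count; ++n)
        dst[n] = a[n] & b[n];
}
```

Given BOOL first[DIM][DIM][DIM] and friends, the call would be mask_flat(third, first, second, sizeof first). If BOOL can hold non-zero values other than 1, the loop body would need (a[n] != 0) & (b[n] != 0) instead.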

This problem reminds me of Charles Petzold's chapter in the book "Beautiful Code". You could generate code patterns for each value of each line of the loop (100 different code patterns) that only generate an assignment if the corresponding bit value is 1, and then "jump" to the correct implementation depending on the bit value of the line you're processing. You would need to use bitfields for the different masks. This converts a triply nested loop into a doubly nested loop with optimized code for the inner loop (not too bad), at the cost of having to generate the code for the different values of the inner loop with some other utility (or just plain C/C++). You should read the chapter to understand it. Really neat.

user1115652 · Accepted Answer · 2011-08-16 22:57:19Z

1

I'd say only profiling will answer your question, and I will not do that for you, but I would simply go with a for loop and only bother to really look further if that fails to perform.

Do not optimize prematurely.
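
For reference, the plain loop plus a very crude timing wrapper might look like this. It is a sketch only: BOOL is assumed to be unsigned char (the question does not define it), time_mask_naive is a made-up name, and clock() measures CPU time at coarse resolution, so a real profiler will tell you more.

```c
#include <time.h>

#define DIM 100
typedef unsigned char BOOL;   /* assumption: the question does not define BOOL */

/* The plain triple loop from the question. */
static void mask_naive(BOOL third[DIM][DIM][DIM],
                       BOOL first[DIM][DIM][DIM],
                       BOOL second[DIM][DIM][DIM])
{
    for (int i = 0; i < DIM; ++i)
        for (int j = 0; j < DIM; ++j)
            for (int k = 0; k < DIM; ++k)
                third[i][j][k] = first[i][j][k] && second[i][j][k];
}

/* Hypothetical helper: average CPU seconds per call over `runs` calls. */
static double time_mask_naive(BOOL third[DIM][DIM][DIM],
                              BOOL first[DIM][DIM][DIM],
                              BOOL second[DIM][DIM][DIM], int runs)
{
    clock_t start = clock();
    for (int r = 0; r < runs; ++r)
        mask_naive(third, first, second);
    return (double)(clock() - start) / CLOCKS_PER_SEC / runs;
}
```

Build with the optimization level you actually ship with, and be aware that an optimizer may collapse repeated identical calls, which would make the average look better than it is.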

answered Aug 16, 2011 at 22:57

user1115652
