I have two three-dimensional arrays of BOOL and I want to mask one with the other, i.e. create a third array where third[i][j][k] = first[i][j][k] && second[i][j][k] for each i, j, k.

  1. I use C (assembly would also be fine).
  2. The masking operation needs to be as fast as possible.
  3. You can assume that first and second have the same size.
  4. If it would improve performance, I could rearrange the data from arrays into some other layout.

Edit: each array dimension is 100.

Thank you!

5 Comments

  • Instead of bool arr[][][], I'd use a bitmask to store the data. Operating on a bitmask would be far more natural and almost certainly more efficient. Commented Aug 16, 2011 at 22:50
  • @hexa no, it is something from picture analysis. Commented Aug 16, 2011 at 22:53
  • @yan could you post your comment as an answer with a little more detail? Commented Aug 16, 2011 at 22:57
  • I agree with yan. If instead of three-dimensional arrays you packed the values into one-dimensional arrays of 64-bit ints holding 1 million bits each, your loop would be reduced to 15,625 iterations of: third[i] = first[i] & second[i]. Compare that to 1 million iterations of the boolean logic you used and you should see a vast performance improvement. Commented Aug 16, 2011 at 23:30
  • @Sergey, check below, I posted some (hopefully working) sample code. Commented Aug 17, 2011 at 16:10

3 Answers

I mentioned this in a comment, but here's some code to illustrate the idea. It is hopefully working, but I haven't tested it or fed it through a compiler. If you want to model a 100x100x100 array as a bitmask, you can do the following:

// Create two bitmasks
#include <limits.h>  // CHAR_BIT
#include <stddef.h>  // size_t

#define DIM 100
#define BITS_PER_VALUE (CHAR_BIT * sizeof(unsigned long))
// Round up so a partial last word still gets storage
#define MASK_SIZE ((DIM * DIM * DIM + BITS_PER_VALUE - 1) / BITS_PER_VALUE)

unsigned long bitmask1[MASK_SIZE] = {0};
unsigned long bitmask2[MASK_SIZE] = {0};
unsigned long bitmask_result[MASK_SIZE];

// Set the two bitmasks. This is probably sub-optimal, but you
// mention that setting the bitmasks doesn't need to be especially fast.
// (The loops below belong inside a function.)

// set bitmask1 (repeat something similar for bitmask2)
for (size_t i = 0; i < DIM; ++i)
  for (size_t j = 0; j < DIM; ++j)
    for (size_t k = 0; k < DIM; ++k) {
      // set bitmask[i][j][k] to 1
      size_t offset = DIM*DIM*i + DIM*j + k;
      size_t long_offset = offset / BITS_PER_VALUE;
      size_t bit_offset  = offset % BITS_PER_VALUE;
      // XXX SET THIS TO WHATEVER VALUE YOU HAVE, 1 FOR true and 0
      // FOR false. I'M SETTING EVERYTHING TO TRUE FOR THE SAKE OF
      // EXAMPLE
      // |= keeps the bits already set in this word; 1UL keeps the
      // shift in unsigned long instead of int
      bitmask1[long_offset] |= 1UL << bit_offset;
    }

// Now to actually compare:
for (size_t i = 0; i < MASK_SIZE; ++i) {
  bitmask_result[i] = bitmask1[i] & bitmask2[i];
}

// and that's it. bitmask_result will now have your answers. decompose
// the bitmask by doing the reverse of the above set loop
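
The reverse of the set loop could look something like the sketch below. The helper names (`bit_index`, `set_bit`, `get_bit`) are not from the answer; they are made up for illustration, and the layout assumes the same `DIM*DIM*i + DIM*j + k` offset used in the set loop.

```c
#include <limits.h>
#include <stddef.h>

#define DIM 100
#define BITS_PER_VALUE (CHAR_BIT * sizeof(unsigned long))
#define MASK_SIZE ((DIM * DIM * DIM + BITS_PER_VALUE - 1) / BITS_PER_VALUE)

/* Hypothetical helper: flat bit offset of element [i][j][k]. */
static size_t bit_index(size_t i, size_t j, size_t k)
{
    return (size_t)DIM * DIM * i + (size_t)DIM * j + k;
}

/* Hypothetical helper: store value (zero or non-zero) at [i][j][k]. */
static void set_bit(unsigned long *mask, size_t i, size_t j, size_t k, int value)
{
    size_t offset = bit_index(i, j, k);
    unsigned long bit = 1UL << (offset % BITS_PER_VALUE);
    if (value)
        mask[offset / BITS_PER_VALUE] |= bit;
    else
        mask[offset / BITS_PER_VALUE] &= ~bit;
}

/* Hypothetical helper: read [i][j][k] back out as 0 or 1. */
static int get_bit(const unsigned long *mask, size_t i, size_t j, size_t k)
{
    size_t offset = bit_index(i, j, k);
    return (int)((mask[offset / BITS_PER_VALUE] >> (offset % BITS_PER_VALUE)) & 1UL);
}
```

After the AND loop, `get_bit(bitmask_result, i, j, k)` would then give the value of third[i][j][k].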

2 Comments

Thank you for a good answer, but my purpose is to optimise the whole process.
@SergeyKucher to optimize it you'll need to avoid working on individual bits. Use a 1D array and operate on a bunch of bits at once, preferably with SIMD and multithreading to improve performance
Arranging the data in memory so that all the calculations can be done in one heavily optimized loop (SSE, etc.) would help. However, keep in mind that you are touching a lot of memory to do a very, very fast operation, so the gain from optimizing the loop itself will be limited. Also, if you choose to rearrange the memory, the rearranging may well be slower than the calculation itself.
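
If repacking is the worry, one option that needs no rearrangement is to treat each 100x100x100 array as the flat block of bytes it already is and AND several bytes per step. The sketch below assumes each element is an `unsigned char` holding exactly 0 or 1 (so `&` gives the same result as `&&`); `and_bytes` is a made-up name, and whether this beats a plain loop on your compiler is something to measure rather than assume.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: out[p] = a[p] & b[p] over n bytes, 8 bytes per step.
   Assumes every input byte is exactly 0 or 1. */
static void and_bytes(const unsigned char *a, const unsigned char *b,
                      unsigned char *out, size_t n)
{
    size_t pos = 0;
    for (; pos + sizeof(uint64_t) <= n; pos += sizeof(uint64_t)) {
        uint64_t x, y;
        memcpy(&x, a + pos, sizeof x);   /* memcpy sidesteps alignment and aliasing issues */
        memcpy(&y, b + pos, sizeof y);
        x &= y;
        memcpy(out + pos, &x, sizeof x);
    }
    for (; pos < n; ++pos)               /* leftover bytes when n is not a multiple of 8 */
        out[pos] = a[pos] & b[pos];
}
```

For arrays declared as `unsigned char first[100][100][100]` (and likewise `second` and `third`), the call would be `and_bytes((const unsigned char *)first, (const unsigned char *)second, (unsigned char *)third, sizeof first)`.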

This problem reminds me of a chapter by Charles Petzold in the book "Beautiful Code". You could generate a code pattern for each value of each line of the loop (100 different code patterns) that only performs an assignment if the corresponding bit value is 1, and then jump to the correct implementation depending on the bit value of the line you're processing. You would need to use bitfields for the different masks. This turns the triply nested loop into a doubly nested loop with optimized code for the inner loop (not too bad), at the cost of generating the code for the different inner-loop values with some other utility (or just plain C/C++). You should read the chapter to understand it. Really neat.

I'd say only profiling will answer your question, and I won't do that for you. I would simply start with a plain for loop and only look further if that turns out to be too slow.

Do not optimize prematurely.
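
For reference, the straightforward version might look like the sketch below; it is the baseline to profile before trying anything else. The `unsigned char` element type and the name `mask_arrays` are assumptions, since the question does not show how `BOOL` is defined.

```c
#include <stddef.h>

#define DIM 100

/* Hypothetical baseline: third[i][j][k] = first[i][j][k] && second[i][j][k]. */
static void mask_arrays(unsigned char first[DIM][DIM][DIM],
                        unsigned char second[DIM][DIM][DIM],
                        unsigned char third[DIM][DIM][DIM])
{
    for (size_t i = 0; i < DIM; ++i)
        for (size_t j = 0; j < DIM; ++j)
            for (size_t k = 0; k < DIM; ++k)   /* k innermost: walks memory sequentially */
                third[i][j][k] = first[i][j][k] && second[i][j][k];
}
```

Keeping `k` as the innermost index matters because C stores the last dimension contiguously, so this order reads and writes memory front to back.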
