0

I have a struct that has several arrays members:

typedef myData someStruct {
    uint16_t array1 [ARRAY_LENGTH]
    uint16_t array2 [ARRAY_LENGTH]
} myData;
myData testData = {0};  // Global struct

At some point in my program I need to set the arrays to some set of predefined values, e.g., set array1 to all 0, array2 to all 0xFF, etc. My first instinct was to write out a for loop something like:

void someFunction (myData * test) {
    for (uint16_t i = 0; i < ARRAY_LENGTH; ++i) {
        test->array1[i] = 0xFF;
        test->array2[i] = 0xCC;
    }
}

However I then reasoned that the actions required by the program to do this would go something like:

load address of array1 first position
set value 0xFF;
load far address of array2 first postion
set value 0xCC;
load far address of array1 second position
set value 0xFF;
// and so on...

Whereas if I used a separate loop for each array the addresses would be a lot nearer each other (as arrays and structs stored contiguously), so the address loads are only to the next byte each time, making the code actually more efficient as follows:

void someFunction (myData * test) {
    uint16_t i = 0;
    for (i; i < ARRAY_LENGTH; ++i)
        test->array1[i] = 0xFF;
    for (i = 0; i < ARRAY_LENGTH; ++i)
        test->array2[i] = 0xCC;
}

Is my reasoning correct, is the second one better? Furthermore, would a compiler (say gcc, for e.g.) normally be able to make this optimization itself?

13
  • I assume the array is on the process stack Commented Jan 9, 2014 at 12:21
  • 1
    why don't perfer memset like this memset(&mys.a,0xff,sizeof(mys.a)); memset(&mys.b,0xcc,sizeof(mys.b)); Commented Jan 9, 2014 at 12:25
  • 1
    I mean: Is the following statement correct: "the address loads are only to the next byte each time, making the code actually more efficient as follows"? [I actually don't know the answer... I would assume that reading a value of the memory from a specific address should take the same amount of time independent of the address value] Commented Jan 9, 2014 at 12:40
  • 1
    @kripanand memset would set each byte. He needs to set words. Commented Jan 9, 2014 at 13:00
  • 1
    @Toby, so why can't you just initialize them correctly from the start? Commented Jan 9, 2014 at 16:17

2 Answers 2

2

It's going to depend on your system architecture. For example, on, say, a SPARC system, the cache line size is 64-bytes, and there are enough cache slots for both arrays, so the first version would be efficient. The load of the first array element would populate the cache, and subsequent loads would be very fast. If the compiler is smart enough, it can use prefetch as well.

On ISAs that support offset addressing, it doesn't actually fetch the address of the array element each time, it just increments an offset. So it only fetches the base address of the array, once, and then uses a load instruction with the base and offset. Each time through the loop it increments the offset in a register. Some instruction sets even have auto-increment.

The best thing to do would be to write a sample program/function, and try it. Optimizations at this low a level require either a thorough knowledge of the CPu/system, or lots of trial and error.

Sign up to request clarification or add additional context in comments.

Comments

1

My humble recommendation: try and see. One loop solution saves arithmetic operations around increment and test of i. Two loops will probably profit of better cache optimization, especially if arrays are aligned to memory pages. In such case each access may cause a cache miss and cache reload. Personally, if the speed really matters I would prefer two loops with some unfolding.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.