First, a simple error. For benchmarks, you always want to turn all compiler optimisations on, otherwise you're testing something different from the real thing. However, the problem is when you isolate an algorithm in a simple looping test like this one. If your algorithm does something really simple and with few side effects, the compiler could simplify parts of your algorithm away!

Some more relevant information: https://stackoverflow.com/q/8547778/2038264

Furthermore, performance is sensitive to the size of the dataset - if it entirely fits into L1 cache, it'll be super fast; if it fits into L2 it'll be slower, L3 a bit slower still, and so on. Take a sorting algorithm for example: if 100 elements fit in L1 but 200 can only fit in L2, then in the first case the CPU only ever needs to use L1, whereas in the second case it'll also be using L2 occasionally. So sorting two 100-element collections will be faster than sorting a single 200-element list. If you plot time-per-element against dataset size you'll get a step-like function, with a step roughly at each cache boundary.
