
I have this array: var arr = new int[] { 1, 1, 0, -1, -1 }; where I need to count the number of positive, negative and zero numbers. I did it with a foreach loop and with LINQ, and I tried to compare the performance of the two methods using Stopwatch. Here is my code:

        int pos = 0, neg = 0, zeros = 0;
        int p = 0, n = 0, z = 0;


        Stopwatch sw = new Stopwatch();

        sw.Start();

        pos = arr.Sum(e => e > 0 ? 1 : 0);
        neg = arr.Sum(e => e < 0 ? 1 : 0);
        zeros = arr.Sum(e => e == 0 ? 1 : 0);

        sw.Stop();


        Stopwatch sw2 = new Stopwatch();

        sw2.Start();

        foreach (var item in arr)
        {
            if (item > 0) p++;
            else if (item < 0) n++;
            else z++;
        }

        sw2.Stop();

        Console.WriteLine("Elapsed={0}", sw.Elapsed); //Elapsed=00:00:00.0008311
        Console.WriteLine("Elapsed2={0}", sw2.Elapsed); //Elapsed2=00:00:00.0000028

The results showed that the foreach loop was a lot faster (about 2.8 µs) than the LINQ version (about 831 µs), so my question is: why is there such a big difference in performance?

I even tried using three foreach loops, one to count the negatives, one to count the positives, and a third to count the zeros, but the performance was still better than the LINQ method!
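That three-loop variant was along these lines (a sketch, not the exact code; the counting logic is as described above):

    // One pass per category: three traversals, but still no delegate calls.
    int p2 = 0, n2 = 0, z2 = 0;

    foreach (var item in arr) { if (item > 0) p2++; }
    foreach (var item in arr) { if (item < 0) n2++; }
    foreach (var item in arr) { if (item == 0) z2++; }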

Thanks in advance for your help!

8 Comments
  • So, what is your question about? Commented Jan 31, 2020 at 10:17
  • I believe it is because your array is very small. LINQ creates the enumerator internally, and that takes some time. But if you have an array with thousands of items, I think LINQ could be faster. Commented Jan 31, 2020 at 10:17
  • @PavelAnikhouski why is there all this difference in performance? Commented Jan 31, 2020 at 10:18
  • You're creating 3 sequences with your LINQ methods, vs. 1 foreach loop. Commented Jan 31, 2020 at 10:18
  • Also, you can change it a little bit (full snippet just below): pos = arr.Count(e => e > 0); neg = arr.Count(e => e < 0); zeros = arr.Count(e => e == 0); Commented Jan 31, 2020 at 10:21
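Putting the last comment's suggestion into a complete snippet (a sketch; note that it still walks the array three times, once per Count call):

    using System;
    using System.Linq;

    class Program
    {
        static void Main()
        {
            var arr = new int[] { 1, 1, 0, -1, -1 };

            // Count with a predicate reads more clearly than Sum with a 0/1 selector,
            // but each call still iterates the whole array once.
            int pos = arr.Count(e => e > 0);
            int neg = arr.Count(e => e < 0);
            int zeros = arr.Count(e => e == 0);

            Console.WriteLine($"pos={pos}, neg={neg}, zeros={zeros}"); // pos=2, neg=2, zeros=1
        }
    }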

2 Answers


Let's run the horse race in a slightly different manner:

private static string UnderTest(int size) {
  int pos = 0, neg = 0, zeros = 0;
  int p = 0, n = 0, z = 0;

  Random random = new Random(0);
  int[] arr = Enumerable
    .Range(0, size)
    .Select(x => random.Next(-1, 2))
    .ToArray();
  GC.Collect(GC.MaxGeneration);

  Stopwatch sw = new Stopwatch();
  sw.Start();

  // Three LINQ loops (expected to be roughly three times slower)
  pos = arr.Sum(e => e > 0 ? 1 : 0);    
  neg = arr.Sum(e => e < 0 ? 1 : 0);
  zeros = arr.Sum(e => e == 0 ? 1 : 0);

  sw.Stop();

  Stopwatch sw2 = new Stopwatch();
  sw2.Start();

  // Just 1 loop 
  foreach (var item in arr) {
    if (item > 0) p++;
    else if (item < 0) n++;
    else z++;
  }

  sw2.Stop();

  return $"{sw.Elapsed} vs. {sw2.Elapsed} ratio: {((double)sw.Elapsed.Ticks) / sw2.Elapsed.Ticks:f3}";
} 

Races for different array sizes:

  int[] loops = new int[] {
    1000,
    10_000,
    100_000,
    1_000_000,
    10_000_000,
    100_000_000,
    1000,         // <- 1000 again
  };

  string report = string.Join(Environment.NewLine, loops
    .Select(loop => $"loops: {loop,10} : {UnderTest(loop)}"));

  Console.Write(report);

Outcome:

loops:       1000 : 00:00:00.0006471 vs. 00:00:00.0000114 ratio: 56.763 // <- Warming up
loops:      10000 : 00:00:00.0003195 vs. 00:00:00.0001074 ratio: 2.975
loops:     100000 : 00:00:00.0037131 vs. 00:00:00.0010910 ratio: 3.403
loops:    1000000 : 00:00:00.0351574 vs. 00:00:00.0118858 ratio: 2.958
loops:   10000000 : 00:00:00.3729617 vs. 00:00:00.1198276 ratio: 3.112
loops:  100000000 : 00:00:03.7002508 vs. 00:00:01.1808595 ratio: 3.134
loops:       1000 : 00:00:00.0000322 vs. 00:00:00.0000099 ratio: 3.253 // <- Expected

What's going on: we have three LINQ-based loops (three calls to Sum), so LINQ is about three times slower (no surprise). However, there is a huge outlier on the very first run, when the system loads the assembly, JIT-compiles the IL code, etc. That is the so-called warm-up effect.
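If you want warm-up and repeated measurements handled for you, a benchmarking harness is the usual tool. A minimal sketch using BenchmarkDotNet (a separate NuGet package; the class and method names below are mine, not part of the measurement above):

    using System;
    using System.Linq;
    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;

    public class CountBenchmarks
    {
        [Params(1_000, 1_000_000)]
        public int Size;

        private int[] arr = Array.Empty<int>();

        [GlobalSetup]
        public void Setup()
        {
            var random = new Random(0);
            arr = Enumerable.Range(0, Size).Select(_ => random.Next(-1, 2)).ToArray();
        }

        [Benchmark]
        public int ThreeLinqSums() =>
            arr.Sum(e => e > 0 ? 1 : 0) +
            arr.Sum(e => e < 0 ? 1 : 0) +
            arr.Sum(e => e == 0 ? 1 : 0);

        [Benchmark(Baseline = true)]
        public int SingleForeach()
        {
            int p = 0, n = 0, z = 0;
            foreach (var item in arr)
            {
                if (item > 0) p++;
                else if (item < 0) n++;
                else z++;
            }
            return p + n + z;
        }
    }

    public class Program
    {
        public static void Main() => BenchmarkRunner.Run<CountBenchmarks>();
    }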


4 Comments

Very nice answer. However, would you please explain why you are collecting garbage after creating the array?
@Cid: Thank you! A better place for GC.Collect(GC.MaxGeneration); is just before the races (within UnderTest): we don't want the GC to start occasionally and disturb the measurements.
This makes sense. I'm not used to calling the GC manually, so collecting directly in the code prevents the garbage from stacking (heap, eh) up and the collector from being called automatically?
@Cid: We try to keep unwanted processes (esp. garbage collection, which stops the world) from interfering. If we collect all the garbage (e.g., the array with millions of items), we can hope that the following run will not be interrupted (say, in the middle of the foreach).

foreach is just a fancy representation of a simple for loop: it iterates over the array once and counts all three values in a single pass.
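For an array, the compiler lowers that foreach into roughly this index-based loop (a sketch of the lowering, not the exact generated code):

    // Roughly what `foreach (var item in arr)` compiles to when arr is an array:
    // plain indexing, no enumerator object, no delegate calls.
    for (int i = 0; i < arr.Length; i++)
    {
        int item = arr[i];
        if (item > 0) p++;
        else if (item < 0) n++;
        else z++;
    }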

Sum creates a temporary variable (let's call it sum) and iterates over the array, calling the function you provided as an argument, taking its return value and adding it to that temporary variable. This means you make n function calls per Sum, where n is the length of the array.
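Conceptually, arr.Sum(selector) behaves roughly like this (a simplified sketch, not the actual library source):

    using System;

    static class SumSketch
    {
        // Simplified model of Enumerable.Sum(source, selector):
        // an accumulator plus one delegate invocation per element.
        public static int Sum(int[] source, Func<int, int> selector)
        {
            int sum = 0;
            foreach (var element in source)
            {
                sum += selector(element); // delegate call for every element
            }
            return sum;
        }
    }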

As you can probably guess from this, calling a delegate is far more expensive (in terms of clock cycles) than simply adding the values up. You also do this three times, once for each arr.Sum call.

On top of all this, there is some overhead on any function call, library or not, and since the array isn't very big, that overhead WILL matter.

1 Comment

There could also be some minor performance gains to be had regarding cache locality in the foreach, but they are most likely too small to matter in this example.
