1

I'm testing sorting algorithm and tried it with different amount of data. 100 thousand elements 1 million elements up to 10 million of elements.

I need to calculate the complexity of this algorithm by having outputs for how long every sorting took.

How can I do that?

8
  • You can't really, since the time may also be dependent on the ordering of the input data. Commented Apr 6, 2016 at 19:57
  • What if I would have steps the program took to complete sorting each time? I mean 'primitive' steps. Commented Apr 6, 2016 at 20:04
  • It's still data-dependent, e.g. bubble sort ranges from O(n) if the data is sorted to O(n^2) if the data is reverse sorted. You would need to define the nature of the input data for all your test cases, and even then this would only tell you the complexity for this specific case. Commented Apr 6, 2016 at 20:35
  • If you're concerned about worst-case complexity, you can't do it that way. For example, if you're testing some semi-naive version of quicksort, you can't expect to stumble on the O(n^2) worst case by accident. Commented Apr 6, 2016 at 21:51
  • 1
    Possible duplicate of Empirically estimating big-oh time efficiency Commented Apr 7, 2016 at 6:12

1 Answer 1

1

While you can't find the running time of an algorithm without doing mathematical analysis, the empirical measurements can give you a reasonable idea of how the running time of the algorithm---or rather the program---behaves.

For example, if you have n measurements (x1, y1), (x2, y2), ..., (xn, yn), where xi is the size of the input and yi is the time of the program on the input of that size, then you can plot the function to see whether it's a polynomial. In practice it often is. However, it's hard to see what the exponent should be from the plot.

To find the exponent you could find the slope of the line that best fits the points (log xi, log yi). This is because if y=C*x^k+lower order terms, then since the term C*x^k dominates we expect log y =~ k*log x + log C, i.e., the log-log equation is a linear one whenever the "original" equation is a polynomial one. (Whenever you see a linear function in the log-log plot, your running time is polynomial; the slope of the line tells you the degree of the polynomial.)

Here's a plot of the quadratic function y(x)=x^2: Quadratic function

And here's the corresponding log-log plot: The corresponding log-log plot

We can see that it's a line with slope 2 (in practice you would compute this using, for example, linear least squares). This is expected because log y(x) = 2 * log(x).

The code I used:

x = 1:1:100;
y = x.^2;
plot(x, y);
plot(log(x), log(y));

In practice the function looks messier and the slope can (or should) only be used as a rule of thumb when nothing else is available.

I imagine there are many other tricks to learn about program behavior from running time measurements. I'll give others a chance to share their experience.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.