Empirical way of calculating running time of Algorithms

Introduction

In the previous post, we learned the theoretical (or mathematical) approach for computing the running time of an algorithm. In this post, we will learn more practical approach for computing the running time. Please note that the empirical method is very limited and does not work for all kinds of algorithms. It best works for those algorithms whose running time is in power of $n$.

In the theoretical approach, we use various theoretical and mathematical knowledge to find the running time. We do not need a computer for this. On the other hand, an empirical approach, we run the program on the actual computer. We run the program on various data sets, the relative performance gives the approximate running time of the program. Let me say it again that the theoretical approach is much superior and I strongly recommend to do it wherever applicable.

The empirical approach can be used when you do not have enough theoretical background on algorithm analysis. It can also be useful when you do not have access to the source code, or your source code is minified. In this case, we can run the algorithm with varying input size. We record the time taken by the program to run on each input. We can use a mathematical tool like regression to obtain the mathematical model that can be used to predict the running time of unknown input size.

Steps for the empirical approach

First, I will explain all the steps needed to calculate the running time using an empirical approach and then I will calculate the running time of a selection sort using this approach.

Experiment with the program

The first step is to run the program and see the result on different input size. We start with small input and gradually increase the input size and record the corresponding time taken by the program to run. We usually double the size of the input each time and run the program. Do this experiment up to 10 times.

Tabulate the data

Now tabulate the input size and the corresponding time. A sample table is given below.

Input Size (N)	Time Taken T(N)
200	0.001 sec
400	0.005 sec
800	0.012 sec
…	…
16000	5 sec

Most of the time, 10 rows are sufficient if you run the program by doubling the input size each time. Our goal is to find a mathematical function that can best fit this result.

Normalize the data

We transform both input size and time taken into a logarithmic scale. Since we double and run the experiment each time, when we normalize the data into $\log_2$ scale and plot the result (input size in the x-axis and runtime in y-axis), we get a straight line. The slope of the line gives the polynomial degree of the running time. That means, if the slope of the graph is 3 then the running time of the program is $\Theta(n^3)$.

Approximate the function

The equation of the straight line we get after we normalize the data in logarithmic scale looks like
$$\begin{align} \log_2(T(N)) = a * \log_2(N) + b\end{align}$$
To get the required model, we need to find the value of $a$ and $b$. We can find the value of $a$ and $b$ by feeding the transformed data into equation (1). After we have the model, we can simply input the input size and it gives the time required to run the program. If we raise the left and right-hand side of the equation (1), we get the following
$$T(N) = 2^bN^a$$
We already know $a$ and $b$. The only independent variable is $N$ which is the size of the input.

Example

We will go through the steps discussed above and calculate the running time of Selection sort. A Python implementation of Selection Sort is given below.

def selectionSort(alist):
   for fillslot in range(len(alist)-1,0,-1):
       positionOfMax=0
       for location in range(1,fillslot+1):
           if alist[location]>alist[positionOfMax]:
               positionOfMax = location

       temp = alist[fillslot]
       alist[fillslot] = alist[positionOfMax]
       alist[positionOfMax] = temp

The result of running above selection sort algorithm on my computer is tabulated below.

Input Size (N)	Running Time T(N)
250	0.001868 sec
500	0.009274 sec
1000	0.028015 sec
2000	0.1062 sec
4000	0.46821 sec
8000	1.8646 sec
16000	7.6841 sec
32000	31.631799 sec

Next, we transform these values into a logarithmic scale

Input Size (N)	Running Time T(N)
7.96	-9.064
8.96	-6.75
9.96	-5.15
10.96	-3.23
11.96	-1.09
12.96	0.89
13.96	2.94
14.96	4.98

If we do the regression fit of these transformed values, we get $a = 2.011, b = -25.16$. So our model becomes,
$$T(N) = 2^{-25.16}N^{2.011}$$
The running time shows that the program has complexity $\Theta(n^{2.011})$, which is pretty close to the complexity of Selection Sort i.e. $\Theta(n^{2})$.
Now using this model we can predict the running time for $N = 64000$.
$$T(64000) = 2^{-25.16}64000^{2.011} = 123.39$$
I ran the program with 64000 input size and the program gave the result in 126.668 sec. This shows our model is accurate.