# Introduction to algorithm analysis


When trying to solve a problem, the question of which algorithm to choose often arises. What criteria can guide this choice? Two conflicting needs are frequently encountered. The algorithm must:

•   Be simple to understand, to implement, and to develop.
•   Use computer resources intelligently; more precisely, it must run as quickly as possible.

If an algorithm is to be used only a few times, the first criterion is the most important. On the other hand, if it is to be used often, the execution time may be a more decisive factor than the time spent writing it.

The objective of studying the complexity of algorithms is to estimate the cost of an algorithm. This measure therefore allows two algorithms to be compared without having to program them.

If the estimate of complexity took into account machine resources such as clock frequency, number of processors, disk access time, and so on, we would immediately realize how complicated, or even impossible, such a task is. Instead, we usually just estimate the relationship between the size of the data and the execution time, regardless of the architecture used.

We therefore use a simplified model of technological resources and their associated costs. As a reference, we take a random-access machine with a single processor, where operations are executed one after the other, with no simultaneous operations.

We thus define the time complexity of an algorithm as the number of elementary operations $$T(n)$$ that it performs on an input of size n. Complexity is therefore expressed as a function of the size of the input.
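To make this concrete, here is a minimal sketch (the function `sum_with_count` and its operation-counting scheme are illustrative assumptions, not part of any standard library): it sums a list while tallying the elementary operations it performs, so that $$T(n)$$ can be observed directly.

```python
def sum_with_count(data):
    """Sum a list while counting the elementary operations performed.

    Counting scheme (an assumption for illustration): the initial
    assignment costs 1, and each iteration costs 2 (one loop step,
    one addition).
    """
    ops = 0
    total = 0      # one assignment
    ops += 1
    for value in data:
        total += value  # one loop step + one addition per element
        ops += 2
    return total, ops

# T(n) = 2n + 1 under this counting scheme: it grows linearly with n.
for n in (10, 100, 1000):
    _, ops = sum_with_count(list(range(n)))
    print(n, ops)  # e.g. n=10 -> 21 operations
```

The exact constants depend on what one decides to count as an "elementary operation"; only the linear growth in n is significant.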

To compare solutions with each other, two methods can be used:

•   Empirical method
•   Mathematical method

#### Empirical method

It involves coding and executing two (or more) algorithms on a randomly generated data set. At each execution, the running time of each algorithm is measured.

Then, a statistical study is undertaken to choose the best among them in light of the results obtained. However, the times measured in this way depend on many factors:

•   The machine used;
•   The instruction set used;
•   The skill of the programmer;
•   The data set generated;
•   The compiler chosen;
•   The environment in which the two algorithms are executed (shared or not).
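The empirical method can be sketched as follows. The two candidate algorithms below (`linear_sum` and `builtin_sum`) are hypothetical stand-ins chosen only so the sketch is self-contained; the measured times will of course vary with the machine and environment, which is precisely the limitation discussed above.

```python
import random
import time

def linear_sum(data):
    """Candidate 1: explicit loop."""
    total = 0
    for x in data:
        total += x
    return total

def builtin_sum(data):
    """Candidate 2: built-in sum."""
    return sum(data)

# Randomly generated data set
data = [random.randint(0, 1000) for _ in range(100_000)]

# Measure the running time of each candidate on the same data
for algo in (linear_sum, builtin_sum):
    start = time.perf_counter()
    result = algo(data)
    elapsed = time.perf_counter() - start
    print(f"{algo.__name__}: result={result}, time={elapsed:.6f} s")
```

In a real study, each algorithm would be run many times on many data sets and the timings analyzed statistically.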

#### Mathematical method

To overcome these problems, computer scientists have proposed a simpler but effective notion of complexity.
Thus, to measure this complexity, the mathematical method does not measure time in seconds, but counts the basic instructions executed by the algorithms being compared. This approach is justified by the fact that the complexity of an algorithm is largely determined by the execution of the instructions that compose it.
However, it should be noted that, to get a more precise idea of the performance of an algorithm, the experimental and mathematical methods are in fact complementary.

#### Asymptotic analysis

Rather than an exact measure of the execution time, the asymptotic complexity of an algorithm describes its behavior as the size n of the data of the problem becomes large.

Thus, if $$T(N) = 3 + 2N + 1 = 2N + 4$$, then we say that the complexity of this algorithm is quite simply in N. We have eliminated all the constants, and we have also assumed that assignment, test, and addition operations take constant time.
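The elimination of constants can be checked numerically. In this sketch, `T` is the hypothetical exact operation count from the example above; the ratio T(N)/N approaches a constant, which is why only the linear term matters asymptotically.

```python
def T(n):
    """Hypothetical exact operation count: T(N) = 2N + 4."""
    return 2 * n + 4

# As n grows, T(n)/n tends to the constant 2: the additive constant
# becomes negligible, so the complexity is "in N".
for n in (10, 1_000, 100_000):
    print(n, T(n) / n)  # 2.4, then 2.004, then 2.00004
```

The same reasoning justifies dropping constant factors and lower-order terms in general asymptotic analysis.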

Formally, the complexity of an algorithm ALGO is defined as an order of magnitude of the number of elementary operations performed during the execution of ALGO.

It is obvious from the previous remark that the complexity of an algorithm may not be the same for two different data sets of the same size. This can have consequences on the choice of the best algorithm for a given problem. In practice, we perform the following types of analysis:

•   Worst case - the greatest number of operations that the algorithm has to execute on a data set of fixed size n. It is an upper bound on the complexity of the algorithm for inputs of size n.
•   Best case - the smallest number of operations that the algorithm has to execute on a data set of fixed size n. It is a lower bound on the complexity of the algorithm for inputs of size n.
•   Average case - the average (mathematical expectation) of the number of elementary operations performed over all instances of size n.

Consider the following implementation of linear search, first in C:

```c
#include <stdio.h>

// Return the index of x in tab[0..n-1], or -1 if x is not present.
int search(int tab[], int n, int x)
{
    int i;
    for (i = 0; i < n; i++)
    {
        if (tab[i] == x)
            return i;
    }
    return -1;
}

int main()
{
    int T[] = {1, 3, 17, 80, 25, 10, 30, 15};
    int x = 25, n, pos;
    n = sizeof(T) / sizeof(T[0]); // size of T

    pos = search(T, n, x);
    if (pos < 0)
        printf("The element %d does not belong to the table\n", x);
    else
        printf("The element %d is found in the array at position %d\n", x, pos);

    return 0;
}
```

And the equivalent version in Python:

```python
def search(tab, n, x):
    """Return the index of x in tab[0..n-1], or -1 if x is not present."""
    for i in range(n):
        if tab[i] == x:
            return i
    return -1

T = [1, 3, 17, 80, 25, 10, 30, 15]
x = 25
n = len(T)
pos = search(T, n, x)
if pos < 0:
    print("The element {} does not belong to the table".format(x))
else:
    print("The element {} is found in the array at position {}".format(x, pos))
```



When the above code is compiled (C) or run (Python), it produces the following result:

The element 25 is found in the array at position 4
##### The worst case

For linear search, the worst case occurs when the element to search for (x in the code above) is not present in the array. When x is not present, the search() function compares it to every element of tab[], one by one. Therefore, the worst-case time complexity of linear search is $$\Theta(n)$$.
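This can be verified by instrumenting the search. The helper `search_with_count` below is an illustrative variant of the search() function above that also reports how many comparisons were made; searching for an absent value triggers all n of them.

```python
def search_with_count(tab, x):
    """Linear search that also returns the number of comparisons made."""
    comparisons = 0
    for i, value in enumerate(tab):
        comparisons += 1
        if value == x:
            return i, comparisons
    return -1, comparisons

T = [1, 3, 17, 80, 25, 10, 30, 15]
pos, comps = search_with_count(T, 99)  # 99 is absent: the worst case
print(pos, comps)  # -1 8 : all n = 8 elements were compared
```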

##### Average case

In the average case analysis, we take all the possible inputs, calculate the running time for each, add up all the calculated values, and divide the sum by the total number of inputs. We need to know (or predict) the distribution of cases. For the linear search problem, suppose that all cases are uniformly distributed (including the case where x is not present in the array). So we add up the costs of all the cases and divide the sum by (n + 1):

$$\text{Average time} = \frac{\sum_{i=1}^{n+1} \Theta(i)}{n+1} = \frac{\Theta\left(\frac{(n+1)(n+2)}{2}\right)}{n+1} = \Theta(n)$$
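The same conclusion can be reached empirically. This sketch (the `comparisons` helper is illustrative) averages the number of comparisons over all n positions of x plus the absent case, and the result comes out close to n/2, i.e. linear in n:

```python
def comparisons(tab, x):
    """Number of comparisons a linear search makes for x in tab."""
    for i, value in enumerate(tab):
        if value == x:
            return i + 1
    return len(tab)  # absent: every element was compared

n = 1000
tab = list(range(n))

# n cases where x sits at position i (costing i + 1 comparisons),
# plus one case where x is absent (costing n comparisons).
cases = [comparisons(tab, x) for x in tab] + [comparisons(tab, -1)]
average = sum(cases) / (n + 1)
print(average)  # close to n/2, i.e. Theta(n)
```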

##### Best case

In the best case analysis, we calculate the lower bound of the execution time of an algorithm. We need to know the case that results in the execution of a minimum number of operations. In the linear search problem, the best case occurs when x is present at the first position. The number of operations in the best case is constant (it does not depend on n). Therefore, the best-case time complexity is $$\Theta(1)$$.
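A tiny illustrative check (the `comparisons_until_found` helper is an assumption of this sketch): when x is the first element, a single comparison suffices, regardless of n.

```python
def comparisons_until_found(tab, x):
    """Comparisons a linear search makes before returning for x in tab."""
    for count, value in enumerate(tab, start=1):
        if value == x:
            return count
    return len(tab)

T = [1, 3, 17, 80, 25, 10, 30, 15]
print(comparisons_until_found(T, 1))   # x at the first position: 1 comparison
print(comparisons_until_found(T, 15))  # x at the last position: 8 comparisons
```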

In general, we consider one algorithm more efficient than another if its worst-case complexity has a lower order of magnitude.