Non-Dominated Sorting Genetic Algorithm II (NSGA II) is an evolutionary algorithm, which we use in multi-objective optimization scenarios. This article dwells on the nuts and bolts of the NSGA II algorithm while providing a brief lowdown of the context.

Before we begin, let’s try to understand what a genetic algorithm is briefly.

## Genetic Algorithm

Genetic Algorithm (GA) is an evolutionary algorithm that we use for optimization problems. This algorithm follows the natural process of evolution to produce a solution that is close to optimal. Since this algorithm is heuristic, this cannot find the exact global optima. Accordingly, this algorithm is most suitable for searching huge state spaces where traditional search algorithms cannot find the optimal solution within an acceptable amount of time.

Genetic Algorithm starts with a random population of individuals. Each individual is a chromosome that consists of a possible solution for a problem. For instance, let’s consider the popular knapsack problem where we have to find the right combination of items each with its weight and value so that the total weight doesn’t exceed a certain limit while we maximize the total value. When we use GA, every individual is a combination of items that is a probable solution to our problem.

A fitness function evaluates each individual and assigns a fitness value. For instance, a fitness function can compute the total value of the items in an individual while penalizing for exceeding the weight limit. Then, GA chooses the most fit individuals to produce offspring using various strategies such as tournament selection, roulette wheel selection, etc. Then two parent individuals exchange a part of them with one another in a process called crossover. This helps generate new individuals with new combinations. GA repeats this whole process again and again until it meets a terminating condition. This terminating condition can be a certain number of iterations or the average fitness of the population plateauing. Once done, GA chooses the most fit individual in the population as the optimal solution.

## Multi-objective Optimization

Now that we understand how GA works, let’s look at what multi-objective optimization is. As the name implies, this is an optimization problem that involves multiple objectives. In the knapsack problem we just discussed, our goal is to optimize the total value of items in the knapsack while making sure the total weight does not exceed the limit. Here, the total value is the objective while the total weight is a constraint. This is a single-objective optimization problem. Now, what if there is no weight limit but we want to minimize the total weight? This turns the total weight from a constraint to another objective. So, now we have two objectives: maximizing total value and minimizing total weight. This is what we call a multi-objective optimization problem.

We use the NSGA II algorithm to solve such multi-objective optimization problems. This algorithm is very similar to vanilla GA but it uses non-dominated sorting and crowding-distance sorting to select parent individuals and the individuals for the next iteration. To understand non-dominated sorting and crowding distance sorting, we first need to learn about the Pareto front.

## Pareto Front

Let’s go back to our multi-objective knapsack problem. Here, by adding more items, we can increase the total value. But this will also increase the total weight. Remember, our goal is to maximize the total value and minimize the total weight. Consequently, we can come up with a random population of individuals and plot their weight and value in a graph.

As we can see, with the increase in weight, the value also increases. Even though this is the general trend, we can also observe that some individuals have lesser value while being heavier. Accordingly, some individuals are better than others.

Let’s look at the red and green individuals in the graph.

The red individual has individuals that both weigh less and have more value than it. On the other hand, while there are individuals that weigh less or have more value than green, there is no individual that is both less heavy and more valuable than it. In other words, red has individuals that outperform it in both objectives, but green has none. We call individuals like green non-dominated. It is worth noting that, if an individual has an equal weight but more value or equal value but less weight, then we consider it to be dominating the other individual. The following graph shows all individuals that dominate the red individual circled in red.

Now, let’s try to find all the non-dominated individuals in the graph.

We call the curve consisting of all the non-dominated individuals the Pareto front or Pareto optimal. None of these individuals’ weight or value can be improved without compromising on the other.

## Non-dominated Sorting in NSGA II

Now that we understand the Pareto front, let’s look at non-dominated sorting. In non-dominated sorting, we produce several fronts sorted by their non-dominance. For instance, the green front is not dominated by any. So, this comes first on the list. Then, we can disregard this front, and try to find the next front not dominated by non-green individuals.

The yellow individuals here form the second front as only the greens dominate them. Now, let’s try to find all the possible fronts.

So, the order of fronts is green, yellow, teal, magenta, red, and blue. Now that we understand non-dominated sorting, let’s look at crowding-distance sorting.

## Crowding-distance Sorting in NSGA II

We use crowding-distance sorting to sort individuals within a front. To sort individuals by their crowding distance, first, we sort individuals in ascending order of their values along each axis/objective. Then, we assign the first and the last individuals a distance value of infinite. To calculate the distance of the individuals in between, we calculate the absolute normalized distance between the individual’s nearest neighbors on either side along both axes and sum them up.

So, if we consider the Pareto front, the crowding distance of i is the absolute normalized distance between (i+1) and (i-1) along both axes. Let’s assume that the distance along the x-axis is x and the y-axis is y.

x= \frac{{value((i+1), x) - value((i-1), x))}}{{value(max, x)) - value(min, x))}}
y= \frac{{value((i+1), y) - value((i-1), y))}}{{value(max, y)) - value(min, y))}}
distance = x+y

After finding the crowding distance, of each individual, we sort them in descending order. By assigning infinite to the first and the last individual in a front, we make sure these two values occupy the first two positions after sorting. By preferring individuals with higher distances, we make sure the individuals we choose are diverse.

## NSGA II

Now that we understand both non-dominated and crowding-distance sorting, let’s see how NSGA II uses both of them. First, during parent selection, NSGA II uses both of these sorting methods to rank individuals. The algorithm gives individuals having a higher(better) rank in the non-dominated list a higher ranking. If two individuals belong to the same front, then NSGA II gives precedence to the individual that ranks higher in the crowding-distance list. In other words, when it comes to parent selection, NSGA II first sorts the individuals according to the non-dominated rank and then according to the crowding-distance rank.

Once, NSGA II ranks individuals, it uses binary tournament selection to pick the parents and then carries out the usual procedures we see in GA such as crossover and mutation. Once it produces the offspring population, it uses elitism for replacement. The use of elitism ensures we don’t lose individuals belonging to the higher-ranking fronts. So, NSGA II combines both the parent and offspring population and ranks the individuals according to their non-dominated ranking. This means the list we end up with contains chunks of individuals belonging to fronts in the order of their rank. The algorithm continues to select all individuals belonging to the fronts in the order of their rank until it reaches the parent population size. F1 is the leading front followed by F2 and F3. F1 and F2 individuals ALL make it to the next generation. Since selecting all F3 individuals will mean that we will exceed the parent population, NSGA II uses crowding-distance sorting to allow only the required number of individuals from F3. Image courtesy: https://ieeexplore.ieee.org/document/996017

However, if selecting all individuals belonging to the subsequent front results in exceeding the size of the parent population, then NSGA II will use crowding-distance sorting to select just the right number of individuals belonging to that front.

So, that concludes our discussion on NSGA II, but wait, there is one more interesting thing. One of the significant improvements NSGA II made to the original NSGA algorithm was reducing the complexity of performing non-dominated sorting.

## Algorithm Complexity

To see if there is at least one individual that dominates a certain individual, we need to compare that individual’s value with that of every individual in the population for all objectives. This gives us a complexity of O(MN), where M is the number of objectives and N is the number of individuals. However, this is for one individual. If we do this for each individual, the complexity becomes O(MN2). But it doesn’t end there. This is to find the first non-dominated front. Once we find the first front, we need to disregard that and find the next one until every individual becomes a part of a front. In the worst-case scenario, where one individual belongs to one front, the complexity becomes O(MN3).

NSGA II reduces this complexity to O(MN2). Let’s see how. First, this algorithm stores the number of individuals that dominate a particular individual in a variable. Let’s call it ndominant. It also stores the individuals that this particular individual dominates in an array. Let’s call it Sdominate. As we have already seen, doing this for an individual entails a complexity of O(MN), and repeating this process for every individual will result in a complexity of O(MN2). But unlike in NSGA, we don’t need to repeat this for every front thanks to the variable and array. So, how do we identify different fronts and sort them?

Well, first NSGA II takes out all individuals whose ndominant value is 0. This means no individual dominates these individuals, so they form the first front. Then, this algorithm decrements by 1 the ndominant value of all individuals in the Sdominate array of each individual in the first front. Then, it repeats the whole process to find the next front. This process goes on until NSGA II identifies all the fronts. The complexity of this process is O(MN2) making NSGA II faster than NSGA.

## Summary

That’s it! We have learned how the NSGA II algorithm performs multi-objective optimization in this article. To summarize everything we have learned, NSGA II does non-dominated sorting and crowding-distance sorting to select parents and perform an elitist replacement. Non-dominated sorting ranks individuals in a population according to their dominance level while crowding-distance sorting ranks individuals within a front according to the distance between their two nearest neighbors. NSGA II reduces the complexity of the NSGA algorithm by obviating an additional iteration that it uses by persisting the number of individuals that dominate a given individual and the list of individuals that this individual dominates.