Genetic Algorithms – Advanced Topics In this section, we introduce some advanced topics in Genetic Algorithms. A reader looking for just an introduction to GAs may choose to skip this section. Constrained Optimization Problems Constrained Optimization Problems are those optimization problems in which we have to maximize or minimize a given objective function value that is subject to certain constraints. Therefore, not all results in the solution space are feasible, and the solution space contains feasible regions as shown in the following image. In such a scenario, crossover and mutation operators might give us solutions which are infeasible. Therefore, additional mechanisms have to be employed in the GA when dealing with constrained Optimization Problems. Some of the most common methods are − Using penalty functions which reduces the fitness of infeasible solutions, preferably so that the fitness is reduced in proportion with the number of constraints violated or the distance from the feasible region. Using repair functions which take an infeasible solution and modify it so that the violated constraints get satisfied. Not allowing infeasible solutions to enter into the population at all. Use a special representation or decoder functions that ensures feasibility of the solutions. Basic Theoretical Background In this section, we will discuss about the Schema and NFL theorem along with the building block hypothesis. Schema Theorem Researchers have been trying to figure out the mathematics behind the working of genetic algorithms, and Holland’s Schema Theorem is a step in that direction. Over the year’s various improvements and suggestions have been done to the Schema Theorem to make it more general. In this section, we don’t delve into the mathematics of the Schema Theorem, rather we try to develop a basic understanding of what the Schema Theorem is. The basic terminology to know are as follows − A Schema is a “template”. Formally, it is a string over the alphabet = {0,1,*}, where * is don’t care and can take any value. Therefore, *10*1 could mean 01001, 01011, 11001, or 11011 Geometrically, a schema is a hyper-plane in the solution search space. Order of a schema is the number of specified fixed positions in a gene. Defining length is the distance between the two furthest fixed symbols in the gene. The schema theorem states that this schema with above average fitness, short defining length and lower order is more likely to survive crossover and mutation. Building Block Hypothesis Building Blocks are low order, low defining length schemata with the above given average fitness. The building block hypothesis says that such building blocks serve as a foundation for the GAs success and adaptation in GAs as it progresses by successively identifying and recombining such “building blocks”. No Free Lunch (NFL) Theorem Wolpert and Macready in 1997 published a paper titled “No Free Lunch Theorems for Optimization.” It essentially states that if we average over the space of all possible problems, then all non-revisiting black box algorithms will exhibit the same performance. It means that the more we understand a problem, our GA becomes more problem specific and gives better performance, but it makes up for that by performing poorly for other problems. GA Based Machine Learning Genetic Algorithms also find application in Machine Learning. Classifier systems are a form of genetics-based machine learning (GBML) system that are frequently used in the field of machine learning. GBML methods are a niche approach to machine learning. There are two categories of GBML systems − The Pittsburg Approach − In this approach, one chromosome encoded one solution, and so fitness is assigned to solutions. The Michigan Approach − one solution is typically represented by many chromosomes and so fitness is assigned to partial solutions. It should be kept in mind that the standard issue like crossover, mutation, Lamarckian or Darwinian, etc. are also present in the GBML systems. Learning working make money
Category: genetic Algorithms
Genetic Algorithms – Fundamentals This section introduces the basic terminology required to understand GAs. Also, a generic structure of GAs is presented in both pseudo-code and graphical forms. The reader is advised to properly understand all the concepts introduced in this section and keep them in mind when reading other sections of this tutorial as well. Basic Terminology Before beginning a discussion on Genetic Algorithms, it is essential to be familiar with some basic terminology which will be used throughout this tutorial. Population − It is a subset of all the possible (encoded) solutions to the given problem. The population for a GA is analogous to the population for human beings except that instead of human beings, we have Candidate Solutions representing human beings. Chromosomes − A chromosome is one such solution to the given problem. Gene − A gene is one element position of a chromosome. Allele − It is the value a gene takes for a particular chromosome. Genotype − Genotype is the population in the computation space. In the computation space, the solutions are represented in a way which can be easily understood and manipulated using a computing system. Phenotype − Phenotype is the population in the actual real world solution space in which solutions are represented in a way they are represented in real world situations. Decoding and Encoding − For simple problems, the phenotype and genotype spaces are the same. However, in most of the cases, the phenotype and genotype spaces are different. Decoding is a process of transforming a solution from the genotype to the phenotype space, while encoding is a process of transforming from the phenotype to genotype space. Decoding should be fast as it is carried out repeatedly in a GA during the fitness value calculation. For example, consider the 0/1 Knapsack Problem. The Phenotype space consists of solutions which just contain the item numbers of the items to be picked. However, in the genotype space it can be represented as a binary string of length n (where n is the number of items). A 0 at position x represents that xth item is picked while a 1 represents the reverse. This is a case where genotype and phenotype spaces are different. Fitness Function − A fitness function simply defined is a function which takes the solution as input and produces the suitability of the solution as the output. In some cases, the fitness function and the objective function may be the same, while in others it might be different based on the problem. Genetic Operators − These alter the genetic composition of the offspring. These include crossover, mutation, selection, etc. Basic Structure The basic structure of a GA is as follows − We start with an initial population (which may be generated at random or seeded by other heuristics), select parents from this population for mating. Apply crossover and mutation operators on the parents to generate new off-springs. And finally these off-springs replace the existing individuals in the population and the process repeats. In this way genetic algorithms actually try to mimic the human evolution to some extent. Each of the following steps are covered as a separate chapter later in this tutorial. A generalized pseudo-code for a GA is explained in the following program − GA() initialize population find fitness of population while (termination criteria is reached) do parent selection crossover with probability pc mutation with probability pm decode and fitness calculation survivor selection find best return best Learning working make money
Genotype Representation One of the most important decisions to make while implementing a genetic algorithm is deciding the representation that we will use to represent our solutions. It has been observed that improper representation can lead to poor performance of the GA. Therefore, choosing a proper representation, having a proper definition of the mappings between the phenotype and genotype spaces is essential for the success of a GA. In this section, we present some of the most commonly used representations for genetic algorithms. However, representation is highly problem specific and the reader might find that another representation or a mix of the representations mentioned here might suit his/her problem better. Binary Representation This is one of the simplest and most widely used representation in GAs. In this type of representation the genotype consists of bit strings. For some problems when the solution space consists of Boolean decision variables – yes or no, the binary representation is natural. Take for example the 0/1 Knapsack Problem. If there are n items, we can represent a solution by a binary string of n elements, where the xth element tells whether the item x is picked (1) or not (0). For other problems, specifically those dealing with numbers, we can represent the numbers with their binary representation. The problem with this kind of encoding is that different bits have different significance and therefore mutation and crossover operators can have undesired consequences. This can be resolved to some extent by using Gray Coding, as a change in one bit does not have a massive effect on the solution. Real Valued Representation For problems where we want to define the genes using continuous rather than discrete variables, the real valued representation is the most natural. The precision of these real valued or floating point numbers is however limited to the computer. Integer Representation For discrete valued genes, we cannot always limit the solution space to binary ‘yes’ or ‘no’. For example, if we want to encode the four distances – North, South, East and West, we can encode them as {0,1,2,3}. In such cases, integer representation is desirable. Permutation Representation In many problems, the solution is represented by an order of elements. In such cases permutation representation is the most suited. A classic example of this representation is the travelling salesman problem (TSP). In this the salesman has to take a tour of all the cities, visiting each city exactly once and come back to the starting city. The total distance of the tour has to be minimized. The solution to this TSP is naturally an ordering or permutation of all the cities and therefore using a permutation representation makes sense for this problem. Learning working make money
Models Of Lifetime Adaptation Till now in this tutorial, whatever we have discussed corresponds to the Darwinian model of evolution – natural selection and genetic variation through recombination and mutation. In nature, only the information contained in the individual’s genotype can be transmitted to the next generation. This is the approach which we have been following in the tutorial so far. However, other models of lifetime adaptation – Lamarckian Model and Baldwinian Model also do exist. It is to be noted that whichever model is the best, is open for debate and the results obtained by researchers show that the choice of lifetime adaptation is highly problem specific. Often, we hybridize a GA with local search – like in Memetic Algorithms. In such cases, one might choose do go with either Lamarckian or Baldwinian Model to decide what to do with individuals generated after the local search. Lamarckian Model The Lamarckian Model essentially says that the traits which an individual acquires in his/her lifetime can be passed on to its offspring. It is named after French biologist Jean-Baptiste Lamarck. Even though, natural biology has completely disregarded Lamarckism as we all know that only the information in the genotype can be transmitted. However, from a computation view point, it has been shown that adopting the Lamarckian model gives good results for some of the problems. In the Lamarckian model, a local search operator examines the neighborhood (acquiring new traits), and if a better chromosome is found, it becomes the offspring. Baldwinian Model The Baldwinian model is an intermediate idea named after James Mark Baldwin (1896). In the Baldwin model, the chromosomes can encode a tendency of learning beneficial behaviors. This means, that unlike the Lamarckian model, we don’t transmit the acquired traits to the next generation, and neither do we completely ignore the acquired traits like in the Darwinian Model. The Baldwin Model is in the middle of these two extremes, wherein the tendency of an individual to acquire certain traits is encoded rather than the traits themselves. In this Baldwinian Model, a local search operator examines the neighborhood (acquiring new traits), and if a better chromosome is found, it only assigns the improved fitness to the chromosome and does not modify the chromosome itself. The change in fitness signifies the chromosomes capability to “acquire the trait”, even though it is not passed directly to the future generations. Learning working make money
Discuss Genetic Algorithms This tutorial covers the topic of Genetic Algorithms. From this tutorial, you will be able to understand the basic concepts and terminology involved in Genetic Algorithms. We will also discuss the various crossover and mutation operators, survivor selection, and other components as well. Also, there will be other advanced topics that deal with topics like Schema Theorem, GAs in Machine Learning, etc. which are also covered in this tutorial. After going through this tutorial, the reader is expected to gain sufficient knowledge to come up with his/her own genetic algorithms for a given problem. Learning working make money
Genetic Algorithms – Survivor Selection The Survivor Selection Policy determines which individuals are to be kicked out and which are to be kept in the next generation. It is crucial as it should ensure that the fitter individuals are not kicked out of the population, while at the same time diversity should be maintained in the population. Some GAs employ Elitism. In simple terms, it means the current fittest member of the population is always propagated to the next generation. Therefore, under no circumstance can the fittest member of the current population be replaced. The easiest policy is to kick random members out of the population, but such an approach frequently has convergence issues, therefore the following strategies are widely used. Age Based Selection In Age-Based Selection, we don’t have a notion of a fitness. It is based on the premise that each individual is allowed in the population for a finite generation where it is allowed to reproduce, after that, it is kicked out of the population no matter how good its fitness is. For instance, in the following example, the age is the number of generations for which the individual has been in the population. The oldest members of the population i.e. P4 and P7 are kicked out of the population and the ages of the rest of the members are incremented by one. Fitness Based Selection In this fitness based selection, the children tend to replace the least fit individuals in the population. The selection of the least fit individuals may be done using a variation of any of the selection policies described before – tournament selection, fitness proportionate selection, etc. For example, in the following image, the children replace the least fit individuals P1 and P10 of the population. It is to be noted that since P1 and P9 have the same fitness value, the decision to remove which individual from the population is arbitrary. Learning working make money
Genetic Algorithms – Useful Resources The following resources contain additional information on Genetic Algorithms. Please use them to get more in-depth knowledge on this. Useful Video Courses 68 Lectures 3.5 hours 19 Lectures 1.5 hours 100 Lectures 22 hours 6 Lectures 1 hours 13 Lectures 43 mins 17 Lectures 1.5 hours Learning working make money
Genetic Algorithms – Quick Guide Genetic Algorithms – Introduction Genetic Algorithm (GA) is a search-based optimization technique based on the principles of Genetics and Natural Selection. It is frequently used to find optimal or near-optimal solutions to difficult problems which otherwise would take a lifetime to solve. It is frequently used to solve optimization problems, in research, and in machine learning. Introduction to Optimization Optimization is the process of making something better. In any process, we have a set of inputs and a set of outputs as shown in the following figure. Optimization refers to finding the values of inputs in such a way that we get the “best” output values. The definition of “best” varies from problem to problem, but in mathematical terms, it refers to maximizing or minimizing one or more objective functions, by varying the input parameters. The set of all possible solutions or values which the inputs can take make up the search space. In this search space, lies a point or a set of points which gives the optimal solution. The aim of optimization is to find that point or set of points in the search space. What are Genetic Algorithms? Nature has always been a great source of inspiration to all mankind. Genetic Algorithms (GAs) are search based algorithms based on the concepts of natural selection and genetics. GAs are a subset of a much larger branch of computation known as Evolutionary Computation. GAs were developed by John Holland and his students and colleagues at the University of Michigan, most notably David E. Goldberg and has since been tried on various optimization problems with a high degree of success. In GAs, we have a pool or a population of possible solutions to the given problem. These solutions then undergo recombination and mutation (like in natural genetics), producing new children, and the process is repeated over various generations. Each individual (or candidate solution) is assigned a fitness value (based on its objective function value) and the fitter individuals are given a higher chance to mate and yield more “fitter” individuals. This is in line with the Darwinian Theory of “Survival of the Fittest”. In this way we keep “evolving” better individuals or solutions over generations, till we reach a stopping criterion. Genetic Algorithms are sufficiently randomized in nature, but they perform much better than random local search (in which we just try various random solutions, keeping track of the best so far), as they exploit historical information as well. Advantages of GAs GAs have various advantages which have made them immensely popular. These include − Does not require any derivative information (which may not be available for many real-world problems). Is faster and more efficient as compared to the traditional methods. Has very good parallel capabilities. Optimizes both continuous and discrete functions and also multi-objective problems. Provides a list of “good” solutions and not just a single solution. Always gets an answer to the problem, which gets better over the time. Useful when the search space is very large and there are a large number of parameters involved. Limitations of GAs Like any technique, GAs also suffer from a few limitations. These include − GAs are not suited for all problems, especially problems which are simple and for which derivative information is available. Fitness value is calculated repeatedly which might be computationally expensive for some problems. Being stochastic, there are no guarantees on the optimality or the quality of the solution. If not implemented properly, the GA may not converge to the optimal solution. GA – Motivation Genetic Algorithms have the ability to deliver a “good-enough” solution “fast-enough”. This makes genetic algorithms attractive for use in solving optimization problems. The reasons why GAs are needed are as follows − Solving Difficult Problems In computer science, there is a large set of problems, which are NP-Hard. What this essentially means is that, even the most powerful computing systems take a very long time (even years!) to solve that problem. In such a scenario, GAs prove to be an efficient tool to provide usable near-optimal solutions in a short amount of time. Failure of Gradient Based Methods Traditional calculus based methods work by starting at a random point and by moving in the direction of the gradient, till we reach the top of the hill. This technique is efficient and works very well for single-peaked objective functions like the cost function in linear regression. But, in most real-world situations, we have a very complex problem called as landscapes, which are made of many peaks and many valleys, which causes such methods to fail, as they suffer from an inherent tendency of getting stuck at the local optima as shown in the following figure. Getting a Good Solution Fast Some difficult problems like the Travelling Salesperson Problem (TSP), have real-world applications like path finding and VLSI Design. Now imagine that you are using your GPS Navigation system, and it takes a few minutes (or even a few hours) to compute the “optimal” path from the source to destination. Delay in such real world applications is not acceptable and therefore a “good-enough” solution, which is delivered “fast” is what is required. Genetic Algorithms – Fundamentals This section introduces the basic terminology required to understand GAs. Also, a generic structure of GAs is presented in both pseudo-code and graphical forms. The reader is advised to properly understand all the concepts introduced in this section and keep them in mind when reading other sections of this tutorial as well. Basic Terminology Before beginning a discussion on Genetic Algorithms, it is essential to be familiar with some basic terminology which will be used throughout this tutorial. Population − It is a subset of all the possible (encoded) solutions to the given problem. The population for a GA is analogous to the population for human beings except that instead of human beings, we have Candidate Solutions representing human beings. Chromosomes − A chromosome is one such solution to the given problem. Gene
Genetic Algorithms – Mutation Introduction to Mutation In simple terms, mutation may be defined as a small random tweak in the chromosome, to get a new solution. It is used to maintain and introduce diversity in the genetic population and is usually applied with a low probability – pm. If the probability is very high, the GA gets reduced to a random search. Mutation is the part of the GA which is related to the “exploration” of the search space. It has been observed that mutation is essential to the convergence of the GA while crossover is not. Mutation Operators In this section, we describe some of the most commonly used mutation operators. Like the crossover operators, this is not an exhaustive list and the GA designer might find a combination of these approaches or a problem-specific mutation operator more useful. Bit Flip Mutation In this bit flip mutation, we select one or more random bits and flip them. This is used for binary encoded GAs. Random Resetting Random Resetting is an extension of the bit flip for the integer representation. In this, a random value from the set of permissible values is assigned to a randomly chosen gene. Swap Mutation In swap mutation, we select two positions on the chromosome at random, and interchange the values. This is common in permutation based encodings. Scramble Mutation Scramble mutation is also popular with permutation representations. In this, from the entire chromosome, a subset of genes is chosen and their values are scrambled or shuffled randomly. Inversion Mutation In inversion mutation, we select a subset of genes like in scramble mutation, but instead of shuffling the subset, we merely invert the entire string in the subset. Learning working make money
Genetic Algorithms – Application Areas Genetic Algorithms are primarily used in optimization problems of various kinds, but they are frequently used in other application areas as well. In this section, we list some of the areas in which Genetic Algorithms are frequently used. These are − Optimization − Genetic Algorithms are most commonly used in optimization problems wherein we have to maximize or minimize a given objective function value under a given set of constraints. The approach to solve Optimization problems has been highlighted throughout the tutorial. Economics − GAs are also used to characterize various economic models like the cobweb model, game theory equilibrium resolution, asset pricing, etc. Neural Networks − GAs are also used to train neural networks, particularly recurrent neural networks. Parallelization − GAs also have very good parallel capabilities, and prove to be very effective means in solving certain problems, and also provide a good area for research. Image Processing − GAs are used for various digital image processing (DIP) tasks as well like dense pixel matching. Vehicle routing problems − With multiple soft time windows, multiple depots and a heterogeneous fleet. Scheduling applications − GAs are used to solve various scheduling problems as well, particularly the time tabling problem. Machine Learning − as already discussed, genetics based machine learning (GBML) is a niche area in machine learning. Robot Trajectory Generation − GAs have been used to plan the path which a robot arm takes by moving from one point to another. Parametric Design of Aircraft − GAs have been used to design aircrafts by varying the parameters and evolving better solutions. DNA Analysis − GAs have been used to determine the structure of DNA using spectrometric data about the sample. Multimodal Optimization − GAs are obviously very good approaches for multimodal optimization in which we have to find multiple optimum solutions. Traveling salesman problem and its applications − GAs have been used to solve the TSP, which is a well-known combinatorial problem using novel crossover and packing strategies. Learning working make money