Genetic Algorithms – An Overview
Genetic algorithms are increasingly popular for solving optimization, search and machine learning problems. The analog between optimizing a fitness function and biological processes of natural selection and genetics is generally attributed to John H. Holland and his students at the University of Michigan. His landmark publication “Adaptation in Natural and Artificial Systems” (Holland, 1975) sparked wide ranging investigations into his approach in a variety of areas ranging from science and engineering to business.
This genetic algorithm implementation supports Holland’s basic algorithm with most popular variations. This is achieved by supporting:
1. User defined population size and selection method including roulette, remainder, tournament and stochastic universal sampling both with and without replacement,
2. Random or user defined initial populations,
3. Any combination of four different data types: nominal, binary, integer and real,
4. Base 2 and Gray encoding and decoding of integer and real data types,
5. Automatic encoding and decoding of chromosome information into phenotypes,
6. User specified number of crossover points and three different options for crossover: standard, inversion and partially matched crossover,
7. Elitism to ensure fittest individuals are retained between generations,
8. User supplied fitness functions with or without additional function parameters,
9. User defined crossover and mutation probabilities,
10. Linear and sigma fitness scaling,
11. Customized and predetermined stopping criteria,
12. Measures of algorithm convergence and efficiency – velocity, on-line and off-line fitness.
Data Structures
Alleles
The genetic encoding of a real or artificial organism is contained within their chromosomes. Each chromosome consists of a large number of genes, each uniquely located on the chromosome. Each gene in turn is composed of several alleles. In artificial organisms, i.e. genetic algorithms, an allele is encoded with discrete values.
The original simple genetic algorithm encoded alleles as either zero or one, represented by a single computer bit. This algorithm uses the same encoding for binary, integer and real phenotype values. In addition, users can specify nominal phenotypes which can use any non-negative value. This expands the basic genetic algorithm to include search domains with any number of symbols encoded as nominal phenotypes.
Each nominal phenotype is encoded into a single non-negative integer. Integer phenotypes, on the other hand, are encoded into a binary representation using either Base-2 or Gray encoding.
The crossover operation in
imsls_f_genetic_algorithm handles a wide variety of allele encoding. Users define their allele encoding using single or multiple bits or a combination. In
imsls_f_genetic_algorithm nominal, binary, integer and real phenotypes can be defined with any number of crossover points. The crossover and mutation probabilities can be specified. In addition, inversion can be specified for any phenotype and partially matched crossover can be automatically invoked for nominal phenotypes.
This large variety of data types, encoding and crossover options allows users to solve a wide range of search and optimization problems using
imsls_f_genetic_algorithm.
Chromosomes
In natural systems, chromosomes consist of thousands of genes, each encoded using alleles. In artificial systems, chromosomes are strings of alleles. The relationship between phenotype values and the chromosome allele data structure is created using
imsls_f_ga_chromosome.
The chromosome data structure for an individual consists of an integer array representing the alleles, and additional information needed for encoding and decoding nominal, integer and real phenotype values into the allele. This information is used for implementing automatic Base-2 and Gray encoding and differentiating between nominal phenotypes requiring partially matched crossover and other classes of nominal phenotypes.
A detailed description of the
Imsls_f_chromosome data structure is given in
Table 13.46. The data structure not only contains the chromosome information encoded as an integer array of alleles, it also contains phenotype values. By default, information in the allele array is automatically decoded into phenotypes. This behavior can be suppressed using the
IMSLS_NO_DECODE option in the
imsls_f_genetic_algorithm function.
Table 13.46 — The Imsls_f_chromosome Data Structure
Parameter | Data Type | Description |
total_length | int | Total number of bytes allocated to the data structure. |
c_length | int | The length of the allele array. |
allele | int[] | An array of of length c_length containing the allele values (bits) for the chromosome. |
n_binary | int | The number of binary phenotypes. |
n_nominal | int | The number of nominal phenotypes. |
n_integer | int | The number of integer phenotypes. |
n_real | int | The number of real phenotypes. |
n_intBits | int | The total number of bits in allele used to represent the integer phenotypes. |
n_realBits | int | The total number of bits in allele used to represent the real phenotypes. |
binaryIndex | int | The index of the first bit in allele used to represent the binary phenotypes. |
nominalIndex | int | The index of the first bit in allele used to represent the nominal phenotypes. |
integerIndex | int | The index of the first bit in allele used to represent the integer phenotypes. |
realIndex | int | The index of the first bit in allele used to represent the real phenotypes. |
n_categories | int[] | An array of length n_nominal containing the maximum number of categories for each nominal phenotype. |
i_intervals | int[] | An array of length n_integer containing the number of discrete intervals used to represent each integer phenotype. |
i_bits | int[] | An array of length n_integer containing the number of bits in the allele array assigned to each integer phenotype. |
i_bounds | int[] | An array of size n_integer by 2 containing rows of lower and upper limits for each integer phenotype. |
r_intervals | int[] | An array of length n_real containing the number of discrete intervals used to represent each real phenotype. |
r_bits | int[] | An array of length n_real containing the number of bits in the allele array assigned to each real phenotype. |
r_bounds | float[] | An array of size n_real by 2 containing rows of lower and upper limits for each real phenotype. |
Individuals
An individual consists of an expressed chromosome for the individual. By default the data structure for individuals also contains decoded values for all phenotypes. This allows users to program their fitness function to use phenotype values instead of their encoded allele representation.
A phenotype is the expression of a collection of genes. In organisms, this expression includes physical characteristics, such as eye color, and behavior. In artificial systems, a phenotype is generally thought of as an attribute. For function optimization problems phenotypes might be floating points or integer values. Phenotypes in a search problem might include nominal or binary encoded information about the search space.
Phenotypes are encoded into the chromosome allele as groups of bits. Later, when the fitness function is evaluated, the algorithm decodes the bits in these groups into their phenotype values. By default this is Base-2 encoding, but Gray encoding can be declared in the
imsls_f_ga_individual,
imsls_f_ga_random_population and
imsls_f_genetic_algorithm functions. Support is provided for mapping integer and real values into allele encoding using discretization and either Base-2 or Gray encoding.
Traditional Base-2 encoding of integer and floating point phenotypes can produce binary representations with widely different representations for phenotypes with similar values. Adjacent integral values encoded using Gray’s mapping differ by only one bit. For example, in binary, the numbers 15 and 16 have very different representations: 15=“01111” and 16=“10000”. The Gray encoded values for this number are closer, differing by only a single bit: 15=“01000” and 16=“11000”.
Although the majority of applications discretize integer and real phenotypes and then encode them using either Base-2 or Gray encoding, other encoding methods can be implemented by incorporating phenotype encoding and decoding into the fitness function.
Decoding of chromosome information into its associated phenotypes can be suppressed using the
IMSLS_NO_DECODING argument in
imsls_f_genetic_algorithm. In that case the phenotype values in the
Imsls_f_individual data structure will not be updated with every crossover. They are only decoded for the final generation. Encoding can be either Base-2 or Gray. Base-2 is the default, but Gray encoding can be invoked using the
IMSLS_GRAY argument in
imsls_f_ga_individual,
imsls_f_ga_random_population or
imsls_f_genetic_algorithm.
Table 13.47 describes the contents of the
Imsls_f_individual data structure.
Table 13.47 — The Imsls_f_individual Data Structure
Parameter | Data Type | Description |
encoding | int | Controls encoding of real and integer phenotypes. Encoding is either Base-2, the default, or Gray invoked using the optional IMSLS_GRAY argument with imsls_f_genetics_algorithm. |
total_length | int | Total number of bytes allocated to the data structure. |
nominalPhenotype | int[] | An array of integers of length chromosome->n_nominal containing the values of the nominal phenotypes. |
binaryPhenotype | int[] | An array of integers of length chromosome->n_binary containing the values of the binary phenotypes. |
integerPhenotype | int[] | An array of integers of length chromosome->n_integer containing the values of the integer phenotypes. |
realPhenotype | float[] | An array of floating point values of length chromosome->n_real containing the values of the real phenotypes. |
chromosome | Imsls_f_chromosome* | A pointer to the chromosome data structure for this individual. |
Population
A population is a collection of individuals. A genetic algorithm operates on a population, transforming it from one generation to the next using rules including selection, reproduction, crossover and mutation. A population is described by the chromosome and individual data structures and the number of its members.
The initial population can be created randomly using
imsls_f_ga_random_population, or it can be created from a user specified set of individuals using
imsls_f_ga_population. Both of these functions return an
Imsls_f_population data structure, which is required input to
imsls_f_genetic_algorithm.
Table 13.48 describes the
Imsls_f_population data structure.
Table 13.48 — The Imsls_f_population data structure
Parameter | Data Type | Description |
n | int | The number of individuals in the population. |
indexFittest | int | The index in individual of the fittest individual within the population. |
indexWeakest | int | The index in individual of the weakest individual within the population. |
avgFitness | float | The average fitness for the population. |
stdFitness | float | The standard deviation of the fitness for the population. |
maxFitness | float | The maximum fitness of the population. |
minFitness | float | The minimum fitness of the population. |
fitness | float* | An array of the fitness values for each individual in the population. |
chromosome | Imsls_f_chromosome* | A pointer to the chromosome data structure for the individuals in the population. |
individual | Imsls_f_individual** | A pointer to an array of size n containing the individuals of the population. |
Note that the fitness values in this data structure are only initialized if the fitness function is passed to the
imsls_f_ga_population or
imsls_f_ga_random_population. Upon completion,
imsls_f_genetic_algorithm updates these parameters to the values associated with the last generation.
Fitness and Penalty Functions
The genetic algorithm is designed to find the phenotype that maximizes the fitness function. This is a user supplied function that describes the fitness of a particular phenotype. With each succeeding generation, the genetic algorithm transforms a population into better performing individuals as defined by the fitness function.
The fitness function is a required argument to
imsls_f_genetic_algorithm. Phenotype restrictions other than simple lower and upper value boundaries are handled by incorporating a penalty function into the fitness calculation.
The optional argument IMSLS_FITNESS_FCN_WITH_PARMS allows users to have the algorithm pass individuals and parameters to the fitness function. This provides the flexibility to program a single fitness function that can be applied to a wider variety of applications.
The Genetic Algorithm
There are many variations of the original simple genetic algorithm described by Holland (1975). Many of these were developed for particular applications or data types.
imsls_f_genetic_algorithm implements both the simple algorithm as well as more advanced variations. It has also been designed to provide advanced users the flexibility to provide their own initial populations, stopping criteria, and phenotype encoding and decoding.
Once an initial population is constructed, the genetic algorithm finds a solution to the search or optimization problem using five basic operations to evolve the population from one generation to the next: selection, reproduction, crossover, mutation and fitness.
Selection
Selection is the process used to select individuals for reproduction to create the next generation. This is driven by a fitness function that makes higher fitness individuals more likely to be selected for creating the next generation.
Optimum selection of individuals for reproduction is important to the efficiency and convergence of a genetics algorithm. Many methods have been researched.
imsls_f_genetic_algorithm implements the following variations: deterministic selection, roulette wheel selection with and without replacement, remainder selection with and without replacement, SUS selection, rank selection and two forms of tournament selection. Each of these can be employed with fitness scaling and elitism.
Fitness scaling is not required, but there are two options available: linear scaling and sigma scaling. See
IMSLS_LINEAR_SCALING and
IMSLS_SIGMA_SCALING.
Reproduction and Crossover
After individuals are selected, reproduction involves crossing the individual’s chromosomes to produce their offspring’s chromosome. In the simple case, this involves exchanging genetic information by swapping bits within the parent’s chromosome.
Crossover is a random process. It is controlled by the optional arguments
IMSLS_CROSSOVERS and
IMSLS_CROSSOVER_PROB in
imsls_f_genetic_algorithm. Not all parents selected for reproduction are mated. Most genetic algorithms use a crossover probability in the range of 0.6 to 0.9. The
IMSLS_CROSSOVER_PROB argument allows users to select any crossover probability between 0 and 1.
Traditionally chromosomes are crossed at a single point. However, some problems benefit from using more crossover points. The IMSLS_CROSSOVERS argument allows users to select any number of crossover points.
Once two parents are selected for crossover and their crossover points are defined, a genetic algorithm proceeds to develop a new offspring by alternately mapping alleles from the two chromosomes, swapping the source of the alleles at each crossover point.
For most applications, this creates a new offspring with a non-zero fitness value. However, for some applications, such as the traveling salesman problem, the offspring produced by this simple crossover operation will likely be infeasible. For these problems partially matched crossover and inversion crossover have been developed to ensure that the resulting offspring is a feasible solution.
Partially matched and inversion crossover are invoked using the
IMSLS_PMX_CROSSOVER and
IMSLS_INVERT_CROSSOVER optional arguments in
imsls_f_genetic_algorithm.
Mutation
Mutation stochastically switches allele settings using the mutation probability set with
IMSLS_MUTATION_PROB in
imsls_f_genetic_algorithm. Most applications set the mutation probability to a value in the range 0.01 to 0.001. The
IMSLS_MUTATION_PROB argument accepts any probability between 0 and 1. However, high mutation rates cause the genetic algorithm to perform similar to a random search.
For users who prefer to replace
imsls_f_genetic_algorithm with their own algorithm, the function
imsls_f_ga_mutate can be used for mutation. Decoding of the resulting chromosome into phenotype values can be achieved using
imsls_f_ga_decode.
This traditional mutation operator can produce infeasible solutions for some problems. In those cases, swap mutation is used. That is, instead of inverting a single allele value, two alleles are randomly swapped within the nominal portion of the chromosome. This allows mutation to proceed with search problems such as the traveling salesman problem.
Fitness and Phenotype Constraints
The fitness function is a required argument to
imsls_f_genetic_algorithm. The genetic algorithm function applies the fitness function to each new individual. It must be scaled to return a non-negative value.
Higher fitness values represent higher performing individuals. Constraints on integer and real phenotypes can be handled by setting lower and upper bounds. Additional constraints for these phenotypes and others should be incorporated using a penalty calculation in the fitness function.
Artificial Populations
A critical step in applying genetic algorithms to a search or optimization problem is creating a population of artificial organisms and their fitness function.
Mapping Phenotypes into Chromosomes
Most applications of genetic algorithms for search and optimization involve binary, nominal, integer or real phenotypes. Most introductions to genetic algorithms describe applications involving the use of simple binary phenotypes, making it easier to focus on the algorithm operations. Binary phenotypes make it possible to implement basic applications and allow users to develop their own phenotype encoding when default encodings are insufficient.
In most applications, integer and real phenotypes are encoded into chromosome bits by mapping their values into a discrete representation. Users specify upper and lower bounds for these phenotypes as well as the number of discrete intervals used for their encoding.
Nominal phenotypes are treated differently from integer phenotypes. Integer phenotypes use chromosome bits as alleles. Nominal phenotypes use groups of bits as alleles. This allows symbolic chromosome representations other than binary. Search problems such as the traveling salesman problem are best represented using nominal phenotypes with partially mixed crossover rather than binary or integer phenotypes.
Information about the nature of the phenotypes and their chromosome encoding is encapsulated in the
Imsls_f_chromosome data structure created by
imsls_f_ga_chromosome.
Describing Individuals and the Population
An individual is described by their expressed chromosome, phenotypes and parentage information. Chromosome information is encapsulated into an
Imsls_f_chromosome data structure. Individuals are represented by the
Imsls_f_individual data structure, which can be automatically created using
imsls_f_ga_random_population or systematically specified using
imsls_f_ga_individual and
imsls_f_ga_population.
Typically users create an initial population of 20 to 100 individuals, depending on the length of the chromosome.
Selection
Genetic algorithms support a large variety of methods for selecting population individuals for reproduction. The initial population is either randomly selected or systematically specified using
imsls_f_ga_random_population or
imsls_f_ga_population with
imsls_f_ga_individual, respectively.
Selection between generations can be done using a variety of approaches based upon individual fitness. The most common approach is stochastic selection with replacement based upon the individual’s fitness. Holland (1975) also referred to this as roulette wheel selection with replacement. Under this approach, individuals with higher fitness have a higher probability of selection. The roulette wheel selection works well when the distribution of fitness across the population is not dominated by the high fitness of a few individuals.
If the population includes a few high fitness individuals, then stochastic selection without replacement can work better than selection with replacement. When selection without replacement is used, an individual cannot be selected more than once per generation for reproduction. Effectively, this ensures that the individuals in the next generation are not generated from just a few, high fitness parents.
In addition to stochastic selection with and without replacement,
imsls_f_genetic_algorithm also supports deterministic, remainder, tournament and stochastic universal sampling.
Reproduction and Crossover
Reproduction involves selection and crossover using a selection and crossover model. Standard, partially matched and inversion crossover can be selected.
Mutation
Mutation is the stochastic process applied to chromosome bits after crossover. Standard mutation examines each bit and determines whether it should be changed. The probability that a bit is changed is controlled by the mutation probability set using the optional argument
IMSLS_MUTATION_PROB with
imsls_f_genetic_algorithm.
When partially matched crossover (PMX) is used with nominal phenotypes, the standard mutation algorithm can result in infeasible offspring. When PMX is employed the mutation algorithm is automatically changed. Instead of switching individual bits, two are randomly selected and swapped. The probability they are swapped is controlled by the mutation probability.
Since the mutation probability is generally in the range 0.001 to 0.01, mutation occurs infrequently. Still it plays a key role in halting premature convergence due to early domination by a few fit individuals resulting in a loss of diversity.