Chapter 13: Data Mining > Artificial Populations

Artificial Populations

A critical step in applying genetic algorithms to a search or optimization problem is creating a population of artificial organisms and their fitness function.

Mapping Phenotypes into Chromosomes

Most applications of genetic algorithms for search and optimization involve binary, nominal, integer or real phenotypes.  Most introductions to genetic algorithms describe applications involving the use of simple binary phenotypes, making it easier to focus on the algorithm operations.  Binary phenotypes make it possible to implement basic applications and allow users to develop their own phenotype encoding when default encodings are insufficient.

In most applications, integer and real phenotypes are encoded into chromosome bits by mapping their values into a discrete representation.  Users specify upper and lower bounds for these phenotypes as well as the number of discrete intervals used for their encoding.

Nominal phenotypes are treated differently from integer phenotypes.  Integer phenotypes use chromosome bits as alleles.  Nominal phenotypes use groups of bits as alleles.  This allows symbolic chromosome representations other than binary.  Search problems such as the traveling salesman problem are best represented using nominal phenotypes with partially mixed crossover rather than binary or integer phenotypes.

Information about the nature of the phenotypes and their chromosome encoding is encapsulated in the Imsls_f_chromosome data structure created by imsls_f_ga_chromosome.

Describing Individuals and the Population

An individual is described by their expressed chromosome, phenotypes and parentage information.  Chromosome information is encapsulated into an Imsls_f_chromosome data structure.  Individuals are represented by the Imsls_f_individual data structure, which can be automatically created using imsls_f_ga_random_population or systematically specified using imsls_f_ga_individual and imsls_f_ga_population.

Typically users create an initial population of 20 to 100 individuals, depending on the length of the chromosome.

Selection

Genetic algorithms support a large variety of methods for selecting population individuals for reproduction.  The initial population is either randomly selected or systematically specified using imsls_f_ga_random_population or imsls_f_ga_population with imsls_f_ga_individual, respectively.

Selection between generations can be done using a variety of approaches based upon individual fitness.  The most common approach is stochastic selection with replacement based upon the individual’s fitness.  Holland (1975) also referred to this as roulette wheel selection with replacement.  Under this approach, individuals with higher fitness have a higher probability of selection.  The roulette wheel selection works well when the distribution of fitness across the population is not dominated by the high fitness of a few individuals.

If the population includes a few high fitness individuals, then stochastic selection without replacement can work better than selection with replacement.  When selection without replacement is used, an individual cannot be selected more than once per generation for reproduction.  Effectively, this ensures that the individuals in the next generation are not generated from just a few, high fitness parents.

In addition to stochastic selection with and without replacement, imsls_f_genetic_algorithm also supports deterministic, remainder, tournament and stochastic universal sampling.

Reproduction and Crossover

Reproduction involves selection and crossover using a selection and crossover model.  Standard, partially matched and inversion crossover can be selected. 

Mutation

Mutation is the stochastic process applied to chromosome bits after crossover.  Standard mutation examines each bit and determines whether it should be changed.  The probability that a bit is changed is controlled by the mutation probability set using the optional argument IMSLS_MUTATION_PROB with imsls_f_genetic_algorithm.

When partially matched crossover (PMX) is used with nominal phenotypes, the standard mutation algorithm can result in infeasible offspring.  When PMX is employed the mutation algorithm is automatically changed.  Instead of switching individual bits, two are randomly selected and swapped.  The probability they are swapped is controlled by the mutation probability. 

Since the mutation probability is generally in the range 0.001 to 0.01, mutation occurs infrequently.  Still it plays a key role in halting premature convergence due to early domination by a few fit individuals resulting in a loss of diversity.

 


RW_logo.jpg
Contact Support