gaChromosome

Creates an Imsls_d_chromosome data structure containing unencoded and encoded phenotype information.

Synopsis

gaChromosome ()

Return Value

The function gaChromosome returns an Imsls_d_chromosome data structure, which is required input to gaIndividual, gaPopulation, and gaRandomPopulation. The memory allocated to this data structure can be released using free.

Optional Arguments

t_print, (Input)
By default, summary results are not printed. This option causes the function to print summary results describing the chromosome structure.
binary (Input)

The number of binary phenotypes.

Default: binary = 0.

nominal, int nCategories[] (Input)

The first parameter nNominal is equal to the number of nominal phenotypes. nCategories is an array of length nNominal containing the number of nominal categories for each nominal phenotype. Each value of this array must be at least 2 or greater. If partially matched crossover is used during the genetic algorithm search then the array nCategories is ignored since all of its values are assumed equal to nNominal.

Default: nNominal = 0.

integer, int iIntervals[] , int iBounds[] (Input)

The first parameter nInteger is equal to the number of integer phenotypes. The second parameter in this argument, iIntervals, is a one-dimensional array of length nInteger containing the number of discrete intervals used to map each integer into the chromosome loci. For efficiency, each value in this array should be a power of 2 such as 2, 4, 8, 16, etc. The third parameter, iBounds is an array of size nInteger by 2 containing the lower and upper bounds for each integer phenotype. The lower and upper bounds for the i-th integer phenotype are equal to iBounds[2*i] and iBounds[2 * i+1] respectively. Each integer value submitted to geneticAlgorithm for the i-th integer phenotype, \(\omega\) must conform to the inequality:

\[\mathrm{iBounds}[2i] \leq w_i \leq \mathrm{iBounds}[2i+1]\]

Default: nInteger = 0.

real, int rIntervals[] , float rBounds[] (Input)

The first parameter nReal is equal to the number of real phenotypes. Point values are mapped into chromosome loci using discretization. The second parameter in this argument, rIntervals, is a one-dimensional array of length nReal containing the number of discrete intervals used to map each real value into the chromosome loci. For efficiency, each value in this array should be a power of 2 such as 2, 4, 8, 16, etc. The third parameter, rBounds is an array of size nReal by 2 containing the lower and upper bounds for each integer phenotype. The lower and upper bounds for the i-th real phenotype are equal to rBounds[2*i] and rBounds[2*i+1] respectively. Hence, rBounds[2*i+ 1] must be greater than rBounds[2*i]. Each real value submitted to geneticAlgorithm for the i-th real phenotype, \(\omega_i\), must conform to the inequality:

\[\mathrm{rBounds}[2i] \leq w_i \leq \mathrm{rBounds}[2i+1]\]

Default: nReal = 0.

Description

The genetic algorithm requires a chromosome representation of phenotypes. Most textbook applications of genetic algorithms use phenotypes that have a natural binary encoding. Real world problems often have non-binary phenotypes. Phenotypes are parameters used by the fitness function. Those can include any data type. This function allows for easy encoding of binary, nominal, integer and real phenotypes.

Binary phenotypes are mapped directly into the chromosome as binary bits. Each binary phenotype is treated as a single allele. If the user specifies binary>0, then the first binary bits in the chromosome are allocated for encoding this information. When the fitness function is called during genetic optimization, these bits are translated into zeros and ones and then sent to the fitness function as an integer array of length binary.

Nominal phenotypes are mapped into the chromosome following the binary phenotypes. The number of bits assigned to each nominal phenotype is determined from the number of categories for each nominal phenotype. The value nCategories[i] is equal to the number of categories for the i-th nominal phenotype. The number of bits assigned to this category is the smallest value of k such that \(2^k\)nCategories[i], i.e., \(k=\lceil\log_2 \left( \mathrm{nCategories}[i] \right) \rceil\), where \(\lceil x \rceil\) is the ceiling of x (least integer of x). A binary nominal phenotype would be assigned one bit, and one bit would constitute a single allele. A trinary nominal phenotype would be assigned two bits since \(2^2=4\geq 3\), and these bits would be treated as a single allele.

The mapping of multiple bits to a single allele is a key difference between nominal phenotypes and other phenotypes. Alleles for binary, integer and real phenotypes are represented as single bits in the chromosome. The alleles for nominal phenotypes consist of multiple bits. Since crossover occurs between alleles, crossover for nominal phenotypes is treated differently. This ensures that only viable values for nominal phenotypes result from crossover.

It also means that Gray encoding of individual bits has no effect on nominal phenotypes. For many problems Gray encoding is used instead of standard Base-2 encoding to reduce large changes of encoded phenotype values after crossover. As a result, Gray encoding is never applied to nominal phenotypes.

In addition, partially mixed crossover is only an option for nominal phenotypes. Nominal phenotypes combined with partially mixed crossover make it easy to implement search problems similar to the traveling salesman problem.

Both integer and real phenotypes are discretized. Although this is the most common approach to encoding these phenotypes, some problems may benefit from other forms of encoding. If so, users should provide their own encoding, translating the phenotype into a bit representation that can be mapped into binary phenotypes.

Discretization is controlled by two arrays. For integer phenotypes, the array iIntervals contains the number of discrete intervals used to represent each integer. The number of chromosome bits assigned to the i-th integer is determined by the values in this array. If iIntervals[i]= \(k=2^m\) then the i-th integer phenotype is assigned m bits. For example, if iIntervals[i]= 4, then this phenotype is assigned two bits.

The array iBounds contains the upper and lower bounds for each integer phenotype. The lower bound for the i-th integer phenotype is equal to lb = iBounds[2i], and the upper bound is equal to ub = iBounds[2i+1]. The values for the i-th phenotype, w, must satisfy the inequality lbw < ub. w is discretized to w’ using the formula:

\[w' = \left\lfloor k \frac{(w-lb)}{(ub-lb)}\right\rfloor\]

Where ⌊x⌋ is the floor of x (greatest integer of x). This results in mapping the i-th integer phenotype, w, into one of the integers 0,1,…, k1.

Real phenotypes are handled in the same fashion as integer phenotypes using the values in rIntervals and rBounds.

The number of chromosome bits assigned to each phenotype are described in the following table:

PHENOTYPE NUMBER of BITS
Binary \(\mathit{bbits}=\mathrm{nBinary}\)
Nominal \(\mathit{nbits}=\sum_{i=0}^{\mathrm{nNominal}-1} \lceil\log_2 \left( \mathrm{nCategories[i]} \right) \rceil\)
Integer \(\mathit{ibits}=\sum_{i=0}^{\mathrm{nInteger}-1} \log_2 \left( \mathrm{iIntervals[i]} \right)\)
Real \(\mathit{rbits}=\sum_{i=0}^{\mathrm{nReal}-1} \log_2 \left( \mathrm{rIntervals[i]} \right)\)

See Table 13.4 for a description of the allele values (bits) for the chromosome. Chromosome bits are ordered first by binary phenotypes in bits 0 through bbits ‑1, then nominals, integers and reals in that order.

The memory allocated to this data structure can be released using free.

Example

This example creates a chromosome with 1 binary, 2 nominal, 3 integer and 2 real phenotypes. The t_print argument is used to print a description of the chromosome structure.

from numpy import *
from pyimsl.stat.gaChromosome import gaChromosome
from pyimsl.stat.free import free

n_binary = 1
n_nominal = 2
n_integer = 3
n_real = 2
# number of categories for nominal phenotypes
n_categories = [2, 3]
# number of intervals and boundaries for integer phenotypes
i_intervals = [16, 16, 16]
i_bounds = [[0, 1000], [-10, 10], [-20, 0]]
# number of intervals and boundaries for real phenotypes
r_intervals = [512, 1024]
r_bounds = [[0.0, 20.0], [-20.0, 20.0]]
integer = {"iIntervals": i_intervals, "iBounds": i_bounds}
real = {"rIntervals": r_intervals, "rBounds": r_bounds}
# Chromosome Data Types

chromosome = gaChromosome(binary=n_binary,
                          nominal=n_categories,
                          integer=integer,
                          real=real,
                          t_print=True)

Output

The t_print option produced the following description of the chromosome. The data structure uses 304 bytes. The chromosome has 34 alleles. The first bit is used to represent the binary phenotype.

The next two alleles are assigned to the nominal phenotypes. The first phenotype will be encoded in allele 1 with the integers zero and one since it has only two categories. The second nominal phenotype has 3 categories. It will be encoded with the integers zero, one, and two.

The integer phenotypes are each assigned 4 binary bits. Since the number of intervals is the same for each integer, 16, 4 bits will be used to encode the integers 0-15. If Base-2 encoding is used, the \(16^{th}\) interval will be encoded as \(15=\{1111\}\).

The first real phenotype uses 512 intervals to discretize its value. This is encoded using 9 alleles. The second real phenotype uses 1024 intervals to discretize its value. This requires 10 alleles to properly represent the values 0 to 1023.

*******************************
**** CHROMOSOME STRUCTURE *****
Data Structure length:  352 Bytes
Chromosome length:       34 Bits

******ALLELE ASSIGNMENTS*******
Binary:    0 -   0 n_binary = 1
Nominal:   1 -   2 n_nominal= 2
Integer:   3 -  14 n_integer= 3
Real:     15 -  33 n_real   = 2
*******************************

NOMINAL CATEGORIES*************
   Variable  0:    2 categories
   Variable  1:    3 categories
*******************************

INTEGER BOUNDS*****************
   Variable  0: [    0,  1000]
   Variable  1: [  -10,    10]
   Variable  2: [  -20,     0]
*******************************

INTEGER BITS*******************
   Variable  0:    4 bits
   Variable  1:    4 bits
   Variable  2:    4 bits
*******************************

INTEGER DISCRETE INTERVALS*****
   Variable  0:   16 intervals
   Variable  1:   16 intervals
   Variable  2:   16 intervals
*******************************

REAL BOUNDS********************
   Variable  0: [0,20]
   Variable  1: [-20,20]
*******************************

REAL BITS**********************
   Variable  0:    9 bits
   Variable  1:   10 bits
*******************************

REAL DISCRETE INTERVALS********
   Variable  0:  512 intervals
   Variable  1: 1024 intervals
*******************************