Friday, April 18, 2008

Selfish Gene Algorithm

This time I will present a branch of evolutionary algorithms - Selfish gene algorithm.

Scope of article
Question arises - Why we need yet another genetic algorithm (GA) ? Well, its always good to have different point of view. It`s because selfish gene algorithm (further- SGA) is a different beast than 'standard' GA. For that reasons I will present SGA and it's uses in 2 sample problems. For the article being short I will assume reader`s basic knowledge of GA.

Biological background

Several concepts will be used here. Organism genetic makeup is called - genotype. All living organisms has DNA molecules. DNA molecule is composed of genes. Gene location in DNA is called locus. Usually for the same locus there can be different versions of gene. These different versions of gene is called alleles. There can be calculated allele frequency (or probability) in population. In general evolution means, that organisms which succeeds- increases its alleles frequency in population at the expense of children. And vise-versa,- organism which fails- decreases its alleles frequency in population.

Main features of SGA
• There are no explicit population in algorithm. Individuals are created and destroyed
only temporary for comparision of their genomes. This means that there are no recombination operator at all. Instead new solutions is generated according to their genes frequency in population.
• Mutation operator is encoded as random gene selection (not accordingly to frequencies)
• We need not to assign absolute value of fitness to generated solution, but instead we
need to compare 2 generated solutions between each other. And we need to return back to the algorithm which solution was better.
Pseudo-code of SGA
1. Generate 2 solutions, according to solution genes frequency/probability in population.
1a. If mutation happens - take random solution gene, not according to its freqency.
2. Compare these 2 solutions, and decide which is better.
2a. Reward better solution - increase its alleles frequency in population.
2b. Penalyze worse solution - decrease its alleles frequency in population.
3. Update best solution found if any.
4. Repeat steps 1-3 until required solution found or other needed criteria is met-
for example maximum evolve cycles elapsed or solutions genes frequency is above some threshold or whatever you need.
SGA class diagram

So main SGA classes is 4:

SGAgenotype
- main class for executing GA algorithm. This class is container for SGAlocus class. SGAlocus - class is container for class SGAallele, also holds info about temporary generated solution genes.
SGAlocusGroup is just for possibility of grouping SGAlocus classes, because (you will later see it) it is sometimes usefull to do it. That is acctually SGAgenotype holds SGAlocusGroup class, which in turn groups SGAlocus.
SGAallele is class for describing allele, mainly - allele value and its frequency/probability in population.
Thats it, its all you need for SGA to run on your PC.
Practical examples of SGA usage
There was 2 cases of usage of SGA.
First - box grouping problem.
Given amount of boxes of some random sizes try to arrange boxes in some area while trying to minimize total boxes intersection area and maximize free unfragmented space left in room.
Below is video of how evolves box positions in some number of iterations:

In this type of problem SGAlocusGroup groups 2 loci - one locus for box X position and second locus for box Y position. Total SGAlocusGroup count equals to box count. Alleles encodes all possible values of box positions.
Second - sudoku puzzle generation problem
Given grid size, 9x9 or 16x16, try to generate (if possible) sudoku puzzle or at least something that is close to sudoku puzzle enough :-) . Similarity is measured by duplicating cell colors, that is by error count in grid. Different numbers is represented as different colors.
Below is video how grid evolves in some iterations:

In this problem SGAlocusGroup has only one locus, which holds info about cell color.
SGAlocusGroup count equals to grid cell count. Alleles encodes all possible colors of cell.
In the 9x9 grid case often is that there are left 2-4 color duplicates. In 16x16 case there are much more. But we can adjust evolve cycles to bigger number to overcome these errors.
C# project source files
It is for .NET 2 (VS 2005) but should run in .NET 3.5 also. You can download it here and use as you wish (GNU license):
Last- useful references
F. Corno, M. Sonza Reorda, G. Squillero, "The Selfish Gene Algorithm: a New Evolutionary Optimization Strategy," SAC98: 13th Annual ACM Symposium on Applied Computing, Atlanta, Georgia (USA), February 1998, pp. 349-355
2. Book "The Selfish Gene", Richard Dawkins, 1976.