This one is very interesting. Problem description: ------------------------------------------- A common security method used for online banking is to ask the user for three random characters from a passcode. For example, if the passcode was 531278, they may asked for the 2nd, 3rd, and 5th characters; the expected reply would be: 317. The text file, keylog.txt, contains fifty successful login attempts. Given that the three characters are always asked for in order, analyse the file so as to determine the shortest possible secret passcode of unknown length. -------------------------------------------
So solution in C# is this, not optimized, but works (in a 16 seconds):
Definition: --------------------------------------------- If the numbers 1 to 5 are written out in words: one, two, three, four, five, then there are 3 + 3 + 5 + 4 + 4 = 19 letters used in total. If all the numbers from 1 to 1000 (one thousand) inclusive were written out in words, how many letters would be used? NOTE: Do not count spaces or hyphens. For example, 342 (three hundred and forty-two) contains 23 letters and 115 (one hundred and fifteen) contains 20 letters. The use of "and" when writing out numbers is in compliance with British usage.
Problem definition- -------------------------------------------------- Starting with the number 1 and moving to the right in a clockwise direction a 5 by 5 spiral is formed as follows: 21 22 23 24 25 20 7 8 9 10 19 6 1 2 11 18 5 4 3 12 17 16 15 14 13 It can be verified that the sum of both diagonals is 101. What is the sum of both diagonals in a 1001 by 1001 spiral formed in the same way? -------------------------------------------------- C# brute-force solution:
static long SpiralDiagonalsSum(long spiralsize) { long sum = 1; long cursize = 3; long number = 1; while (cursize <= spiralsize) { for (int i = 0; i < 4; i++) { number += cursize - 1; sum += number; } cursize += 2; } return sum; } static void Main(string[] args) { System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch(); sw.Start(); long diagsum = SpiralDiagonalsSum(1001); sw.Stop(); Console.WriteLine("Spiral diagonals sum {0} calculated in {1} ms",diagsum,sw.Elapsed.TotalMilliseconds); Console.ReadLine(); }
This time I will present a branch of evolutionary algorithms - Selfish gene algorithm.
Scope of article
Question arises - Why we need yet another genetic algorithm (GA) ? Well, its always good to have different point of view. It`s because selfish gene algorithm (further- SGA) is a different beast than 'standard' GA. For that reasons I will present SGA and it`s uses in 2 sample problems. For the article being short I will assume reader`s basic knowledge of GA.
Biological background
Several concepts will be used here. Organism genetic makeup is called - genotype. All living organisms has DNA molecules. DNA molecule is composed of genes. Gene location in DNA is called locus. Usually for the same locus there can be different versions of gene. These different versions of gene is called alleles. There can be calculated allele frequency (or probability) in population. In general evolution means, that organisms which succeeds- increases its alleles frequency in population at the expense of children. And vise-versa,- organism which fails- decreases its alleles frequency in population.
Main features of SGA
There are no explicit population in algorithm. Individuals are created and destroyed only temporary for comparision of their genomes. This means that there are no recombination operator at all. Instead new solutions is generated according to their genes frequency in population.
Mutation operator is encoded as random gene selection (not accordingly to frequencies)
We need not to assign absolute value of fitness to generated solution, but instead we need to compare 2 generated solutions between each other. And we need to return back to the algorithm which solution was better.
Pseudo-code of SGA
Generate 2 solutions, according to solution genes frequency/probability in population. 1a. If mutation happens - take random solution gene, not according to its freqency.
Compare these 2 solutions, and decide which is better. 2a. Reward better solution - increase its alleles frequency in population. 2b. Penalyze worse solution - decrease its alleles frequency in population.
Update best solution found if any.
Repeat steps 1-3 until required solution found or other needed criteria is met- for example maximum evolve cycles elapsed or solutions genes frequency is above some threshold or whatever you need.
SGA class diagram
So main SGA classes is 4: SGAgenotype - main class for executing GA algorithm. This class is container for SGAlocus class. SGAlocus - class is container for class SGAallele, also holds info about temporary generated solution genes. SGAlocusGroup is just for possibility of grouping SGAlocus classes, because (you will later see it) it is sometimes usefull to do it. That is acctually SGAgenotype holds SGAlocusGroup class, which in turn groups SGAlocus. SGAallele is class for describing allele, mainly - allele value and its frequency/probability in population. Thats it, its all you need for SGA to run on your PC.
Practical examples of SGA usage
There was 2 cases of usage of SGA.
First - box grouping problem.
Given amount of boxes of some random sizes try to arrange boxes in some area while trying to minimize total boxes intersection area and maximize free unfragmented space left in room. Below is video of how evolves box positions in some number of iterations:
In this type of problem SGAlocusGroup groups 2 loci - one locus for box X position and second locus for box Y position. Total SGAlocusGroup count equals to box count. Alleles encodes all possible values of box positions.
Second - sudoku puzzle generation problem
Given grid size, 9x9 or 16x16, try to generate (if possible) sudoku puzzle or at least something that is close to sudoku puzzle enough :-) . Similarity is measured by duplicating cell colors, that is by error count in grid. Different numbers is represented as different colors. Below is video how grid evolves in some iterations:
In this problem SGAlocusGroup has only one locus, which holds info about cell color. SGAlocusGroup count equals to grid cell count. Alleles encodes all possible colors of cell. In the 9x9 grid case often is that there are left 2-4 color duplicates. In 16x16 case there are much more. But we can adjust evolve cycles to bigger number to overcome these errors.
C# project source files
It is for .NET 2 (VS 2005) but should run in .NET 3.5 also. You can download it here and use as you wish (GNU license):
1. http://www.cad.polito.it/pap/db/sac98.pdf F. Corno, M. Sonza Reorda, G. Squillero, "The Selfish Gene Algorithm: a New Evolutionary Optimization Strategy," SAC98: 13th Annual ACM Symposium on Applied Computing, Atlanta, Georgia (USA), February 1998, pp. 349-355 2. Book "The Selfish Gene", Richard Dawkins, 1976.
This time we will solve euler problem 18 by semi-bruteforce solution. Problem description: -------------------------------------------------------------------------------------- By starting at the top of the triangle below and moving to adjacent numbers on the row below, the maximum total from top to bottom is 23. 3 7 5 2 4 6 8 5 9 3
That is, 3 + 7 + 4 + 9 = 23. Find the maximum total from top to bottom of the triangle below: 75 95 64 17 47 82 18 35 87 10 20 04 82 47 65 19 01 23 75 03 34 88 02 77 73 07 63 67 99 65 04 28 06 16 70 92 41 41 26 56 83 40 80 70 33 41 48 72 33 47 32 37 16 94 29 53 71 44 65 25 43 91 52 97 51 14 70 11 33 28 77 73 17 78 39 68 17 57 91 71 52 38 17 14 91 43 58 50 27 29 48 63 66 04 68 89 53 67 30 73 16 69 87 40 31 04 62 98 27 23 09 70 98 73 93 38 53 60 04 23 ---------------------------------------------------------------------------------------- So solution in C# is more or less short and fast (calculates answer in 1,1 ms). Also this solution is universal and can calculate bigger triangles (lets say 100 rows triangles and more) with a reasonable time.
Let d(n) be defined as the sum of proper divisors of n (numbers less than n which divide evenly into n). If d(a) = b and d(b) = a, where a ≠ b, then a and b are an amicable pair and each of a and b are called amicable numbers.
For example, the proper divisors of 220 are 1, 2, 4, 5, 10, 11, 20, 22, 44, 55 and 110; therefore d(220) = 284. The proper divisors of 284 are 1, 2, 4, 71 and 142; so d(284) = 220.
Evaluate the sum of all the amicable numbers under 10000.
So solution is fast enough,- runs in about 20 ms. A little bit explanations of why this solution is fast:
We need not to search all divisors, but rather until SQRT limit of number. This is because if number, lets say x, have divisor d1 above SQRT(x), then, that number x MUST have also a divisor d2 = x/d1, below SQRT(x) limit.
We need not to verify are all test numbers amicable, because if we find one amicable number, lets say x, then we know that there exists second amicable number y, which is sum of proper divisors of x. This rule is true, because we acctually searching amicable number pairs. So by applying this rule we can use memorization - mark second amicable number in bool array, and skip that number from test later.