Tuesday, April 29, 2008

Project Euler p79 - cracking passcode

This one is very interesting. Problem description:
-------------------------------------------
A common security method used for online banking is to ask the user for three random characters from a passcode. For example, if the passcode was 531278, they may asked for the 2nd, 3rd, and 5th characters; the expected reply would be: 317.
The text file, keylog.txt, contains fifty successful login attempts.
Given that the three characters are always asked for in order, analyse the file so as to determine the shortest possible secret passcode of unknown length.
-------------------------------------------

So solution in C# is this, not optimized, but works (in a 16 seconds):


static int[] ReadKeyLog(string file)
{
int[] dig;
string buf;
string[] barr;
StreamReader sr = new StreamReader(file);
buf = sr.ReadToEnd().TrimEnd("\n".ToCharArray());
barr = buf.Split("\n".ToCharArray());
dig = Array.ConvertAll(barr, s => { return Int32.Parse(s); });
sr.Close();
return dig;
}

static bool KeyPartExists(long testnumber, int keypart)
{
int n1 = keypart / 100;
int n2 = (keypart / 10) % 10;
int n3 = keypart % 10;
bool n1e = false;
bool n2e = false;
bool n3e = false;

for (long i = testnumber; i > 0; i /= 10)
{
if (!n3e)
n3e = (i % 10) == n3;
else
if (!n2e)
n2e = (i % 10) == n2;
else
if (!n1e)
n1e = (i % 10) == n1;
else
break;
}
if (n1e && n2e && n3e)
return true;
else
return false;
}

static bool HasAllKeyParts(long testnum, int[] keylog)
{
bool res = true;

for (int i = 0; i < keylog.Length; i++)
{
if (!KeyPartExists(testnum, keylog[i]))
{
res = false;
break;
}
}
return res;
}

static long FindPassCode(string file)
{
int[] keylog = ReadKeyLog(file);
long num = 10000;

while (true)
{
if (HasAllKeyParts(num, keylog))
return num;
num++;
}
}

static void Main(string[] args)
{
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
Console.WriteLine("Bruteforced passcode is {0}",
FindPassCode("C:\\ProjectEuler\\keylog.txt")
);
sw.Stop();
Console.WriteLine("Cracking time {0} seconds", sw.Elapsed.Seconds);
Console.ReadLine();
}

Thursday, April 24, 2008

Project Euler Problem 17

Definition:
---------------------------------------------
If the numbers 1 to 5 are written out in words: one, two, three, four, five, then there are 3 + 3 + 5 + 4 + 4 = 19 letters used in total.
If all the numbers from 1 to 1000 (one thousand) inclusive were written out in words, how many letters would be used?
NOTE: Do not count spaces or hyphens. For example, 342 (three hundred and forty-two) contains 23 letters and 115 (one hundred and fifteen) contains 20 letters. The use of "and" when writing out numbers is in compliance with British usage.

---------------------------------------------

C# code solution:

static long LettersOfNumber(long num, long[,] bases)
{
long count = 0;
long tempnum;
if (num > 1000 num < 1)
count = -1;
else if (num == 1000)
count = 11;
else
{
tempnum = num % 100;
if (tempnum < 20)
count += bases[tempnum, 0];
else
count += bases[num % 10, 0] + bases[(num / 10) % 10, 1];
if (num > 99)
count += bases[(num / 100), 0] + 7 + ((tempnum == 0) ? 0 : 3);
}
return count;
}
static long LettersUntilNumber(int num)
{
long letcount = 0;
long[,] bas = new long[20, 2] { { 0, 0 }, { 3, 0 }, { 3, 6 }, { 5, 6 }, { 4, 5 }, { 4, 5 }, { 3, 5 }, { 5, 7 }, { 5, 6 }, { 4, 6 }, { 3, 0 }, { 6, 0 }, { 6, 0 }, { 8, 0 }, { 8, 0 }, { 7, 0 }, { 7, 0 }, { 9, 0 }, { 8, 0 }, { 8, 0 } };
for (int i = 1; i <= num; i++)
letcount += LettersOfNumber(i, bas);
return letcount;
}
static void Main(string[] args)
{
Console.WriteLine(LettersUntilNumber(1000));
Console.ReadLine();
}

Saturday, April 19, 2008

Project Euler Problem 28

Problem definition-
--------------------------------------------------
Starting with the number 1 and moving to the right in a clockwise direction a 5 by 5 spiral is formed as follows:
21 22 23 24 25
20 7 8 9 10
19 6 1 2 11
18 5 4 3 12
17 16 15 14 13
It can be verified that the sum of both diagonals is 101.
What is the sum of both diagonals in a 1001 by 1001 spiral formed in the same way?
--------------------------------------------------
C# brute-force solution:

static long SpiralDiagonalsSum(long spiralsize)
{
long sum = 1;
long cursize = 3;
long number = 1;
while (cursize <= spiralsize)
{
for (int i = 0; i < 4; i++)
{
number += cursize - 1;
sum += number;
}
cursize += 2;
}
return sum;
}
static void Main(string[] args)
{
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
long diagsum = SpiralDiagonalsSum(1001);
sw.Stop();
Console.WriteLine("Spiral diagonals sum {0} calculated in {1} ms",diagsum,sw.Elapsed.TotalMilliseconds);
Console.ReadLine();
}


Solution on core duo performs in a 0.2 ms.

Friday, April 18, 2008

Selfish Gene Algorithm

This time I will present a branch of evolutionary algorithms - Selfish gene algorithm.

Scope of article

Question arises - Why we need yet another genetic algorithm (GA) ? Well, its always good to have different point of view. It`s because selfish gene algorithm (further- SGA) is a different beast than 'standard' GA. For that reasons I will present SGA and it`s uses in 2 sample problems. For the article being short I will assume reader`s basic knowledge of GA.

Biological background

Several concepts will be used here. Organism genetic makeup is called - genotype. All living organisms has DNA molecules. DNA molecule is composed of genes. Gene location in DNA is called locus. Usually for the same locus there can be different versions of gene. These different versions of gene is called alleles. There can be calculated allele frequency (or probability) in population. In general evolution means, that organisms which succeeds- increases its alleles frequency in population at the expense of children. And vise-versa,- organism which fails- decreases its alleles frequency in population.

Main features of SGA
  • There are no explicit population in algorithm. Individuals are created and destroyed
    only temporary for comparision of their genomes. This means that there are no recombination operator at all. Instead new solutions is generated according to their genes frequency in population.
  • Mutation operator is encoded as random gene selection (not accordingly to frequencies)
  • We need not to assign absolute value of fitness to generated solution, but instead we
    need to compare 2 generated solutions between each other. And we need to return back to the algorithm which solution was better.
Pseudo-code of SGA
  1. Generate 2 solutions, according to solution genes frequency/probability in population.
    1a. If mutation happens - take random solution gene, not according to its freqency.
  2. Compare these 2 solutions, and decide which is better.
    2a. Reward better solution - increase its alleles frequency in population.
    2b. Penalyze worse solution - decrease its alleles frequency in population.
  3. Update best solution found if any.
  4. Repeat steps 1-3 until required solution found or other needed criteria is met-
    for example maximum evolve cycles elapsed or solutions genes frequency is above some threshold or whatever you need.
SGA class diagram



So main SGA classes is 4:

SGAgenotype
- main class for executing GA algorithm. This class is container for SGAlocus class. SGAlocus - class is container for class SGAallele, also holds info about temporary generated solution genes.
SGAlocusGroup is just for possibility of grouping SGAlocus classes, because (you will later see it) it is sometimes usefull to do it. That is acctually SGAgenotype holds SGAlocusGroup class, which in turn groups SGAlocus.
SGAallele is class for describing allele, mainly - allele value and its frequency/probability in population.
Thats it, its all you need for SGA to run on your PC.

Practical examples of SGA usage

There was 2 cases of usage of SGA.

First - box grouping problem.

Given amount of boxes of some random sizes try to arrange boxes in some area while trying to minimize total boxes intersection area and maximize free unfragmented space left in room.
Below is video of how evolves box positions in some number of iterations:

video
In this type of problem SGAlocusGroup groups 2 loci - one locus for box X position and second locus for box Y position. Total SGAlocusGroup count equals to box count. Alleles encodes all possible values of box positions.

Second - sudoku puzzle generation problem

Given grid size, 9x9 or 16x16, try to generate (if possible) sudoku puzzle or at least something that is close to sudoku puzzle enough :-) . Similarity is measured by duplicating cell colors, that is by error count in grid. Different numbers is represented as different colors.
Below is video how grid evolves in some iterations:

video
In this problem SGAlocusGroup has only one locus, which holds info about cell color.
SGAlocusGroup count equals to grid cell count. Alleles encodes all possible colors of cell.
In the 9x9 grid case often is that there are left 2-4 color duplicates. In 16x16 case there are much more. But we can adjust evolve cycles to bigger number to overcome these errors.

C# project source files

It is for .NET 2 (VS 2005) but should run in .NET 3.5 also. You can download it here and use as you wish (GNU license):

http://coding-experiments.googlecode.com/files/SelfishGeneAlgorithm.zip

Last- useful references

1. http://www.cad.polito.it/pap/db/sac98.pdf
F. Corno, M. Sonza Reorda, G. Squillero, "The Selfish Gene Algorithm: a New Evolutionary Optimization Strategy," SAC98: 13th Annual ACM Symposium on Applied Computing, Atlanta, Georgia (USA), February 1998, pp. 349-355
2. Book "The Selfish Gene", Richard Dawkins, 1976.

Additional helper links-
http://en.wikipedia.org/wiki/Genotype
http://en.wikipedia.org/wiki/Locus_(genetics)
http://en.wikipedia.org/wiki/Allele

Thats all, have fun !!!

Saturday, April 12, 2008

Project Euler Problem 18

This time we will solve euler problem 18 by semi-bruteforce solution.
Problem description:
--------------------------------------------------------------------------------------
By starting at the top of the triangle below and moving to adjacent numbers on the row below, the maximum total from top to bottom is 23.
3
7 5
2 4 6
8 5 9 3

That is, 3 + 7 + 4 + 9 = 23.
Find the maximum total from top to bottom of the triangle below:
75
95 64
17 47 82
18 35 87 10
20 04 82 47 65
19 01 23 75 03 34
88 02 77 73 07 63 67
99 65 04 28 06 16 70 92
41 41 26 56 83 40 80 70 33
41 48 72 33 47 32 37 16 94 29
53 71 44 65 25 43 91 52 97 51 14
70 11 33 28 77 73 17 78 39 68 17 57
91 71 52 38 17 14 91 43 58 50 27 29 48
63 66 04 68 89 53 67 30 73 16 69 87 40 31
04 62 98 27 23 09 70 98 73 93 38 53 60 04 23
----------------------------------------------------------------------------------------
So solution in C# is more or less short and fast (calculates answer in 1,1 ms).
Also this solution is universal and can calculate bigger triangles (lets say 100 rows triangles and more) with a reasonable time.

Have fun!!

Wednesday, April 9, 2008

Project Euler Problem 21

Ok, this is it- problem description:
Let d(n) be defined as the sum of proper divisors of n (numbers less than n which divide evenly into n).
If d(a) = b and d(b) = a, where a ≠ b, then a and b are an amicable pair and each of a and b are called amicable numbers.

For example, the proper divisors of 220 are 1, 2, 4, 5, 10, 11, 20, 22, 44, 55 and 110; therefore d(220) = 284. The proper divisors of 284 are 1, 2, 4, 71 and 142; so d(284) = 220.

Evaluate the sum of all the amicable numbers under 10000.

Fun part,- C# solution.

So solution is fast enough,- runs in about 20 ms. A little bit explanations of why this
solution is fast:

  • We need not to search all divisors, but rather until SQRT limit of number.
    This is because if number, lets say x, have divisor d1 above SQRT(x), then,
    that number x MUST have also a divisor d2 = x/d1, below SQRT(x) limit.
  • We need not to verify are all test numbers amicable, because if we find one
    amicable number, lets say x, then we know that there exists second amicable
    number y, which is sum of proper divisors of x. This rule is true, because
    we acctually searching amicable number pairs. So by applying this rule
    we can use memorization - mark second amicable number in bool array,
    and skip that number from test later.

Have fun!