Sunday, June 21, 2009

Langton's ant Or order out of chaos

This time we will talk about Langton's ant - the simple system of simple rules, but with complex behavior. Simple system of Langton`s ant is defined as :
* At a white square, turn 90° left, flip the color of the square, move forward one unit
* At a black square, turn 90° right, flip the color of the square, move forward one unit
This is so called L/R system. But what if for the one color we will have not the one direction, but the direction chain ? That is- for the white squares we will have some chain of 'LRLRR...' events and for the black squares we will have another chain of 'RRLRRL...' events ?? These events should cycle in a row. Then more interesting patterns should occure. So this experiment tests this idea of multiple direction row for the same state/color. Experiment script is here.
Results are somewhat interesting. At first, the classic Langton`s ant example-

Can we have system with shorter steps to "highway" ? Seems - yes we can. This is it:

But this is not the end. There is system which is almost "highway" from the scratch:


And finally- some exotic general Langton`s ant systems:







Conclusion
Seems that systems L/R and L/LRL is unstable and switches to stable system L/RLL.
Another conclusion is that there are some other systems that gets to the different types of "highway" - the order out of chaos.

Have fun with Langton`s ant !!

Sunday, March 15, 2009

Detecting copy-move forgery in images

This time I`ll talk about digital image forensic and how to detect copy-move forgery in images. I`ve implemented some ad-hoc algorithm for detection of this kind of forgeries. Algorithm is robust, so it can detect forgeries in lossy image formats- such as JPEG. Please keep in mind - this algorithm is experimental toy, if you want more general solutions - you should read this paper or maybe this to get some ideas in the field.

Robust detection algorithm steps
1. Blur image for eliminating image details
2. Convert image to degraded palette
3. Decompose image into small NxN pixel blocks
4. Alphabetically order these blocks by their pixel values
5. Extract only these adjacent blocks which have small absolute color difference
6. Cluster these blocks into clusters by intersection area among blocks
7. Extract only these clusters which are bigger than block size
8. Extract only these clusters which have similar cluster, by using some sort of similarity function (in this case Hausdorff distance between clusters)
9. Draw discovered similar clusters on image

Copy-move detection script usage instructions
1. At first try to execute script with default parameters-
python detect_copymove.py image
2. Then try to lower block color deviation threshold-
python detect_copymove.py image --blcoldev=0.05
3. Finally - run script in manual mode and try to spot similar regions by eyes-
python detect_copymove.py image --blcoldev=0.05 --imauto=0
If by trying all 3 steps - no copy-move tamperings revealed - there is a good chance that really there are no such tamperings in image. BTW, script has more parameters, full list of them -
python detect_copymove.py --help

Experiments - object cloning forgery
This type of forgery tries to deceive viewer that there were more objects than really was.
Original picture:

Doctored image:

Copy-move script output:

As you see script detected cloning of one tank, but did not detected cloning of tank in further row. This can be because of small size of that tank and because that tank is on boundary of image.

Experiments - object hiding forgery
In this type of forgery viewer is deceived that there were less objects than in reality was.
Original image:

Doctored image:

Copy-move script output:


Experiments - mixed type forgery
In this type of forgery some objects was hidden, some - cloned.
Original picture:

Doctored image:

Copy-move script output:

As you see script detected 2 forgeries - cloning of dog, and cloning of background (which was done with intention to hide other dog)

Conclusions
Copy-move forgery detection script works not bad. However it is also not perfect of course (may not detect all tamperings, so called false negative rate is not zero of course). And I`ve not tested very well, but I suppose script also have false positive rate - i.e. can detect areas which actually was not tampered. But all in all - I think this algorithm is pretty obvious and can be simply implemented and used as first line of defense against copy-move forgeries.

Have fun in copy-move forgeries hunting !!

Saturday, February 21, 2009

Evolving first lady of the internet

At last I ported selfish gene algorithm into Python. This algorithm pseudo-code is:
1. Encode some solution into set of genes.
2. Set default probability for each gene.
3. Generate 2 random solutions, according to genes probability.
4. Compare these 2 solutions:
for better solution- increase it`s genes probability by some value
for worse solution- decrease it`s genes probability by some value
5. Repeat everything from 3, until needed.

Now, interesting part. What can we do with selfish gene algorithm ? After seeing this post about genetic programming and evolutionary art, I`ve decided to try something similar. But only by using selfish gene algorithm. I`ve tried 2 experiments - tried to evolve picture of first lady of the internet composed of polygons. And second experiment - lenna picture is evolved as some number of lines.
So experiment idea is to generate 2 random images, composed of random polygons (or lines in other experiment) and to compare these 2 images. For image which is more similar to original lenna picture - we increase polygons probability, for other picture - decrease polygons probability. In the long run - "good polygons" tends to group together.
Below are the results of these experiments. Evolved pictures of lenna are compiled as frames of animated GIF image. N - is the iteration number (starting from zero):

Lena evolved as set of polygons

Original LenaLena evolution:
39614 iterations
100 polygons
3.5 hours
experiment code


Lena evolved as set of lines
Original LenaLena evolution:
69471 iterations
200 lines
2.5 hours
experiment code



Conclusions:
Selfish gene algorithm (sort of evolution strategy algorithm) is suitable for solving search and optimization problems, including generation of evolutionary art :-)
Below are final iteration pictures better quality than gif:


Have fun with selfish gene algorithm, genetic algorithms and evolution art !

Wednesday, December 3, 2008

Project Euler p83 and Dijkstra's algorithm

Problem definition:
-----------------------------------------------------------
In the 5 by 5 matrix below, the minimal path sum from the top left to the bottom right, by moving left, right, up, and down, is indicated in bold and is equal to 2297.

131 673 234 103 18
201 96  342 965 150
630 803 746 422 111
537 699 497 121 956
805 732 524 37  331

Find the minimal path sum, in matrix.txt , a 31K text file containing a 80 by 80 matrix, from the top left to the bottom right by moving left, right, up, and down.

-----------------------------------------------------------
Problem is solved with the help of Dijkstra's algorithm
Answer is calculated in a 2.5 seconds. Also it is not optimized,- computes distances to all nodes in network. To speed up - when we find target node - we can only check these nodes which distance is less than target node distance. Also this code suits equally well for solving project euler problem no 81 - only change shortestpath() function parameter "directions". And finally - this solution shows shortest path visually,- prints it to screen. Program code:
___________________________________________________________
import time as t

def readmatrix(filename):
f=open(filename, 'r')
m=[map(lambda x: int(x),r.strip().split(",")) for r in f.readlines()]
return m

def nextcell(dctresults):
minw = 10000000
mink = None
for k in dctresults:
if not dctresults[k][2]:
if dctresults[k][1] < minw:
minw = dctresults[k][1]
mink = k
return mink

def analyzecell(lstmat, tupcell, lstdirec, dctresults):
i,j = tupcell
for dr in lstdirec:
di,dj = dr
if (i+di) in range(len(lstmat)) and (j+dj) in range(len(lstmat)):
if not dctresults.has_key((i+di,j+dj)):
dctresults[(i+di,j+dj)] = [(i,j),dctresults[(i,j)][1]+lstmat[i+di][j+dj],False]
else:
oldw = dctresults[(i+di,j+dj)][1]
neww = dctresults[(i,j)][1]+lstmat[i+di][j+dj]
if neww < oldw:
dctresults[(i+di,j+dj)][0] = (i,j)
dctresults[(i+di,j+dj)][1] = neww
dctresults[(i,j)][2] = True

def shortestpath(filename, directions):
mat = readmatrix(filename)
results = {}
results[(0,0)] = [None,mat[0][0],False]
cell = nextcell(results)
while cell:
analyzecell(mat,cell,directions,results)
cell = nextcell(results)
return results, results[(len(mat)-1,len(mat)-1)][1], mat

def showpath(res, inpmat):
print ' '+'-'*len(inpmat)+' '
pathcells = []
curc = (len(inpmat)-1,len(inpmat)-1)
while curc:
pathcells += [curc]
curc = res[curc][0]
for i in range(len(inpmat)):
s='|'
for j in range(len(inpmat)):
s+= '#' if (i,j) in pathcells else '.'
print s+'|'
print ' '+'-'*len(inpmat)+' '

t1 = t.time()*1000
resd, pathsum, mat = shortestpath('matrix.txt',[(-1,0),(+1,0),(0,-1),(0,+1)])
t2 = t.time()*1000
print 'Shortest path sum', pathsum, '; Calculation time is ', int(t2-t1), 'ms'
showpath(resd, mat)

___________________________________________________________
Have fun with shortest path algorithms !!!

Tuesday, November 11, 2008

Project Euler p92

Problem description:
-------------------------------------------------------------------
A number chain is created by continuously adding the square of the digits in a number to form a new number until it has been seen before.

For example,

44 -> 32 -> 13 -> 10 -> 1 -> 1
85 -> 89 -> 145 -> 42 -> 20 -> 4 -> 16 -> 37 -> 58 -> 89

Therefore any chain that arrives at 1 or 89 will become stuck in an endless loop. What is most amazing is that EVERY starting number will eventually arrive at 1 or 89.

How many starting numbers below ten million will arrive at 89?
-------------------------------------------------------------------
Semi-bruteforce algorithm based on idea that we don`t need to construct full number chains from starting number until 1 or 89. Instead we will save in memory each starting number final result (1 or 89). And at the next starting number we will construct number chain only until previous calculated number.
Well solution is slow, but fits in "one minute rule" :-). Here it is-
_____________________________________________________________
import time as t

def nextnum(number):
n = number
s = 0
while n!=0:
m = n%10
n = n/10
s += m*m
return s

def untilknown(number):
start = number
res = []
while start >= number and start not in (1,89):
start = nextnum(start)
res += [start]
return res

def endswith89(limit):
seq = {1:1,89:89}
ret = 0
for x in range(1,limit):
if not seq.has_key(x):
k = untilknown(x)
if k:
kk = k[-1]
else:
kk = x
seq[x] = seq[kk]
if len(k) > 1:
for i in range(len(k)-1):
seq[k[i]] = seq[kk]
if seq[x] == 89:
ret += 1
return ret

t1 = t.time()
ans = endswith89(10000000)
t2 = t.time()

print ans, int(t2-t1),'s'

_____________________________________________________________

Have fun !!

Monday, October 27, 2008

Project Euler p99

Project Euler problem 99 description:
------------------------------------------------------------------------
Comparing two numbers written in index form like 2^11 and 3^7 is not difficult, as any calculator would confirm that 2^11 = 2048 < 3^7 = 2187.

However, confirming that 632382^518061 > 519432^525806 would be much more difficult, as both numbers contain over three million digits.

Using base_exp.txt (right click and 'Save Link/Target As...'), a 22K text file containing one thousand lines with a base/exponent pair on each line, determine which line number has the greatest numerical value.

NOTE: The first two lines in the file represent the numbers in the example given above.
------------------------------------------------------------------------
The trick of solution is not to compare whole numbers (because raising to power will be too much CPU time and memory consuming operation), instead we can compare logarithms of numbers. So actually we will compare LOG(number^power). Ok, but we still have to do power operation ? Well, no more ! By applying known logarithm formula: LOG(number^power) = power*LOG(number). So we are comparing power*LOG(number). This is very fast operation.
Problem solution is found in a 3.7 ms. Solution Python code:
_____________________________________________________
import time as tm
import math as m

t1 = tm.time()*1000
f=open('base_exp.txt', 'r')
l=[i.strip().split(",")+[n+1] for (n,i) in enumerate(f.readlines())]
l = map(lambda i: (int(i[0]),int(i[1]), i[2]), l)
l = map(lambda (x,a,n): (a*m.log(x), n), l)
max = max(l)
t2 = tm.time()*1000

print max,t2-t1,'ms'

________________________________________________________
Have fun!

Saturday, October 11, 2008

Calculation of Pi

This time I tried to answer a question - what is the best method of calculating Pi to a required accuracy - for example,- 100 or 1000 or 10 000 decimal digits ?
So I chose 4 Pi formulas for testing:
1. Leibniz
2. Madhava transformed series
3. Machin like formula
4. Gauss algorithm
The idea for experiment - is to find out which algorithm gives more decimal digits of Pi in same number of iterations/terms. So measurement algorithm is this:
1. Calculate correct digits of Pi given from every Pi formula in each iteration.
2. Plot diagram - correct Pi decimal digits amount VS iteration number
(For this to work - you must have Matplotlib module).
Python code doing this stuff:
_____________________________________________________
import inspect, time, sys
from decimal import Decimal, getcontext
from pylab import plot, show, xlabel, ylabel, title, grid, legend

def correct_digits(calcpi):
pi = "3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679821480865132823066470938446095505822317253594081284811174502841027019385211055596446229489549303819644288109756659334461284756482337867831652712019091456485669234603486104543266482133936072602491412737245870066063155881748815209209628292540917153643678925903600113305305488204665213841469519415116094330572703657595919530921861173819326117931051185480744623799627495673518857527248912279381830119491298336733624406566430860213949463952247371907021798609437027705392171762931767523846748184676694051320005681271452635608277857713427577896091736371787214684409012249534301465495853710507922796892589235420199561121290219608640344181598136297747713099605187072113499999983729780499510597317328160963185950244594553469083026425223082533446850352619311881710100031378387528865875332083814206171776691473035982534904287554687311595628638823537875937519577818577805321712268066130019278766111959092164201989"
gd = 1000
for i,s in enumerate(zip(pi,str(calcpi))):
if s[0] != s[1]:
gd = i - 2
if gd < 0: gd = 0
break
return gd

def show_results(res):
x=range(len(res[0][2]))
plpar = ""
legpar = ""
for i in range(len(res)):
if plpar: plpar +=","
if legpar: legpar +=","
plpar += "x, res["+str(i)+"][2]"
legpar += "res["+str(i)+"][0][3:] +' ('+ str(int(res["+str(i)+"][1])) + 'ms)'"
plpar = "plot("+plpar+", linewidth=2.0)"
legpar = "legend(("+legpar+"), loc=0)"
exec(plpar)
exec(legpar)
xlabel('Iterations', fontsize = 15)
ylabel('Correct digits', fontsize = 15)
title("Convergence of some $\pi$ formulas",fontsize=20)
grid()
show()

def test_pi_formulas(pi_func_list):
res = []
terms = 6
print 'Calculating %d iterations...' % terms
getcontext().prec = 1002
for pif in pi_func_list:
t1 = time.time()*1000
name,r = pif(terms)
t2 = time.time()*1000
res.append((name,t2-t1,r))
show_results(res)

def pi_Leibniz(terms):
"""
http://en.wikipedia.org/wiki/Leibniz_formula_for_pi
"""
st = inspect.stack()[0][3]
ra = Decimal(0)
ans = []

for n in range(terms):
ra += Decimal(4*(-1)**n)/Decimal(2*n+1)
ans.append(correct_digits(ra))

return st,ans

def pi_Madhava(terms):
"""
http://en.wikipedia.org/wiki/Madhava_of_Sangamagrama#The_value_of_.CF.80_.28pi.29
"""
st = inspect.stack()[0][3]
ra = Decimal(0)
k = Decimal(12)**Decimal('0.5')
ans = []

for n in range(terms):
ra += Decimal((-1)**n)/Decimal((2*n+1)*(3**n))
ans.append(correct_digits(k*ra))

return st, ans

def pi_Machin(terms):
"""
http://en.wikipedia.org/wiki/Machin-like_formula
"""
st = inspect.stack()[0][3]
c1 = Decimal(1)/Decimal(5)
c2 = Decimal(1)/Decimal(239)
a = Decimal(4)
atan1 = Decimal(0)
atan2 = Decimal(0)
ra = Decimal(0)
ans = []

for n in range(terms):
atan1 += Decimal((-1)**n) * (c1**Decimal(2*n+1))/Decimal(2*n+1)
atan2 += Decimal((-1)**n) * (c2**Decimal(2*n+1))/Decimal(2*n+1)
ra = a * (a*atan1 - atan2)
ans.append(correct_digits(ra))

return st, ans

def pi_Gauss(terms):
"""
http://en.wikipedia.org/wiki/Gauss%E2%80%93Legendre_algorithm
"""
st = inspect.stack()[0][3]
one = Decimal(1)
ra = Decimal(0)
val = (one,one/(Decimal(2)**Decimal('0.5')),one/Decimal(4),one)
ans = []

for n in range(terms):
an, bn, tn, pn = val
an1 = (an+bn)/Decimal(2)
bn1 = (an*bn)**Decimal('0.5')
tn1 = tn - pn * (an - an1)**Decimal(2)
pn1 = Decimal(2) * pn
val = (an1, bn1, tn1, pn1)
ra = ((an+bn)**Decimal(2))/(Decimal(4)*tn)
ans.append(correct_digits(ra))

return st, ans

if __name__ == "__main__":
mod = sys.modules[__name__]
pifunc = []
for key, value in mod.__dict__.items():
if str(value)[:13] == '<function pi_':
pifunc.append(value)
test_pi_formulas(pifunc)

_____________________________________________________
BTW this code is very scalable - if there is a need to test additional Pi formula - you only need to add new Pi function code like this:
"def pi_new_required_function(terms):
[...]". (Function name MUST begin with "pi_"). The rest program will do it for you - measure execution times, measure decimal digits correctness, plot results...
This is thanks to Python reflection used in program -> program scans itself for ALL pi functions and dynamically invokes them.

So after execution of this test we get such graph:


Conclusion
From graph can be seen that fastest converges Gauss Pi algorithm (from the chosen set of Pi formulas). So if you need for some reason to calculate Pi in desired accuracy (desired decimal places amount) I suggest Gauss algorithm. Because you will get result in less iterations / CPU cycles compared with other algorithms from the set.

Have fun!