Velocity Reviews > very simple Genetic Algorithm completed

# very simple Genetic Algorithm completed

Matthew_WARREN@bnpparibas.com
Guest
Posts: n/a

 01-31-2008
Hi,

I got some help with this from here, and there's been a little bit of
discussion around GA's recently, so thought I'd post up my likey slow and
clunky version of a GA that in essence just 'evolves' a solution to 'make a
sum that evaluates to n using */+-0123456789' it's a really simple GA that
would be useful for someone who doesn't quite get GA's to look at.

I think it's simple enough to be fairly self explanatory.

to see it come up with evolved solutions to n=1000

>>>from quickga import *
>>>evolve()

I like playing with stuff like this. I'm going to use this little toy to
investigate how mutation rates/crossover gene length, pop size etc.. etc..
interact with each other. All completely unscientifically and for my own
bemusement.

One point, it's a good idea to keep mutationrate around 1000 - 10000 with
genome and population sizes of say 50 - 100. Too low and you get no
solution as the mutations mix up the genome too much for selection
pressure to work.

....as this actually does need to go as quick as it can, and if anyone feels
like it, I'd really appreciate picking it over on the list for
optimization. I'm not too familiar with Pthon internals, nor programming
for speed in general.

<pre>
from random import randint
from operator import itemgetter

genes=['+','-','*','/','0','1','2','3','4','5','6','7','8','9']
signs=['+','-','*','/']
digits=['1','2','3','4','5','6','7','8','9']

table = {"++": "+", "+-": "-", "+*": "+", "+/": "+",
"-+": "-", "--": "+", "-*": "-", "-/": "-",
"*+": "*", "**": "*", "*/": "*",
"/+": "/", "/*": "/", "//": "/",
"+0":"+","*0":"*","-0":"-","/0":"/"} # keep out octal literals

def rationalise_signs(s):
"""Takes the genome string and twiddles it so eval() will work as
expected
"""
prev = ''
while s != prev:
prev=s
for z in ['+','-','*','/']:
s=s.replace(z+'0',z)
for key, value in table.items():
s = s.replace(key, value)
s=s.lstrip('0')
s=s.strip('+-*/')
return s

def generate(number,length):
"""Generate the initial population of genome strings
"""
population=[]
for i in range(number):
s=rationalise_signs(''.join([
genes[randint(0,len(genes))-1] for n in range(length) ]))
population.append(s)
return population

def floatify(intstring):#So eval() be floating point.
"""kludge to ensure eval() does floating point math
"""
prev=''
while intstring != prev:
prev=intstring
for digit in digits:

intstring=intstring.replace(digit+sign,digit+'.0'+ sign)
return intstring

def express(population):
"""Get the 'expression' of the genome.
"""
expressed_population=[]
for individual in population:
s=floatify(individual)
expressed_population.append((individual,eval(s)))
return expressed_population

def fitness(expressed_population,fitvalue,tolerance):
"""Test the expressed genome for fitness
"""
population_fitness=[]
sumfitness=0
for expressed_individual in expressed_population:
individual,expression=expressed_individual
fitness=abs(fitvalue-expression)
sumfitness=sumfitness+fitness
population_fitness.append((individual,expression,f itness))
avgfitness=sumfitness/len(expressed_population)
return (population_fitness,avgfitness)

def get_fittest(population_fitness,pct,full=False):
"""Quick n dirty way of selecting - top n% fittest individuals
"""
population_fitness.sort(key=itemgetter(2))#sort on fitness
npct=(len(population_fitness)/100.0)*pct
if not full:
return [ n[0] for n in population_fitness[0:int(npct)] ]
else:
return population_fitness[0:int(npct)]

def mutate(individual,rate):
"""Does what it says on the tin. Mutates per gene
if rate is 10000 mutatuion rate is 1 in 10000 on avg
"""
newindividual=''
for gene in individual:
if randint(0,rate)==1:
newgene=genes[randint(0,14)-1]
newindividual=newindividual+newgene
else:
newindividual=newindividual+gene
return newindividual

def breed_new(individuals,number,mutationrate):#crosso ver with mutation
"""simple crossover of the two genomes around a point, then mutate
"""
newpopulation=[]
num_individuals=len(individuals)
while len(newpopulation)<=number:
man=individuals[randint(0,num_individuals-1)]
xpoint=randint(0,100)
xman=(len(man)/100.0)*xpoint
leftxman=man[:int(xman)]
rightxman=man[int(xman):]

newpopulation.append(new1)
newpopulation.append(new2)
return newpopulation

def
evolve(popsize=50,genomelength=100,mutationrate=10 000,fitcullpct=10,numsolutions=5,target=1000,toler ance=1):
"""Controls the whole process.
"""
pop=generate(popsize,genomelength)
fitgens=[]
generation=1
while len(fitgens)<numsolutions:
epop=express(pop)
fpop,avg=fitness(epop,target,tolerance)
print "Average fitness",avg
if avg>tolerance:

pop=breed_new(get_fittest(fpop,fitcullpct),popsize ,mutationrate)
generation=generation+1
else:
print "Pop avg fitness within tolerance"
print "********************************"
fitgens.append((fpop[0:],generation))
pop=generate(popsize,genomelength)
generation=1
outlist=[]
for fitpop,generation in fitgens:
bestfitpop=get_fittest(fitpop,20,full=True)
for fitgeneinfo in bestfitpop:
genome,number,avgfit=fitgeneinfo
prev=''
s=floatify(genome)
outlist.append(genome+" in "+str(generation)+"
generations got "+str(number)+" avg fit ="+str(avgfit))
for line in set(outlist):
print line

</pre>

Matt. (Apologies for any disclaimers)

This message and any attachments (the "message") is
intended solely for the addressees and is confidential.
immediately notify the sender. Any use not in accord with
its purpose, any dissemination or disclosure, either whole
or partial, is prohibited except formal approval. The internet
can not guarantee the integrity of this message.
BNP PARIBAS (and its subsidiaries) shall (will) not
therefore be liable for the message if modified.
Do not print this message unless it is necessary,
consider the environment.

---------------------------------------------

Ce message et toutes les pieces jointes (ci-apres le
"message") sont etablis a l'intention exclusive de ses
destinataires et sont confidentiels. Si vous recevez ce
message par erreur, merci de le detruire et d'en avertir
immediatement l'expediteur. Toute utilisation de ce
message non conforme a sa destination, toute diffusion
ou toute publication, totale ou partielle, est interdite, sauf
autorisation expresse. L'internet ne permettant pas
d'assurer l'integrite de ce message, BNP PARIBAS (et ses
filiales) decline(nt) toute responsabilite au titre de ce
message, dans l'hypothese ou il aurait ete modifie.
N'imprimez ce message que si necessaire,
pensez a l'environnement.

Paul McGuire
Guest
Posts: n/a

 02-01-2008
On Jan 31, 10:43*am, (E-Mail Removed) wrote:
> Hi,
>
> I got some help with this from here, and there's been a little bit of
> discussion around GA's recently, so thought I'd post up my likey slow and
> clunky version of a GA that in essence just 'evolves' a solution to 'make a
> sum that evaluates to n using */+-0123456789' *it's a really simple GA that
> would be useful for someone who doesn't quite get GA's to look at.
>
> I think it's simple enough to be fairly self explanatory.
>
> to see it come up with evolved solutions to n=1000
>
> >>>from quickga import *
> >>>evolve()

>

Some improvement tips:

0. Tack this bit onto the end of quickga.py, and you wont have to
write a separate routine to import quickga and invoke evolve():

if __name__ == "__main__":
evolve()

1. Remove all calls to floatify, and instead start your program with:
from __future__ import division
On my system this gained about 16% improvement.

2. Bugfix: In 2 places, change:
newgene=genes[randint(0,14)-1]
to
newgene=genes[randint(0,14-1)]

randint(a,b) returns values from a to b inclusive, and genes is a list
containing 14 elements (so you want subscripts from 0 to 13). (Using
subscripts from -1 to 13 will bias the selection of genes to use twice
as many of the last item in the list, since both -1 and 13 will give
that value.)

Similarly, change:

s=rationalise_signs(''.join([ genes[randint(0,len(genes))-1] ...
to:
s=rationalise_signs(''.join([ genes[randint(0,len(genes)-1)] ...

3. Change mutate to build a list instead of a string. Then use
''.join() at the end to convert the list into a single string return
value.

def mutate(individual,rate):
"""Does what it says on the tin. Mutates per gene
if rate is 10000 mutatuion rate is 1 in 10000 on avg
"""
newindividual=[]
for gene in individual:
if randint(0,rate)==1:
newgene=genes[randint(0,14-1)]
newindividual.append(newgene)
else:
newindividual.append(gene)
return ''.join(newindividual)

(This was less of a speed improvement than I thought it would be, but
IIRC, the optimization of successive string concatentions is only
available when running Python on Windows. If you are running on Linux,
this should have more benefit.)

4. Include psyco to cut execution time in half. Put this at the top
of your program, right after "from __future__ ...":

try:
import psyco
except ImportError:
print ("Running without psyco optimization")
else:
psyco.full()

If psyco is available, this will optimize all called functions.

5. On a hunch that many of your individuals are the same string, I
lifted the call to eval out of express() into a separate routine
called evaluate(), and added a memoizing cache to skip the call to
eval if the string had been eval'ed before - if so, the value is
reported from the cache. This chopped another 40% off the runtime.

evalcache = {}
def evaluate(s):
if s in evalcache:
ret = evalcache[s]
else:
ret = eval(s)
evalcache[s] = ret
return ret

(A note on timing this code: since there are many calls to
randomization methods, consistent timing requires an explicit call to
random.seed() to ensure that a consistent set of random numbers is
used. Otherwise, the timing gets thrown off by the randomization,
which sends the program down different paths.)

6. This is another change that had little effect, but I'll at least
put it out there (a leftover from an algorithmic optimization course I

fitness=abs(fitvalue-expression)

try using:

fitness=(fitvalue-expression)*(fitvalue-expression)

For some optimization methods (GA is not one of them), the
discontinuity of abs at the origin delays convergence of the
algorithm, whereas computing the square of the difference *is*
continuous at 0 *and* still ensures a positive value. Just gave it a
try out of academic curiosity, but just a dead end after all.

Cheers,
-- Paul

Paul McGuire
Guest
Posts: n/a

 02-01-2008
On Jan 31, 10:43*am, (E-Mail Removed) wrote:
> Hi,
>
> I got some help with this from here, and there's been a little bit of
> discussion around GA's recently, so thought I'd post up my likey slow and
> clunky version of a GA that in essence just 'evolves' a solution to 'make a
> sum that evaluates to n using */+-0123456789' *it's a really simple GA that
> would be useful for someone who doesn't quite get GA's to look at.
>
> I think it's simple enough to be fairly self explanatory.
>
> to see it come up with evolved solutions to n=1000
>
> >>>from quickga import *
> >>>evolve()

>
> I like playing with stuff like this. I'm going to use this little toy to
> investigate how mutation rates/crossover gene length, pop size etc.. etc..
> interact with each other. All completely unscientifically and for my own
> bemusement.

Another interesting technique, similar to GA, is SA or Simulated
Annealing. You should be able to adapt your quickga.py program to an
SA approach without too much trouble, and comparing the two should

-- Paul

Carl Banks
Guest
Posts: n/a

 02-01-2008
On Feb 1, 9:09 am, Paul McGuire <(E-Mail Removed)> wrote:
> 2. Bugfix: In 2 places, change:
> newgene=genes[randint(0,14)-1]
> to
> newgene=genes[randint(0,14-1)]
>
> randint(a,b) returns values from a to b inclusive, and genes is a list
> containing 14 elements (so you want subscripts from 0 to 13). (Using
> subscripts from -1 to 13 will bias the selection of genes to use twice
> as many of the last item in the list, since both -1 and 13 will give
> that value.)

Better yet, use random.randrange: it was added for this very reason,
to get a random index in a range.

Perhaps even better still, use random.choice. It gets a random
element in a sequence.

Carl Banks

Steven D'Aprano
Guest
Posts: n/a

 02-01-2008
On Fri, 01 Feb 2008 06:09:49 -0800, Paul McGuire wrote:

> IIRC, the optimization of successive string concatentions is only
> available when running Python on Windows. If you are running on Linux,
> this should have more benefit.)

There's no reason to believe it is platform-dependent, although it is
implementation-dependent (only works in CPython). It was introduced in
Python 2.4:

http://www.python.org/doc/2.4/whatsnew/node12.html

--
Steven