For the past few months I have been trying to understand genetic algorithms (GA), and most of the material available on the web was not easy for me. Then I came across Genetic Algorithm Implementation in Python, an article written by Ahmed Gad, which implements a GA with numpy. Using it as a guide, I wrote my first GA in Python with numpy. All of the code was written by me except cal_pop_fitness.
The problem the GA needs to solve is to find the parameters (a, b) in an equation of the form y = a*x1 + b*x2, where x1, x2 and y are given as numpy arrays. The equation I chose to solve is y = 2*x1 + 3*x2. Because there are two parameters to find, I chose two genes per chromosome. Every GA needs a fitness function, and I chose the mean squared error (MSE) between the predicted and real outputs for selecting the best parents. MSE works here because we already have the real output y: the lower the MSE, the better the parents we select. This is also the main difference between my GA and Ahmed's; he used a maximisation fitness function, whereas I used a minimisation one. From the selected parents we generate offspring by crossover and mutation. All offspring go through crossover, but only a few of them are mutated.
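To make the fitness function concrete, here is a minimal sketch (separate from the full script below) that scores a single candidate chromosome (a, b) against a few data points; the helper name mse_fitness is my own and is not part of the script.

import numpy as np

# Tiny illustration of the MSE fitness: the target chromosome (a, b) = (2, 3)
# scores exactly zero, and worse guesses score higher, so lower is better.
x = np.array([[1, 2], [2, 4], [3, 6]])   # columns are x1 and x2
y = 2*x[:, 0] + 3*x[:, 1]                # real outputs from y = 2*x1 + 3*x2

def mse_fitness(chromosome, x, y):
    pred = np.sum(chromosome*x, axis=1)  # a*x1 + b*x2 for every sample
    return np.square(y - pred).mean()

print(mse_fitness(np.array([2.0, 3.0]), x, y))  # 0.0, the perfect chromosome
print(mse_fitness(np.array([1.0, 1.0]), x, y))  # much larger error for a poor guess

The complete script is below.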
import numpy as np
np.set_printoptions(formatter={'float': '{: 0.3f}'.format})
np.random.seed(1)
def generate_data(x_range):
    # Formula: y = 2*x1 + 3*x2
    x_range = range(-x_range, x_range)
    x = np.vstack((np.array(x_range), np.array(x_range)*2)).T
    y = [2*i[0] + 3*i[1] for i in x]
    return x, np.array(y)
def cal_pop_fitness(equation_inputs, pop):
    # Predicted y for a chromosome: a*x1 + b*x2 for every sample
    fitness = np.sum(pop*equation_inputs, axis=1)
    return fitness
def select_best_parents(total_population, top_parents):
    arr = []
    for i in total_population:
        pred_y = cal_pop_fitness(i, X)
        mse = (np.square(y - pred_y)).mean(axis=None)  # Mean squared error
        # Append the mse to the chromosome values
        row = np.append(i, mse).tolist()
        arr.append(row)
    arr = np.array(arr)
    # Sort the chromosomes with respect to mse
    # The lower the mse, the better the individual
    arr = arr[arr[:, 2].argsort()]
    error = arr[:, 2]
    arr = arr[:, 0:2]  # remove the mse column
    return arr[0:top_parents, :], error[0:top_parents]
def crossover(sub_population):
    children = []
    for i in range(population_size):
        # Select two random parents
        parent1 = np.random.randint(0, sub_population.shape[0])
        parent2 = np.random.randint(0, sub_population.shape[0])
        # A child is created from parent1's first gene and parent2's second gene
        child = [sub_population[parent1][0], sub_population[parent2][1]]
        children.append(child)
    return np.array(children)
def mutation(population):
    for i, row in enumerate(population):
        if np.random.rand() < mutation_rate:
            # Random mutation: replace one gene with a new random value
            population[i][np.random.randint(genes)] = np.random.uniform(low=-4.0, high=4.0)
    return population
if __name__ == "__main__":
# Generate input ouptut data
X,y = generate_data(150)
genes = X.shape[1] # Total genes in a chromosome
population_size = 50 # Number of populations
top_parents = int(0.25*population_size) # Select top parents from the total populations for mating
mutation_rate = 0.1 # Mutation rate
generations = 1000 # number of generations
# Step1 : Population
# Total population
total_population = np.random.uniform(low = -4.0, high = 4.0, size = (population_size,genes))
for i in range(generations):
# Step2 : Fitness calculation
# Step3 : Mating pool
# Step4 : Parents selection
# Choosing best parents with their corresponding mse
sub_population,mse = select_best_parents(total_population,top_parents)
# Step5 : Mating
# Crossover
new_population = crossover(sub_population)
# Mutation
new_population = mutation(new_population)
print("Best Parameters: ",np.round(sub_population[0],3),"Best MSE: ", round(mse[0],3))
# Next population
total_population = new_population
    # Real formula
    x_range = range(-10, 10)
    x1 = np.array(x_range)
    x2 = np.array(x_range)*2
    formula = '2*x1+3*x2'
    y1 = eval(formula)
    # Formula predicted by the genetic algorithm
    a = 1.943
    b = 3.029
    formula = 'a*x1+b*x2'
    y2 = eval(formula)
    print("\nMSE between real and predicted formulas: ", (np.square(y1-y2)).mean(axis=None))