First and foremost, you'll have to determine what you're trying to solve with your data set. You generally use a genetic algorithm to tackle problems in NP (nondeterministic polynomial time): problems that can take a long time to solve, but whose answers are easy to verify.
So the first question is: what does your dataset represent?
The second question: what are you trying to solve and is a genetic algorithm a fitting method to solve your problem?
Anyway, creating a genetic algorithm is done through the following steps:
- Represent the problem variable domain as a chromosome of a fixed length, and choose the size of the population N, the crossover probability p(c) and the mutation probability p(m)
- Define a fitness function f(x) to measure the performance, or fitness, of an individual chromosome in the problem domain. The fitness function establishes the basis for selecting chromosomes that will be mated during reproduction
- Randomly generate an initial population of chromosomes of size N: x1, x2, ..., xN
- Calculate the fitness of each individual chromosome: f(x1), f(x2), ..., f(xN)
- Select a pair of chromosomes for mating from the current population. Parent chromosomes are selected with a probability related to their fitness. Highly fit chromosomes have a higher probability of being selected for mating than less fit chromosomes.
- Create a pair of offspring chromosomes by applying the genetic operators - crossover and mutation
- Place the created offspring chromosomes in the new population
- Repeat from step 5 (selecting a pair of parents) until the size of the new chromosome population becomes equal to the size of the initial population N
- Replace the initial (parent) chromosome population with the new (offspring) population
- Go to step 4 (calculating the fitness) and repeat the process until the termination criterion is satisfied.
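
The steps above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: it assumes a bit-string chromosome, a non-negative fitness function, roulette-wheel (fitness-proportionate) selection, single-point crossover, bit-flip mutation, and a fixed number of generations as the termination criterion. The name `run_ga` and all of its parameters are made up for this example.

```python
import random

def run_ga(fitness, length=20, pop_size=30, p_c=0.7, p_m=0.01,
           generations=100, rng=None):
    """Toy GA over bit strings; fitness must return a non-negative number."""
    rng = rng or random.Random()
    # Step 3: random initial population of N fixed-length chromosomes
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):  # assumed termination criterion
        # Step 4: fitness of every chromosome (tiny offset avoids all-zero weights)
        weights = [fitness(c) + 1e-9 for c in population]
        new_population = []
        # Step 8: keep breeding until the new population has N members
        while len(new_population) < pop_size:
            # Step 5: fitter chromosomes are more likely to be picked
            p1, p2 = rng.choices(population, weights=weights, k=2)
            c1, c2 = p1[:], p2[:]
            # Step 6a: single-point crossover with probability p_c
            if rng.random() < p_c:
                cut = rng.randint(1, length - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            # Step 6b: flip each bit with probability p_m
            for child in (c1, c2):
                for i in range(length):
                    if rng.random() < p_m:
                        child[i] = 1 - child[i]
                # Step 7: place the offspring in the new population
                new_population.append(child)
        # Step 9: offspring replace the parents (trim if N is odd)
        population = new_population[:pop_size]
        best = max(population + [best], key=fitness)
    return best
```

For instance, `run_ga(sum)` tries to maximise the number of 1-bits in the chromosome (the classic "OneMax" toy problem). For a real problem you would swap in your own representation and fitness function.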
So, you have to find a notation for your solution (such as an array of bits or a string) that allows you to swap parts of chromosomes easily. Then you have to identify the crossover and mutation operations.
If you're dealing with ordered chromosomes, then depending on the applied crossover strategy you may have to repair your chromosomes afterwards. An ordered chromosome is a chromosome where the order of the genes matters. If you perform a standard crossover on two solutions that represent the cities that the travelling salesman has to visit, you might end up with a chromosome where he visits some cities twice or more and some not at all!
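
To make that concrete, here is a small sketch. It first shows how a naive single-point crossover breaks a tour, and then one possible alternative: a simplified variant of order crossover (OX) that copies a slice from one parent and fills the remaining positions with the missing cities in the order they appear in the other parent. The function name `order_crossover` is just an illustration, and this is only one of several permutation-safe crossover strategies.

```python
import random

def order_crossover(parent1, parent2, rng=random):
    """Simplified order crossover: the child is always a valid permutation."""
    n = len(parent1)
    # Pick a non-empty slice [i:j) to inherit unchanged from parent1
    i, j = sorted(rng.sample(range(n + 1), 2))
    kept = set(parent1[i:j])
    # Cities not in the slice, in the order they occur in parent2
    fill = iter(g for g in parent2 if g not in kept)
    return [parent1[k] if i <= k < j else next(fill) for k in range(n)]

# Naive single-point crossover on two tours produces duplicates:
a, b = [0, 1, 2, 3], [3, 2, 1, 0]
naive = a[:2] + b[2:]          # [0, 1, 1, 0]: city 1 and 0 twice, 2 and 3 never

child = order_crossover(a, b)  # every city appears exactly once
```

With an operator like this no repair step is needed, because the child never contains a city twice.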
There's no clear recipe for translating a problem into a genetic algorithm, because it's different for each problem. The above steps don't change, but you may need to introduce several different crossover and mutation operations to prevent premature convergence.
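
As a rough illustration of mixing operators, the sketch below picks at random between two common mutation operators for ordered chromosomes (swap and inversion). The function names and the uniform choice between operators are assumptions made for this example; which operators actually help depends on your problem.

```python
import random

def swap_mutation(chrom, rng):
    """Exchange two randomly chosen genes."""
    a, b = rng.sample(range(len(chrom)), 2)
    out = chrom[:]
    out[a], out[b] = out[b], out[a]
    return out

def inversion_mutation(chrom, rng):
    """Reverse a randomly chosen slice of the chromosome."""
    a, b = sorted(rng.sample(range(len(chrom) + 1), 2))
    return chrom[:a] + chrom[a:b][::-1] + chrom[b:]

def mutate(chrom, p_m, rng=None):
    """With probability p_m, apply one randomly chosen mutation operator."""
    rng = rng or random.Random()
    if rng.random() >= p_m:
        return chrom[:]  # unchanged copy
    op = rng.choice([swap_mutation, inversion_mutation])
    return op(chrom, rng)
```

Both operators keep the chromosome a valid permutation, so they can be combined freely with a permutation-safe crossover.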