On Evolution: Biological and Computational
The genetic algorithm is one method of solving problems in computer science. Although it comes in many flavors, for the purposes of this article I’ll focus on a simple example.
Imagine you have a multiple choice test with 20 questions. Each question can be answered by choosing one of four choices. A compete answer to the test consists of a string of 20 characters, each of which is chosen from the set {A, B, C, D}. The solution to the test is an answer string where every answer is correct.
A valid, but not necessarily correct, answer string would look like this:
ABDCBDABDBCABDCCBABD
How many possible answer strings are there? Well, each position in the string can take on 4 different values, and there are 20 of them, so there are 4^20 = 1,099,511,627,776 possible ways to answer the test. That’s “just” over 1 trillion (over by 99 billion, that is). To give you a sense of perspective, that’s about as many seconds as there are in 35,000 years. Yeah, that’s a lot.
Imagine you have access to an oracle to whom you can submit a solution string. He will look at your string, and tell you how many answers you got right, but he wont tell you which ones are right, just the raw score. Based on this information, how many answer strings do you think you would have to look at to find the right answer?
If you are just pulling random strings out of thin air, its entirely possible to choose the right one instantly. But that’s not likely to happen. When choosing a string at random, each string has a 1 in 1,099,511,627,776 chance of being chosen. If there is only one correct solution, you have a 1 in 1,099,511,627,776 chance of choosing the correct string on the first go, a 1 in 1,099,511,627,775 chance on the second (since you’ve already eliminated one string), and so on. You would have to examine billions of strings before your chance of choosing the correct string at random starts to even approach the chance that you will win the lottery. You would have to examine 99% of of the strings before the chance that a string picked at random from the remaining set is the correct one approach “normal” odds of 1 in a million.
“What does this have to do with computation?” you may ask. If, instead of choosing strings at random, we used a genetic algorithm, we can drastically reduce the number of strings that have to be checked to find the correct solution.
Feel free to skip this section if you don’t want the gory details.
Here is how I created a genetic algorithm to find the solution to a given test. I created an initial random population of 20,000 random strings. And by “random”, I mean just that: each string was chosen by randomly picking which letter would go at which spot in the string. Each of these strings was checked against the correct solution string for the number of answers it got right. This number became the fitness value of the string. The 1,000 most highly fit strings were carried over into the next generation unchanged. The remaining spots in the population were filled via two methods: mutation, and crossover.
For any one mutation, there are two possibilities. A string is selected from the population, on the basis of its fitness, i.e. more fit strings have a higher chance of being chosen than less fit strings, but every string has a chance to be chosen. The first possibility is that a random point in the string is chosen, and the letter at that point is randomly changed to one of the four possibilities. This gives a 1 in 4 chance that the mutation wouldn’t change anything at all. The second possibility is that a range in the string is selected and reversed. This also has the possibility of leaving the string intact, if the randomly selected range is only one character long. Either way, the resulting string was added to the new population.
For crossover, two strings are chosen from the population, and again the chance of being selected is weighted by fitness. A random point in the strings is chosen at which to preform crossover. The two strings are split at these points, and two new strings generated by combining the ends of the parts.
The fitness of the new population of strings is evaluated, and the process repeated until a correct solution was found.
Here are the results of the experiment.
What is the smallest number of generations this method took to find the correct string? When I ran the experiment, the shortest time was 12 generations. That means examining 12,000 strings, out of a possible 1,099,511,627,776, or, just 0.0000000001% (1.09139364 x 10^-9 to be exact) of the total possible strings. Obviously, this is more than slightly better than chance. The most generations it took was 22, a total of 22,000 strings. The average number of generations was 16. In the starting populations, the average fitness was generally 8, with 13 being the highest individual fitness I ever saw.
It is interesting to note that the operations performed on the strings are not guaranteed to produce a more fit string. Very often the change makes the resulting string less fit than its parents. But in general, the average fitness of the whole population goes up, even though most mutations are not beneficial. This is because when a beneficial mutation occurs, it tends to rapidly spread through the population. The new, more fit string has a better chance of mating than the other strings. This string’s offspring tend to inherit its higher fitness, so they have a better chance of reproducing than the offspring of other strings. In short order, all of the strings will be descended from the successful mutated string.
The take home point is that even without any knowledge of why a particular string fares better than another, the simple act of making random changes to fit strings gives rise to more fit strings, and converges on the most fit string in a surprisingly short time.
Now, this is not to say that evolution in the natural world works exactly as I have implemented it here. But it does serve to show that IF you have a means of varying the organisms in an environment and ALSO some way of making some of them survive and others die off that’s related to the traits possessed by individuals, THEN those traits which cause the individuals to survive will spread throughout the population.
Some people argue that evolution cannot give rise to new information, as culling members of an existing population serves to reduce genetic diversity. While it is true that culling members reduces genetic diversity, and by extension (in some sense) information, their claim that this rules out evolution is wrong. It’s the mutation that introduces new information into the gene pool. Natural selection serves to cull out information that is detrimental to survival.
Some people accept “micro” evolution, but not “macro” evolution. I’ve dealt with this in a previous post, so I’ll just point out here that, if two populations of the same species end up in different areas, where different traits are required for survival, then evolution dictates that they diverge into separate species. This is caused by two factors: the selection criteria for survival will probably be different in the different areas, and geographic separation means that any new traits introduced by mutation will not spread to the other population.
To summarize, in order for evolution to occur, the ONLY things needed are 1. variation in a population of individuals and 2. some individuals more fit to their environment than others. If these two properties are present, evolution WILL happen. Not just may, or might, but WILL. If you admit that species change over time, no matter how minor the changes, and that members of a species survive or die depending on individual traits, you must accept that evolution takes place. Furthermore, you must accept that speciation will happen when populations are split into different geographic areas with different environments. To do otherwise should result in your head exploding from the cognitive dissonance of holding two contradictory ideas at the same time.