
DISCUSSION

The results, which show the successful simulation of binding site evolution, can be used to address both scientific and pedagogical issues. R_sequence approaches and remains around R_frequency (Fig. 2b), supporting the hypothesis that the information content at binding sites will evolve to be close to the information needed to locate those binding sites in the genome, as observed in natural systems [4,6]. That is, one can measure information in genetic systems, the amount observed can be predicted, and the amount measured evolves to the amount predicted. This is useful because when the prediction is not met [4,28,29,6], the anomaly implies the existence of new biological phenomena. Simulations to model such anomalies have not yet been attempted.
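The comparison of R_sequence with R_frequency can be sketched numerically. The snippet below is a minimal illustration, not the ev source code. It assumes R_frequency = log2(G/γ) for γ sites among G possible positions, and computes R_sequence as the sum over aligned positions of 2 - H(l) bits, assuming an equiprobable background and omitting the small-sample correction used in practice. The values G = 256 and γ = 16 are assumptions chosen so that R_frequency comes out at 4 bits per site, matching the 16 sites averaging 4 bits discussed in this section; the four example sites are made up.

```python
import math
from collections import Counter


def r_frequency(genome_positions: int, num_sites: int) -> float:
    """Bits needed to locate num_sites among genome_positions: log2(G / gamma)."""
    return math.log2(genome_positions / num_sites)


def r_sequence(sites: list[str]) -> float:
    """Information at aligned sites: sum over positions l of (2 - H(l)) bits.

    Assumes an equiprobable background (2 bits per position) and
    omits the small-sample correction, so this is only a rough sketch.
    """
    n = len(sites)
    total = 0.0
    for l in range(len(sites[0])):
        counts = Counter(site[l] for site in sites)
        h_l = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - h_l
    return total


print(r_frequency(256, 16))                          # 4.0 bits (assumed G and gamma)
print(r_sequence(["ACGT", "ACGA", "ACGC", "ACGG"]))  # 6.0 bits: 3 conserved positions
```

In this toy alignment the first three positions are fully conserved (2 bits each) and the last is completely variable (0 bits), so R_sequence is 6 bits; in ev the corresponding measurement is made on the evolving sites and compared against R_frequency.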

Variations of the program could be used to investigate how population size, genome length, number of sites, size of recognition regions, mutation rate, selective pressure, overlapping sites and other factors affect the evolution. The program might also be used to understand the sources and effects of skewed genomic composition [4,7,30,31]. However, such skew could be caused by biased mutation rates, by some kind(s) of evolutionary pressure that we don't understand, or by both, so how one implements the skew may well affect or bias the results.
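One way to make a composition skew explicit, assumed here rather than taken from ev, is to measure each position against the genome's own base composition, using the genomic uncertainty H_g in place of the 2 bits of an equiprobable genome. The sketch shows that the same perfectly conserved position carries less information against a GC-rich background than against a uniform one, which illustrates how the choice of background model can shift the measured result. The function names and the GC-rich composition are hypothetical.

```python
import math
from collections import Counter


def uncertainty(probs) -> float:
    """Shannon uncertainty H = -sum p log2 p in bits; zero terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def position_information(column: str, composition: dict[str, float]) -> float:
    """Information at one aligned position, measured here as H_g - H(l).

    H_g is taken from the genome's base composition (an assumed background,
    whose values must sum to 1); H(l) comes from the bases observed at this
    position across all sites.
    """
    h_g = uncertainty(composition.values())
    n = len(column)
    h_l = uncertainty(c / n for c in Counter(column).values())
    return h_g - h_l


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
gc_rich = {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15}  # hypothetical skew

conserved = "G" * 16  # the same base at all 16 sites
print(position_information(conserved, uniform))  # 2.0
print(position_information(conserved, gc_rich))  # about 1.88
```

Other treatments of skew (for example, biasing the mutation operator itself) are equally possible, and each would need to be checked for its own effect on the outcome.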

The ev model quantitatively addresses the question of how life gains information, a valid issue recently raised by creationists [32] (Truman, R. (1999), http://www.trueorigin.org/dawkinfo.htm) but only qualitatively addressed by biologists [33]. The mathematical form of uncertainty and entropy ($H = -\sum p \log_2 p$, $\sum p = 1$) implies that neither can be negative ($H \ge 0$), but a decrease in uncertainty or entropy can correspond to information gain, as measured here by R_sequence and R_frequency. The ev model shows explicitly how this information gain comes about from mutation and selection, without any other external influence, thereby completely answering the creationists.
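The point about signs can be made concrete at a single binding-site position. In this minimal sketch the "before" and "after" base frequencies are hypothetical, chosen only for illustration: each uncertainty is nonnegative, yet the drop from H_before to H_after is a positive gain of information, which is the sense in which R_sequence grows during evolution.

```python
import math


def uncertainty(probs) -> float:
    """Shannon uncertainty H = -sum p log2 p, in bits.

    Zero-probability terms are skipped (p log2 p -> 0 as p -> 0), and every
    remaining term p * log2(p) is <= 0 for 0 < p <= 1, so H is always >= 0.
    """
    probs = list(probs)
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical base frequencies (A, C, G, T) at one site position.
before = [0.25, 0.25, 0.25, 0.25]  # random sequence: maximal uncertainty
after = [0.85, 0.05, 0.05, 0.05]   # after selection: mostly A

h_before = uncertainty(before)  # 2.0 bits
h_after = uncertainty(after)    # about 0.85 bits
gain = h_before - h_after       # information gained, about 1.15 bits

print(f"H_before = {h_before:.2f}, H_after = {h_after:.2f}, gain = {gain:.2f} bits")
```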

The ev model can also be used to succinctly address two other creationist arguments. First, the recognizer gene and its binding sites co-evolve, so they become dependent on each other, and destructive mutations in either immediately lead to elimination of the organism. This situation fits Behe's [34] definition of `irreducible complexity' exactly (``a single system composed of several well-matched, interacting parts that contribute to the basic function, wherein the removal of any one of the parts causes the system to effectively cease functioning'', page 39), yet the molecular evolution of this `Roman arch' is straightforward and rapid, in direct contradiction to his thesis. Second, the probability of finding 16 sites averaging 4 bits each in random sequences is $2^{-4 \times 16} \cong 5 \times 10^{-20}$, yet the sites evolved from random sequences in only $\sim 10^3$ generations, at an average rate of $\sim 1$ bit per 11 generations. Because the mutation rate of HIV is only 10 times lower, it could evolve a 4-bit site in 100 generations, about 9 months [35], and possibly much faster, because the enormous titer ($10^{10}$ new virions/day/person [17]) provides a larger pool for successful changes. Likewise, at this rate, roughly an entire human genome of $\sim 4 \times 10^9$ bits (assuming an average of 1 bit/base, which is clearly an overestimate) could evolve in a billion years, even without the advantages of large environmentally diverse worldwide populations, sexual recombination and interspecies genetic transfer. However, since this rate is unlikely to be maintained for eukaryotes, these factors are undoubtedly important in accounting for human evolution. So, contrary to probabilistic arguments by Spetner [36,32], the ev program also clearly demonstrates that biological information, measured in the strict Shannon sense, can rapidly appear in genetic control systems subjected to replication, mutation and selection [33].
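The figures in the paragraph above can be checked with a few lines of arithmetic. This sketch uses only numbers stated in the text; the derived generations-per-year value is simply what the stated rate of about 1 bit per 11 generations implies for a $\sim 4 \times 10^9$ bit genome over a billion years, not a result from the simulation.

```python
BITS_PER_SITE = 4         # from the text: sites average 4 bits each
NUM_SITES = 16            # from the text: 16 sites
GENERATIONS_PER_BIT = 11  # from the text: ~1 bit per 11 generations
GENOME_BITS = 4e9         # from the text: ~4 x 10^9 bits at 1 bit/base (an overestimate)
YEARS = 1e9               # from the text: a billion years

total_bits = BITS_PER_SITE * NUM_SITES  # 64 bits in all sites together
p_random = 2.0 ** -total_bits           # chance of finding all sites in random sequence

# Generations implied by the average rate of gain.
generations_for_sites = total_bits * GENERATIONS_PER_BIT    # 704, i.e. ~10^3
generations_for_genome = GENOME_BITS * GENERATIONS_PER_BIT  # 4.4 x 10^10
generations_per_year = generations_for_genome / YEARS       # required generation rate

print(f"P(random sites)       = {p_random:.2e}")
print(f"generations for sites = {generations_for_sites}")
print(f"generations for genome = {generations_for_genome:.1e}")
print(f"generations per year  = {generations_per_year:.0f}")
```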


Tom Schneider
2001-11-07