The Stoltzfus Chronicles
(Part I)

Summer, 1995

Topics [article number]:

Part I:

Split genes: [ar1]
Splicing: [ar2]
Duplicate genes: [ar3]
Sequence alignments: [ar3], [ar7]
Flanking accretions: [ar4]
Congruence: [ar5]
General problems with the new theory: [ar6], [ar15]
Phylogenetic trees: [ar6], [ar10], [ar12]
Micro/macro-evolution: [ar8], [ar9]
Fundamental concepts of evolution: [ar10], [ar12]
Reuse of genes in the pond: [ar13], [ar14], [ar15]

Split genes and ORFs: [ar16]
Splicing: [ar17]
Congruence: [ar22]
Phylogenetic trees: [ar22]
Fundamental concepts of evolution: [ar18], [ar20]
Unique characteristics of organisms: [ar18], [ar19], [ar20], [ar21]
Testing the new theory: [ar18], [ar20], [ar22]

The sequence alignments (data): [ar23]
The sequence alignments (explanation): [ar24]
Gaps in the alignments: [ar24], [ar25]
The phylogenetic tree: [ar26]
Discussion: [ar27], [ar28], [ar29], [ar30], [ar31]
Congruence: [ar30]
Epilogue and Dr. Senapathy's comments: [ar32]

Arlin Stoltzfus (arlin@ac.dal.ca)
Biochemistry Department
Dalhousie University
Halifax, Nova Scotia B3H 4H7
Canada

[ar1]

Arlin Stoltzfus: Why Senapathy seems to think that this particular debate [the probability of finding exons in random DNA] is the key to the origin of life is quite beyond me.

Jeffrey Mattox: It is key because it goes to the probability of finding genes in the primordial pond. Long genes (not the watch company) would be nearly impossible to find, but genes that are broken into pieces (exons/introns) would not only be easy to find, but are inevitable.

Arlin: Note the logical jump from "genes" to "Long genes." The "Independent Birth" theory is based on the same hidden assumption that long genes did not evolve from short genes, or from RNA, but instead arose full-length from random DNA sequences. That is, Senapathy's argument about the form in which genes first arose follows this path:

1. full-length genes arose spontaneously from random sequences;
2. it is impossible for full-length genes to arise spontaneously in an unsplit form;
3. therefore genes must have arisen in split form.

#3 follows from #1 and #2.

What Mattox was never told, and what Senapathy has apparently forgotten, is that proposition #1 in the above argument is an assumption, not an empirically based result. This assumption was not always hidden. Take a look at Senapathy's 1986 paper (PNAS 83, 2133), where he ponders "how protein-coding sequences could have evolved from primordial DNA sequences."

(quoting Senapathy): To answer this, two basic assumptions were made:

before a self-replicating cell could come into existence, DNA molecules were synthesized in the primordial soup by random addition of the 4 nucleotides without the help of templates and
the nucleotide sequences that code for proteins were selected from the pre-existing DNA sequences in the primordial soup by natural selection, not by construction from shorter coding sequences. [Senapathy, 1986: PNAS 83, 2133]

At the time this was written (and also today), two very common ideas are that:

genes first evolved in an RNA form and were converted later to DNA; and genes initially started out small and then grew larger.

Senapathy began his analysis by formally setting aside both of these possibilities. Of course, there is nothing wrong with considering some possibilities and ignoring others! No one has time to write (or read) a paper that thoroughly explores every formal possibility, and it's not necessary, either! Senapathy focused on a particular set of possibilities, and (consistent with good scientific writing) identified his assumptions clearly. Unfortunately, these assumptions are now presented as though they were conclusions, but they are not.

JM: Have you read his book? If not, that is the way to understand his theory.

Arlin: You and Senapathy have said this repeatedly. If someone doesn't understand the theory, you instruct them to purchase and read a copy of the book, rather than provide an answer to their objection. Why not answer the objection, using the understanding you have gained? Perhaps you could help out the few people who have copies of the book by listing the relevant page numbers. For instance, perhaps you or Dr. Senapathy could give me the numbers of the pages describing the evidence that genes arose spontaneously, in full-length form (split or unsplit), rather than starting small and growing large over time (i.e., proposition #1 above)?

[ar2]

Arlin: Senapathy's argument about the form in which genes first arose follows this path:

1. full-length genes arose spontaneously from random sequences;
2. it is impossible for full-length genes to arise spontaneously in an unsplit form;
3. therefore genes must have arisen in split form.

#3 follows from #1 and #2. What Mattox was never told, and what Senapathy has apparently forgotten, is that proposition #1 in the above argument is an assumption, not an empirically based result.

Jeff: Maybe it used to be an assumption in his past writings, but now it's part of his theory. The Senapathy theory is that complete genes composed of...

Arlin: It is still an assumption -- see below.

Jeff: Glad to. That would be all of Chapter 7 ("The Abundant Occurrence of Genes in The Primordial Pond"), and in particular, pages 230 - 250 ("The first genes were split genes and the first cells were eukaryote cells") and pages 250 - 254 for more on split/unsplit genes.

Arlin: No, he does not provide evidence in these pages for the assumption identified above, nor is this evidence in the literature that he cites (mainly his PNAS articles of 1986 and 1988). Instead, the assumption is reiterated: "introns are inevitable if genes occurred purely by chance in long random sequences in the primordial pond" (p. 232). Notice the "if." As I mentioned in my letter, it is also possible that genes started out as small RNA genes, then were converted to DNA and grew larger incrementally. That is, it is possible that genes did NOT arise full-length, spontaneously, from random sequences, but this possibility is ignored.

Again, here is the fallacy laid bare in the section you have identified:

"Since contiguously long prokaryotic genes were absolutely improbable to occur in the primordial sequences, the prokaryotic genes could not have directly come from the primordial genetic sequences; the contiguous genes of prokaryotes could only be derived from eukaryotic split genes losing introns." (p. 232)

Senapathy claims to have shown that spontaneous origin of unsplit genes of modern size (average 300-400 codons) from random sequences is impossible, but spontaneous origin of split genes of modern size from random sequences is possible. He concludes that genes must have arisen in split form, but this conclusion does not follow, because it is not necessary to assume that genes arose fully formed, full-length, from random sequences. Genes may have started small, then grown larger by duplications, fusions and flanking accretions. Senapathy merely ignores this possibility in his list of possibilities on p. 254, though this idea has existed for many, many years.

I understand Senapathy's splicing problem better than he does, and I can explain it to you so that you will understand it better also. Start with a random DNA sequence. Now we are going to transcribe it, and splice together some products, in an attempt to make long reading frames. Where will the splice sites be? Let's consider a splicing mechanism that recognizes GCCA (I just made it up). Starting with the first GCCA, we'll splice out everything from CA to the next occurrence of GC in GCCA, that is, we'll take ..GC|CA...GC|CA.. and splice out CA...GC, leaving ..GCCA... Then we'll find the next two occurrences. Voila! A spliced transcript has been produced. But what about the reading frames? If you have followed this example, you will realize that the resulting spliced sequence is every bit as random as the starting material, thus it contains lots of stop codons and mainly short reading frames.
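The thought experiment above can be tried numerically. The following is only a minimal sketch: it assumes equal base frequencies, uses the made-up GCCA rule exactly as described, and measures a reading frame as the distance between successive in-frame stop codons. The function names (`random_dna`, `splice`, `mean_reading_frame`) are illustrative and do not come from the original discussion.

```python
import random

STOPS = {"TAA", "TAG", "TGA"}  # DNA spellings of UAA, UAG, UGA


def random_dna(n, rng):
    """Random sequence with 25% each of A, C, G and T (an assumption)."""
    return "".join(rng.choices("ACGT", k=n))


def splice(seq, site="GCCA"):
    """Apply the made-up rule: for each successive pair of sites, cut at
    GC|CA and remove CA...GC, leaving a single rejoined GCCA."""
    half = len(site) // 2
    kept = []
    pos = 0
    while True:
        first = seq.find(site, pos)
        if first == -1:
            break
        second = seq.find(site, first + len(site))
        if second == -1:
            break
        kept.append(seq[pos:first + half])  # keep up to and including "GC"
        pos = second + half                 # resume at the "CA" of the next site
    kept.append(seq[pos:])
    return "".join(kept)


def mean_reading_frame(seq, frame=0):
    """Mean distance, in codons, between successive in-frame stop codons."""
    stops = [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOPS]
    gaps = [(b - a) // 3 for a, b in zip(stops, stops[1:])]
    return sum(gaps) / len(gaps) if gaps else float("nan")


if __name__ == "__main__":
    rng = random.Random(1995)
    raw = random_dna(300_000, rng)
    spliced = splice(raw)
    print("unspliced mean RF:", round(mean_reading_frame(raw), 1))
    print("spliced mean RF:  ", round(mean_reading_frame(spliced), 1))
```

Under these assumptions the two means should come out close to each other, which is the point of the example: splicing at an arbitrary signal shortens the sequence but does not lengthen its reading frames.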

How does Senapathy deal with this problem? Does his solution fit the facts?

Jeff: He deals with it head on. See pages 242 through 247 and figure 7.8.

Arlin: Look closely at part B of figure 7.8. The so-called "stop codon" (within the intron, just downstream of the exon) is not a stop codon at all, it is only a triplet with the same sequence as a stop codon. To function as a stop signal, a DNA sequence must be transcribed and then translated in the correct reading frame. Is this stop codon in the right frame to shorten the reading frame that starts in the exon to the left of it in Figure 7.8b? If not, is the presence of this stop codon really predicted by the Senapathy theory?

[ar3]

Jeff: He has not ignored duplications and fusions. See pages 123-144 and...

Arlin: On pp. 123-124 he merely dismisses gene duplication as an explanation for similar genes. Consider his references to duplicated gene families, including globins. As you probably know, we mammals tend to have a lot of globin genes, encoding slightly different globins. In our blood, there are combinations of alpha-globin and beta-globin that make hemoglobins. Some of the alpha or beta type genes are only expressed very early in development (fetal and embryonic globins). In our muscles, there are myoglobins.

Senapathy claims bluntly that invertebrates do not have homologs corresponding to vertebrate "duplicated gene" families. Yet, some of the first globins characterized were not vertebrate blood globins, but invertebrate lymph globins, such as that from the midge Chironomus thummi. Senapathy is apparently totally unaware that globins have been found in midges and many other invertebrate animals, such as clams and worms, as well as non-animals such as the green alga Chlamydomonas reinhardtii and the yeast Saccharomyces cerevisiae, and non-eukaryotes such as a bacterium whose name I can't recall. The globins are not the only false example cited by Senapathy -- the immunoglobulin protein fold, used in our antibodies, is found in proteins in invertebrate animals and in non-animals. Someone knowledgeable about blood proteins has already explained this in a message to sci.bio.evolution under the heading "Senapathy theory: facts wrong" or something like that.

I can fax you a sequence alignment of diverse globins, including non-animal ones.

Better yet, Jeffrey, let's just test this and settle the matter for good. Senapathy would respond to my alignment of globins by simply denying that sequence similarity is any evidence of common ancestry, and that it could instead be explained by similar "constraints." By "constraints" he must mean the selective process by which functional genomes are distinguished from non-functional genomes assembled from the primordial pond. (?) He seems to allow that selection operates only at this early stage, never later. Go figure.

Anyway, let's give Senapathy the benefit of the doubt, and proceed on the assumption that the "constraints" explanation is possible. The assumption that it is possible does not mean it is correct -- this is a fundamental logical mistake that Senapathy makes over and over and over and over and over again (e.g., it is possible that genes did not arise from small genes, but rather arose full-length from random sequences, therefore genes arose full length from random sequences). Scientists don't just assume things, they test them, and this is easily tested. We can take some alpha- and beta-hemoglobin sequences, as well as myoglobin sequences, and make an evolutionary tree based on the similarity. Or we can use tubulin sequences, or actin sequences, or whatever. Any multi-gene family for which we have lots of data.

The gene duplication hypothesis allows for gene duplications to happen at different times in the last 4 billion years, in ancestors of specific groups of organisms. The following pattern is one type of expected pattern: if humans, monkeys, rats and other mammals have alpha and beta hemoglobins, then it is inferred that both isoforms existed in a common ancestor of mammals, and the tree will show a pre-mammalian division between alpha and beta hemoglobins, with all of the alphas grouped together, and all of the betas grouped together. OK, this is a prediction of the gene duplication view, but it would not contradict Senapathy's idea of functional constraints. Maybe they just group together because they are similar due to "constraints" for similar functions.

However, here is the decisive case: if we find that some isoform of globin (tubulin, whatever) is unique to one specific group of organisms. For example, if beavers have a special tail hemoglobin. This will have special constraints, so under the "constraints" model of Senapathy, it would be in a special place on the tree, separate from the other types of globins. The gene duplication hypothesis makes a very different prediction: the beaver-tail hemoglobin, unique to beavers, arose from a pre-existing beaver hemoglobin gene by duplication, therefore in an evolutionary tree it will be allied with some other beaver globin gene, within the alpha or beta or myoglobin part of the tree.

Does it make sense? Sound fair?

[ar4]

Jeff: I'm not sure what flanking accretions are....

Arlin: A gene may grow longer by sequence changes at either end, e.g., delete or otherwise alter a stop codon so that the coding region is extended at its 3' end. This is an accretion of flanking sequences. For instance, if we have random sequence with 25% each nucleotide and 3 stop codons out of 64 possible codons, then the reading frame will grow an average of 64/3 = 21 codons (std dev. 21 codons) if the stop codon is altered by a nucleotide substitution.

You probably understand it already, but the math is briefly like this. If we alter the stop codon into a non-stop codon, then the reading frame will be extended L codons, where L is the number of non-stop codons downstream. To arrive at the distribution of L, consider that the chance that the next codon (downstream of the stop codon that we just eliminated) will be a non-stop codon is 61/64, because there are 61 non-stop codons possible. The same is true for the next codon, and the next, etc. So, the chance that our little gene will be extended by a string of at least L successive non-stop codons is (61/64)^L. The arithmetic mean length of the string of non-stop codons will be about L = 64/3 = 21. This mean value should be obvious -- if we consider a string of 10000 codons, we expect 3/64 x 10000 = 469 stop codons. Each stop codon defines the end of an RF (Senapathy counts an RF as the distance between two stops), so there are 469 RFs. With 10000 codons and 469 RFs, the average RF length must be 10000/469 = 21. The standard deviation is approximately equal to the mean. That is, it's just another exponential distribution.

The exact value for the frequency of stop codons depends on one's assumptions. For the sake of simplicity, most people (including Senapathy) just assume equal nucleotide frequencies (25% each A, T, G and C) and 3 stop codons (UAA, UAG, UGA) out of 64, leading to a stop codon frequency of 3/64 per codon.
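The mean and standard deviation quoted above can be checked with a short simulation. This is a sketch under the stated assumptions only (equal base frequencies, 3 stop codons out of 64). Following the text's convention that an RF is the distance between two stops, the length counted here runs up to and including the next stop codon, which gives a mean of 64/3. Counting only the sense codons would give 61/3, about 20.3. The names `rf_length`, `theory` and `simulate` are illustrative.

```python
import math
import random
import statistics

P_STOP = 3 / 64  # assumed: 25% each base, 3 stop codons out of 64


def rf_length(rng):
    """Codons read up to and including the next stop codon."""
    n = 1
    while rng.random() >= P_STOP:  # this codon was not a stop; keep reading
        n += 1
    return n


def theory():
    """Mean and standard deviation of a geometric distribution with p = P_STOP."""
    mean = 1 / P_STOP
    sd = math.sqrt(1 - P_STOP) / P_STOP
    return mean, sd


def simulate(trials=100_000, seed=0):
    """Empirical mean and standard deviation over many simulated extensions."""
    rng = random.Random(seed)
    draws = [rf_length(rng) for _ in range(trials)]
    return statistics.mean(draws), statistics.pstdev(draws)


if __name__ == "__main__":
    print("theory:    mean %.2f, sd %.2f" % theory())
    print("simulated: mean %.2f, sd %.2f" % simulate())
```

The distribution is geometric, the discrete counterpart of the exponential, so the standard deviation is slightly smaller than the mean rather than exactly equal to it.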

What does Senapathy think about accretion as a means of growing long genes from short ones? You can see the answer on p. 234, in the paragraph that starts "1. We can take . . .." According to Senapathy, eliminating a stop codon "obviously lengthens the RF only very slightly, say from 200 to 210 codons, because only too soon we arrive at another stop codon. Let us try to eliminate that also, but again we face the same problem. So, even to arrive at 400 codons from a 200 codon RF by this method, we have to specifically eliminate approximately 50 consecutive stop codons".

So, we need 50 accretions, 10 codons each, to change a 200-codon RF to a 400-codon RF? Huh? Not only does 50 accretions conflict with 10 codons, but both numbers are bogus. 50 accretions to extend a gene from 200 to 400 codons would mean that there must have been 50 stop codons to remove in only a 200-codon stretch! And what's this nonsense about 10 codons between successive stop codons? When Senapathy is arguing about how easy it is for a string of exons to arise, does he assume that the average distance between stop codons (this would be the average resulting exon length) is only 10 codons? NO! Yet this is the number he uses when he wants to claim that it is hard for genes to arise without introns and exons.
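The arithmetic behind this objection can be sketched as follows, again assuming a stop codon frequency of 3/64 per codon. The code asks how many stop codons would have to be removed, one after another, to extend a 200-codon RF to at least 400 codons. The function names are illustrative, and the simulation is a rough check rather than a model of any real sequence.

```python
import random

P_STOP = 3 / 64  # assumed, as in the text: 3 stop codons out of 64


def codons_gained(rng):
    """Codons added by removing one stop: the former stop itself plus
    every sense codon read before the next stop codon."""
    gained = 1
    while rng.random() >= P_STOP:
        gained += 1
    return gained


def accretions_needed(start, target, rng):
    """Number of successive stop removals needed to reach the target length."""
    length, steps = start, 0
    while length < target:
        length += codons_gained(rng)
        steps += 1
    return steps


def mean_accretions(start=200, target=400, trials=10_000, seed=0):
    """Average number of removals over many simulated random sequences."""
    rng = random.Random(seed)
    total = sum(accretions_needed(start, target, rng) for _ in range(trials))
    return total / trials


# Spacing between stops implied by "50 stop codons" in a 200-codon stretch.
IMPLIED_SPACING = (400 - 200) / 50  # 4 codons, versus 64/3 expected

if __name__ == "__main__":
    print("implied spacing between stops:", IMPLIED_SPACING)
    print("simulated mean number of accretions:", mean_accretions())
```

Under these assumptions the simulated count should land near ten removals, far fewer than fifty.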

Another way to grow long genes from short ones is to make internal duplications. For example, we can start with:

    ABCDEFGHIJKLMNOPQRSTUVWXYZ

and then get

    ABCDEFGHIJKLMNOPQRSTUFGHIJKLMNOPQRSTUVWXYZ

and then get

    ABCDEFGHIJKLMNOPQRSTUFGHIJKLMNOPQRSTUFGHIJKLMNOPQRSTU...
    ...FGHIJKLMNOPQRSTUVWXYZFGHIJKLMNOPQRSTU

by further duplications. Genes can conceivably grow rapidly this way (i.e., they can grow geometrically), and, in fact, many genes and proteins have internally repetitive sequences and/or structures. Under the Senapathy theory, the internal repetitiveness is not a sign of the history of the gene, but rather indicates functional "constraints" instituted by selection from the primordial pond (i.e., it needs to be repetitive in this manner in order to carry out its function). I have already explained the manner in which we can test constraints vs. duplication as an explanation for similarities of sequences.
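The duplication steps shown above can be written as a small helper. This is an illustrative sketch only: `tandem_duplicate` and `doubling_lengths` are hypothetical names, and letters stand in for codons as in the example.

```python
def tandem_duplicate(seq, start, end):
    """Insert a second copy of seq[start:end] immediately after the first."""
    return seq[:end] + seq[start:end] + seq[end:]


def doubling_lengths(seq, rounds):
    """Lengths obtained by duplicating the whole sequence at every round."""
    lengths = [len(seq)]
    for _ in range(rounds):
        seq = tandem_duplicate(seq, 0, len(seq))
        lengths.append(len(seq))
    return lengths


if __name__ == "__main__":
    gene = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    print(tandem_duplicate(gene, 5, 21))  # duplicates the F..U segment
    print(doubling_lengths(gene, 4))      # geometric growth in length
```

Duplicating the whole sequence at every round is the extreme case. It is used here only to show why growth by duplication can be geometric rather than additive.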

[ar5]

Jeff: You not say that it is even possible for Senapathy to be right, that all of what we see today could come about by random means given the correct conditions (a lot of DNA and time, proper energy sources, cooperative environments)? If you are certain he is incorrect, you ought to be able to offer convincing proof, other than arguing that the facts fit the theory of evolution.

Arlin: Yes, I can say this, and I can prove it. First let me clarify something about the general structure of the theory. I find many errors and biases in Senapathy's arguments about gene structure. However, even if these arguments were correct, and if he had demonstrated that genes arose in split form from random sequences, this would not be evidence for the independent birth theory. The first genome could have arisen spontaneously, then standard Darwinian evolution by descent with modification could have come afterward. There is no necessary connection between the two.

The reason I want to clarify the above distinction is that the "independent birth" part of Senapathy's view is easily disproved without a lot of complicated arguments. For a moment, let's ignore the arguments about gene structure, which are a bit more messy. Senapathy's Independent Birth theory is formally a theory of "spontaneous generation." That is, given a set of favorable initial conditions, organisms arise spontaneously and independently. Though they may later experience minor evolutionary changes, they do not arise by evolution: they do not arise by an incremental process of descent with modification, from one or a few common ancestors.

Roughly a century ago, theories of spontaneous generation were disproved by the observation of "congruence." What congruence means is that patterns of similarity of (for example) the skulls of horses, dogs, bears and goats tend to echo the patterns of similarity of (for example) the physiology of horses, dogs, bears and goats. Congruence is also seen quite easily in molecular data. If you make diagrams indicating the similarities of sequences for tubulin, actin, myoglobin, etc. for horses, dogs, bears and goats, the diagrams will tend to be identical:

    0  1  2  3  4  5  6
    |--|--|--|--|--|--| <--- scale of increasing difference
                             (left to right)
     dog ----.               (or decreasing similarity)
             |
             |--.
     bear ---'  |
                |
     goat ---.  |
             |  |
             |--'
             |
     horse --'

for each different type of gene. This pattern of congruence is anticipated by evolution (as opposed to spontaneous generation) because of common ancestry. That is, the similarity diagram really hypothesizes an evolutionary tree. Horse and goat had a common ancestor, thus all horse and goat genes -- tubulin, actin, myoglobin, etc. -- are specifically related through their common ancestor. This pattern of congruence contradicts the expectations of a spontaneous origin theory, by which trees from different genes should have no special relationship.
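A toy simulation can make the congruence expectation concrete. This sketch rests on assumptions that are not in the original discussion: sequences change only by random base substitutions, the "common ancestry" case follows the tree drawn above, and the "independent origin" case is modelled crudely as unrelated random sequences. All function names are illustrative.

```python
import random

# The three ways of splitting four species into two pairs.
PAIRINGS = [
    (("dog", "bear"), ("goat", "horse")),
    (("dog", "goat"), ("bear", "horse")),
    (("dog", "horse"), ("bear", "goat")),
]


def random_seq(n, rng):
    return "".join(rng.choices("ACGT", k=n))


def mutate(seq, n_changes, rng):
    """Copy of seq with n_changes random single-base substitutions."""
    bases = list(seq)
    for _ in range(n_changes):
        i = rng.randrange(len(bases))
        bases[i] = rng.choice([b for b in "ACGT" if b != bases[i]])
    return "".join(bases)


def hamming(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def related_gene(rng, length=300, changes=20):
    """One gene evolved down the tree ((dog, bear), (goat, horse))."""
    root = random_seq(length, rng)
    left, right = mutate(root, changes, rng), mutate(root, changes, rng)
    return {
        "dog": mutate(left, changes, rng), "bear": mutate(left, changes, rng),
        "goat": mutate(right, changes, rng), "horse": mutate(right, changes, rng),
    }


def independent_gene(rng, length=300):
    """One gene drawn separately for each species, with no shared history."""
    return {name: random_seq(length, rng) for name in ("dog", "bear", "goat", "horse")}


def best_pairing(seqs):
    """The split into two pairs with the fewest within-pair differences."""
    return min(PAIRINGS, key=lambda p: sum(hamming(seqs[a], seqs[b]) for a, b in p))


if __name__ == "__main__":
    rng = random.Random(0)
    for label, make in (("common ancestry", related_gene),
                        ("independent origin", independent_gene)):
        print(label, [best_pairing(make(rng)) for _ in range(5)])
```

With common ancestry, every simulated gene should return the same grouping. With unrelated sequences, the grouping has no reason to repeat from gene to gene.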

I agree totally with you that people seem to be more interested in jawing about what is more "likely" than actually testing models. However, as an evolutionary biologist, I am disappointed that Senapathy's theory gets so much attention. Formally, it is as poorly supported as a creationist scheme for the origin of life. Creationism and Senapathy's independent birth theory have much in common, including the fact that both of them are refuted by congruence.

So, let's test the "independent" model against the non-independent (common ancestor) model. I'll get together some sequence alignments.

[ar6]

Arlin: I haven't forgotten that I promised to send you some sequence alignments illustrating points about evolution of genes and proteins.

Before doing so, I'd like to make sure that we both understand the point that I am trying to resolve. This point is based on comparing "phylogenetic trees," so I want to make sure that we understand the meaning of trees and the methods used to infer them. First, the methods.

It's possible to make useful phylogenetic trees for lots of things. In fact, some of the methods used by molecular evolutionists were originally developed in other fields, or are equally useful in other fields. For instance, some methods come from the study of ancient manuscripts. Before the printing press was in common use, scribes were employed to copy important codices, such as the Christian Bible, the Torah, etc. Sometimes these scribes made mistakes or (gasp!) editorial changes. These errors and changes were propagated by subsequent scribes (one suspects that these people must have worked mindlessly most of the time). As you can imagine, it will be possible to determine the history of a manuscript by comparing its pattern of errors to other manuscripts. This, along with other clues, may allow one to infer aspects of the history of a manuscript, such as:

which other manuscript(s) it was copied from; where it was copied; when it was copied.

The first step is to align different copies of a manuscript word-for-word or sentence-for-sentence. Sometimes letters, words, sentences, or entire paragraphs may be missing, due to mistakes by scribes or to damage to a manuscript (e.g., the corner of a page gets lost from one manuscript, and all subsequent manuscripts copied from it are missing some words). This alignment will make it possible to tally up the differences between different manuscripts. Then, a map of the "distances" (measured in number of differences) between manuscripts can be constructed. This map will tend to look like a tree, or (under certain conditions) a bit like a net.
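The align-and-tally step can be sketched in a few lines. The four "manuscripts" below are invented for illustration and are already aligned word-for-word. A missing word would be marked with a placeholder such as "-" and counted as a difference.

```python
from itertools import combinations

# Hypothetical copies, aligned word-for-word.
copies = {
    "A": "the quick brown fox jumps over the lazy dog".split(),
    "B": "the quick brown fox jumped over the lazy dog".split(),
    "C": "the quick brown fox jumped over the lazy hog".split(),
    "D": "the quick brown fox jumps over a lazy dog".split(),
}


def differences(a, b):
    """Number of aligned positions at which two copies disagree."""
    if len(a) != len(b):
        raise ValueError("copies must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_map(aligned):
    """Pairwise difference counts for every pair of copies."""
    return {
        (p, q): differences(aligned[p], aligned[q])
        for p, q in combinations(sorted(aligned), 2)
    }


if __name__ == "__main__":
    for pair, d in distance_map(copies).items():
        print(pair, d)
```

In this toy case C shares B's change and adds one of its own, while D differs from A by a single unrelated change. Those are the patterns a tree-building method would use to place C next to B and D next to A.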

The same methods can be applied to any set of evolving things, though linear arrays (like letters in a manuscript, amino-acids in a protein, or nucleotide bases in a gene) are easiest to deal with conceptually. For instance, you might be studying the evolution of the Unix operating system, looking at AT&T and BSD unices of different flavors offered by different vendors, plus Linux, FreeBSD, etc. I suppose in some cases you might be able to study the actual C code, so you could do an alignment on that. In the evolution of Unix operating systems, there are some phenomena that are probably unusual in the biological world, including extensive mixing of parts, complete re-writes of pre-existing capabilities, etc.

And these methods of aligning things, tallying up differences, and making phylogenetic trees have proved useful in real life, in laboratory evolution, and in simulated evolution on a computer. I'll give you some examples when I write again.

[ar7]

Jeff: OK, I'm eagerly waiting. As you go, please explain why the computed genetic distances cannot also be explained by Senapathy. That is, if you are going to show some numbers supporting macroevolution, then can you also show why those numbers do not support Senapathy's theory? And, please remember that micro-evolution is not in dispute, so I'm trying to learn about things having to do with macro-evolution.

Arlin: I don't know what you mean by "macro-evolution" and "micro-evolution." These words have no set meaning, but seem to mean different things to different people in different contexts. To some people, "micro-evolution" means "allele frequency changes in a population" while "macro-evolution" often means "speciation." Please explain this when you write back.

When dealing with alignments and phylogenies, it is possible to apply identical methods to phenomena happening on vastly different time scales. For instance, it is possible to infect a cell culture with a virus, split the culture into A and B and continue growth, split A into A' and A'', grow, etc. At the end of many weeks of growing and splitting, sequences from part (or all) of the virus can be sequenced and analyzed by sequence alignment and phylogenetic inference. The inferred phylogeny will tend to reflect the actual splitting of cultures into A', A'', etc.

This is over a time scale of a laboratory experiment -- weeks or months. The same methods can be (and have been) applied to the evolution of HIV over the past 50 years. The same methods have been applied to other less closely related viruses (e.g., herpes virus and HIV are distantly related) over the past few million years. The same methods have been applied to evolution of genes and proteins over hundreds of millions or billions of years.
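The culture-splitting experiment described above has a known history, so it can be used to check a tree-building method. The sketch below simulates such a history and then reconstructs it with UPGMA (average-linkage clustering), one simple distance method chosen here for illustration. The mutation model, the numbers of changes, and the extra split of B into B' and B'' are all assumptions of the sketch.

```python
import random
from itertools import combinations


def mutate(seq, n_changes, rng):
    """Copy of seq with n_changes random single-base substitutions."""
    bases = list(seq)
    for _ in range(n_changes):
        i = rng.randrange(len(bases))
        bases[i] = rng.choice([b for b in "ACGT" if b != bases[i]])
    return "".join(bases)


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def split_experiment(rng, length=500, changes=15):
    """Start culture -> A and B; A -> A', A''; B -> B', B''."""
    start = "".join(rng.choices("ACGT", k=length))
    a, b = mutate(start, changes, rng), mutate(start, changes, rng)
    return {
        "A'": mutate(a, changes, rng), "A''": mutate(a, changes, rng),
        "B'": mutate(b, changes, rng), "B''": mutate(b, changes, rng),
    }


def upgma(seqs):
    """Repeatedly merge the two clusters with the smallest average
    pairwise distance; return the tree as nested tuples."""
    clusters = [(name, [name]) for name in seqs]  # (subtree, member names)

    def avg_dist(c1, c2):
        pairs = [(x, y) for x in c1[1] for y in c2[1]]
        return sum(hamming(seqs[x], seqs[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


def clades(tree, found=None):
    """Collect every grouping in the tree as a frozenset of leaf names."""
    if found is None:
        found = set()
    if isinstance(tree, str):
        return frozenset([tree]), found
    left, _ = clades(tree[0], found)
    right, _ = clades(tree[1], found)
    found.add(left | right)
    return left | right, found


if __name__ == "__main__":
    print(upgma(split_experiment(random.Random(0))))
```

If the method works, the inferred tree should group A' with A'' and B' with B'', mirroring the order in which the cultures were actually split.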

[ar8]

Jeff: By "micro-e" I mean those effects that can be observed through artificial selection. "Macro-e" changes occur too slowly to be observed. Micro-e would correspond to Darwin's Special Theory, while macro-e would be natural selection, or the extension of the special theory to become the General Theory.

Arlin: Now I need to know what is meant by "Darwin's general theory" and "Darwin's special theory."

Jeff: This may be circular. His special theory came directly from his observations of selection and adaptation that he saw during his travels. The general theory is more broad -- it is the extension of his theory to those things he could not observe, that is, natural selection, the beginning of new species, etc.

I think...

microevolution = effects of artificial selection; observable effects
macroevolution = slow changes; natural selection; not-observed

...should be sufficient. Just forget the special/general labels since we don't need to talk about Darwin. Let's concentrate on the origins of new species having new body parts through natural selection.

Dr. Senapathy does not dispute microevolution. I don't dispute micro-e. Your mission, should you decide to accept it, is to convince me that macroevolution exists, there was only one or a few common ancestors, and that Senapathy is wrong.

[ar9]

Arlin: In general, I would recommend avoiding all of these terms, since they are not widely recognized, and will to some people mean something different. It's a bit like one person using the word "orange" to describe something that other people are calling "red" or "pink." This type of ambiguity is OK for poeticizing about flowers, but when any serious issue is at stake, it would be better to take the time and effort to speak in terms of wavelengths.

Caveat emptor, let's agree that

microevolution = effects of artificial selection; Darwin's special theory
macroevolution = slow changes; natural selection; Darwin's general theory

[ar10]

Jeff: Dr. Senapathy does not dispute microevolution. I don't dispute micro-e. Your mission, should you decide to accept it, is to convince me that macroevolution exists, there was only one or a few common ancestors, and that Senapathy is wrong.

Arlin: I think we need to back up a bit for this. I think that what you want to know relates to the fundamental questions that were addressed in the 19th and early 20th centuries in an attempt to distinguish Darwin's account of the origin of species from other accounts. The fundamental concepts that are needed here are not "microevolution" and "macroevolution" but:

immutability (constancy of species)
transmutation (one species turning into another species)
descent with modification (features of a species changing over time)
common ancestry (origin of two different species from an ancestral species)
the argument from design (things fit because they were designed to fit)
natural selection
spontaneous generation (some species can arise from non-living matter)

I'm going to spend the next few paragraphs attempting to review, in a very general way, what these issues mean and how they were resolved (to the extent that they have been resolved) in regard to the origin of species. You are free to question the facts that I imply and to doubt my conclusions -- I'm not trying to convince you right now, just to outline a conceptual framework that we can both agree to.

Here are three basic ideas to explain the origin of the many species that inhabit the earth:

I. the garden: special creation. each species was separately designed, and planted on earth by an intelligent being with the requisite resources and capabilities;

II. the ladder: spontaneous generation + transmutation. simple species, such as worms, arose separately under conditions favorable for their origin; complex species arose from simple species;

III. the tree: common ancestry + descent with modification. from one or a few common ancestors, species arose over long periods of time by modification and splitting.

In designing one's own theory of the origin of species, it would of course be possible to mix different parts of each of these (e.g., god planted the first few species, then became lazy and stopped tending the garden, which then evolved into a variety of other things), but for the sake of simplicity I will consider only these three.

Transmutation of species is a deceptive concept that is different from creationism and from Darwin's account of evolution. Few people understand this concept today, since it depends on the archaic "ladder of life." More specifically, the original idea was literally conceived in terms of turning one known species into another known species, i.e., a human evolving from a chimp, a dog evolving from a rat, etc. The "lowest" organisms on the ladder of life could arise spontaneously (snakes and worms out of mud; clams and lobsters from the bottom of the ocean, etc.). "Higher" organisms could arise from "lower" organisms through transmutation.

In creationism, species are immutable. Their basic features do not change, although they may undergo slight modifications. In creationism, unlike the other two theories, there are no hereditary or historical relationships between organisms (that is, no ancestor-descendant or shared-ancestor relationships).

The "rationale" of species and of their adaptive features is different for transmutation, creation and evolution. In creation, species are designed to live harmoniously, as in a well-tended garden, because this pleases the designer, who is obviously a creature of refined taste. In the transmutation theory, God, angels and humans are at the top of the ladder of life, and (of course) all other organisms have an in-built wish to climb the great ladder. This force of progress is what causes organisms to transmutate or evolve. In Darwin's account of evolution, species are mutable, but one known species does not turn into another (instead, an ancestor gives rise to descendants). Adaptation results from natural selection acting on heritable differences in the ability to survive and make use of available resources.

The spontaneous generation/transmutation theory has fallen into disrepute for a variety of good reasons, including:

    1. the demonstration that mice do not spontaneously arise from boxes of rags, that flies do not arise spontaneously from rotting meat, and finally (only in the last century) that bacteria do not arise spontaneously from broth;

    2. the evidence that species and other readily identifiable groups of organisms typically did not arise multiple times from "lower" species, but instead arose only once;

    3. the continued failure of theorists to find a testable mechanism to explain the "entelechy" (the life-intelligence or life-force) that drove organisms to transmutate and thus to climb the ladder of life (the longest-lasting and most reasonable suggestion was Lamarck's).

The creationism theory has also fallen into disrepute, mainly because of the overwhelming evidence for historical relationships between organisms. Instead of being created separately, organisms share common features. The distribution of common features is fundamentally hierarchical (like the tree, not like the ladder or the garden). [By the way, I mean the scientific theory of creationism. As rational scientists, we are free to consider the idea that life exists elsewhere in the universe, and that earth might be a garden planted by some other form of life -- a garden full of organisms that were uniquely designed to work together and to take advantage of the earth's special geochemical features. If you know any gardeners, then you know how carefully they can plan things. This is OK science. By contrast, Biblical creationism -- with its 6000-year time scale, its claim that dinosaurs walked side-by-side with humans, that all fossils were created in one great flood, etc. -- is bogus, anti-rational, non-science. End of sermon.]

Are we together, so far?

[ar11]

Arlin: (reprise) I think we need to back up a bit for this. I think that what you want to know relates to the fundamental questions that were addressed in the 19th and early 20th centuries in an attempt to distinguish Darwin's account of the origin of species from other accounts.

Jeff: Maybe, but I think things are getting a bit too complicated. What I'm looking for is the evidence that supports the tree and which simultaneously refutes Dr. Senapathy's theory.

Arlin: Yes, this is the point that I was hoping to reach. The evidence that supports the tree simultaneously refutes creationism and Senapathy's account of species origins, because both schemes rely literally on independent birth of species, while the features of organisms are radically non-independent, exhibiting a distribution of similarities that is hierarchical or tree-like.

Jeff: Not too fast. Although a clear tree distribution is required to prove descent-with-modification, I don't agree that evidence of a hierarchy will automatically refute Senapathy. Since his theory allows reuse of genes, similarities are predicted. If the tree is near perfect, that might refute Senapathy, but I'm not sure because the ratio of new to reused genes in the Senapathy theory is not a specified value.

[ar12]

Arlin: I guess we still have a little more discussing to do. I want us to agree to commit to a logic of interpretation before we start seeing results. Nothing in biology is "perfect," and few things are "near perfect." The best that I can do is to seek statistically significant levels of what is called "phylogenetic structure" and (also) statistically significant levels of congruence. I claim that such results, if they are found, would refute independent birth (whether in the form of creationism or Senapatheism).

"Phylogenetic structure" means that the relationships are fundamentally tree-like, rather than only superficially tree-like because of noise. The fundamental structure of a tree for random sequences is called a "star phylogeny" -- a bunch of rays emanating from a single point. However, a random set of sequences can still have a tree-like phylogeny, even if the tree is not significantly different from a star-like phylogeny.

For instance, the following five sequences are random in the sense that I just opened a computer file and picked them from separate parts of the Haloferax volcanii genome. They therefore have no expected relationship to each other (except that they have roughly the same nucleotide composition):

    1. TTCGCTGGTCG
    2. GTCGGGACGGT
    3. CAGTCGATGGG
    4. GACCGACGAGA
    5. GGTCCGTCTGG

Yet we can still make a tree. When I search for the best parsimony tree for these data using a Mac program called PAUP (Phylogenetic Analysis Using Parsimony), the result is that, out of 15 possible trees, the 8 "best" trees have 24 steps, while the 7 "worst" trees have 25 steps. The "best" trees are not significantly different from the "worst" trees (and every possible tree falls into one of these two classes)! This is the kind of data that we can get from a random draw; the kind of data that we can get from Senapathy's model.
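The tally of tree lengths can be checked with a short script. The Python sketch below is not PAUP and makes no claim about how PAUP works internally. It assumes equal-weight (Fitch) parsimony, treats the five sequences as already aligned, and uses illustrative names (SEQS, five_taxon_trees, fitch_site, parsimony_length) that do not come from the original discussion. It enumerates every unrooted tree for five sequences and counts the minimum number of changes each tree requires.

```python
from collections import Counter

# The five sequences quoted in the text, keyed by their list numbers.
SEQS = {
    "1": "TTCGCTGGTCG",
    "2": "GTCGGGACGGT",
    "3": "CAGTCGATGGG",
    "4": "GACCGACGAGA",
    "5": "GGTCCGTCTGG",
}


def five_taxon_trees(taxa):
    """Yield each unrooted binary tree on five taxa as a nested tuple.

    Every such tree has one "middle" taxon flanked by two pairs, so
    there are 5 choices of middle taxon times 3 pairings = 15 trees.
    The tuple is rooted arbitrarily; parsimony length does not depend
    on where the root is placed.
    """
    taxa = list(taxa)
    for middle in taxa:
        rest = [t for t in taxa if t != middle]
        first = rest[0]
        for partner in rest[1:]:
            others = tuple(t for t in rest[1:] if t != partner)
            yield (((first, partner), middle), others)


def fitch_site(tree, states):
    """Fitch's algorithm for one column: return (state set, changes)."""
    if isinstance(tree, str):  # a leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_site(left, states)
    right_set, right_changes = fitch_site(right, states)
    shared = left_set & right_set
    if shared:  # children agree on at least one state: no extra change
        return shared, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_length(tree, seqs):
    """Sum the minimum number of changes over all alignment columns."""
    n_sites = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in seqs.items()})[1]
        for i in range(n_sites)
    )


trees = list(five_taxon_trees(SEQS))
lengths = [parsimony_length(tree, SEQS) for tree in trees]
print(len(trees), "trees")
print(sorted(Counter(lengths).items()))
```

Under these assumptions the script finds 15 trees, 8 of them with 24 steps and 7 with 25 steps, which agrees with the figures reported from PAUP. A one-step spread across all possible trees is what a set of unrelated sequences is expected to give.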

How does re-use of genes fit into this?

[ar13]

Jeff: I'm a bit hesitant to agree in advance to too much because you have an advantage in this matter. You already have knowledge of the data you are going to present that supports or refutes these theories. I don't want to box myself into a corner. I think we already have a rough agreement to look at the data and see how it fits the two theories.

I have no doubt that you are fair and honest, but since you have the advantage of knowledge and expertise in this matter, I should at least retain the ability to be skeptical at each step and make you justify things as we go. That's balance. For example, in trials (yes, I'm watching O.J.), the prosecution has the burden of proof, but they also get the last word in closing arguments.

Arlin: I may have an advantage on specialized knowledge of this area, but presumably we are equals with respect to logic. Unless Senapathy's theory is a religion for you, then (at least, in principle) you are prepared to admit that there might be data that contradict it. The logical problem -- actually, it is more of an imaginative problem, since most people have a mental block against imagining possibilities that conflict with their views -- is to envision data that might be found (i.e., from practical experiments) that would contradict major aspects of the theory.

Some contradictions are obvious, but very impractical. For instance, if we could transport ourselves back in time to the period corresponding to the Jurassic, we would either find humans or fail to find them. According to Senapathy's theory, humans and dinosaurs both arose from seed cells created during one period from the primordial pond, therefore we should find humans and dinosaurs living side by side. According to Darwin's view, we would see dinosaurs in the Jurassic, but not humans, monkeys, or apes. Finding humans with the dinosaurs would contradict Darwin's theory. Failing to find them would contradict Senapathy's theory.

However, I'm not going to expend an entire tankful of time-machine fuel to take you back to the Jurassic with your "wait and see" attitude. As prosecuting attorney, I'm not going to take O.J. to trial if I suspect that the judge and jury consider a DNA test showing the victim's blood on the defendant's hands, vehicle, and knife as ambiguous evidence open to a variety of interpretations. I risk wasting a lot of time and money, possibly ruining any chance for a conviction. My first objective is to make an agreement with the jury as to what kind of practically obtainable evidence (i.e., something less than a videotape of the murder) would constitute damning evidence.

(reprise) How does re-use of genes fit into this?

Jeff: Well, let's say the pond spits out two seed cells at the same location within, say, a second of each other. I'd expect that the two would be nearly identical. Let's say both are viable, and that one is an ape and the other is a human (or any two genomes with very similar construction that would appear close or directly connected on any tree). If you compare the genetic structure, they will be nearly identical. How will the differences be able to distinguish descent with modification outside of the pond vs. reuse of a genome within the pond? That seems like the central question to me.

Arlin: Thanks for explaining this, though I can't say I see how the pond produces seed cells. I'm not sure that this is what Senapathy intends, since this is no longer "independent birth." You are suggesting a temporal or spatial heterogeneity of information in the pond such that the genomes of two organisms are assembled non-independently. This heterogeneity arises because genomes are apparently replicated in the pool from previous genomes, thus two genomes existing at time t may have arisen from a common ancestor at time t-1. That is, now you have managed to incorporate processes of common ancestry into Senapathy's theory. Is there also descent with modification in the pond? That is, were the human and gorilla genomes assembled side-by-side in the pond identical at every site, or did they differ by nucleotide substitutions or did they differ by inter-mixing with other sequences from the pond?

[ar14]

Arlin: (reprise) Thanks for explaining this, though I can't say I see how the pond produces seed cells. I'm not sure that this is what Senapathy intends, since this is no longer "independent birth."

Jeff: He treats the "birth" event as the point at which an organism becomes "alive." That would be when it can exist and reproduce on its own (although the organism might continue to live in the pond, as most did initially). When does life start? (If you are a Rush Limbaugh fan, you will recognize that issue as being illustrated by the question: when does a telephone call start? Does it start when you pick up the phone to make a call? Or, when you start or finish dialing? Maybe it's when the other end rings. Or, is it when the other end answers?)

Arlin: (reprise) You are suggesting a temporal or spatial heterogeneity of information in the pond such that the genomes of two organisms are assembled non-independently.

Jeff: Correct. Senapathy asserts that all genomes could contain parts of previously-made genomes. This would be random, except for the fact that viable genomes would be more prevalent (because they can multiply on their own and eventually fall back into the gene pool) and hence parts therefrom get reused more often. Incidentally, the Burgess Shale organisms fit in very well with Senapathy. That was a separate primordial pond with its own unique set of genomes and common body parts. On page 328 Senapathy quotes from Gould's Wonderful Life: "Each [organism] seemed to be built from a grabbag of characters -- as though the Burgess architect owned a sack of all possible arthropod structures, and reached in at random to pick one variation upon each necessary part whenever he wanted to build a new creature."

Oh, and there's Ediacara, too (page 501).

Arlin: (reprise) This heterogeneity arises because genomes are apparently replicated in the pool from previous genomes, thus two genomes existing at time t may have arisen from a common ancestor at time t-1. That is, now you have managed to incorporate processes of common ancestry into Senapathy's theory.

Jeff: That's the way I understand the theory, and it is a common misunderstanding of those people who have criticized the theory by saying: "Senapathy is wrong because there are similarities; because there is a tree." Senapathy's theory explains the same data that supports descent with modification from a common (living) ancestor. He explains all the wild diversity, too.

Arlin: (reprise) Is there also descent with modification in the pond?

Jeff: Well, yes, of a sort. It is descent with modification, but from non-living ancestors.

Arlin: (reprise) That is, were the human and gorilla genomes assembled side-by-side in the pond identical at every site, or did they differ by nucleotide substitutions or did they differ by inter-mixing with other sequences from the pond?

Jeff: They differed by random substitutions and random mixing with the sole requirement that the result was viable (and could reproduce to live more than one generation). Senapathy does not take on the subject of ape/man in his book -- he uses other organisms in most of his discussions. I suspect he intentionally avoided using man as an example in his theory -- the "Darwin was wrong" aspect was controversy enough. However, genomes came out of the pond as Senapathian seed cells after undergoing an in-pond, random assembly of various genes. The organisms we see today (plus those that are extinct) share two very important attributes: they were all viable and they can (or could) all reproduce. This would automatically mean that there must be many similarities right from the start. Keith Robison made a minor stink over this because Senapathy's theory seems to predict anything. No, it seems to predict all of the same data that we now attribute to Darwin's theory. So, I'm thinking: maybe that's because Senapathy is correct. It sounds plausible to me, but I'm asking you to convince me otherwise.

[ar15]

Jeff: I would like to be more definite and say "if you show me so-and-so, then I will conclude that Dr. S. is wrong," but I cannot speak for Senapathy -- I don't have an intimate knowledge of his theory, and I've only read his book once. So, it wouldn't be fair for me to specify such a test. However, if you show me something that raises serious doubts in my mind that I cannot resolve, then I'll go to Dr. S. and say "how about this?"

Arlin: You won't be speaking for Senapathy, but for logic or (if you care to think about it this way) truth. Forget Senapathy. Forget Darwin. Theories are not "theirs" to possess, and they are less likely to be objective than we are.

If the theories proposed by Darwin and Senapathy have compelling logical structures, then presumably we can figure out their implications with our own heads. I have never read Origin of Species cover to cover. I have read parts of it, and I might read the whole thing some day, but the point is that it is not necessary. Nor have I read Senapathy's book cover to cover. I don't have to. To the extent that Darwin's ideas were logical, I can reach them independently. Wallace came up with many of the same ideas. There was a man named Stevens (?), a forester, who came up with the idea of natural selection 20 years before Darwin published it. To the extent that Darwin's ideas were illogical, patched together, products of his social milieu, etc., I cannot infer, nor do I care to.

I can see the inspired logic of Senapathy's theory. A creationist says to an evolutionist "How could an animal, with all of its complexity, its adaptation, its inter-relationships, its beauty, etc., JUST HAPPEN?!? BY CHANCE? This would be like the wind sweeping through a junkyard and spontaneously assembling a 747!" The evolutionist responds "Yes, this would certainly not happen, or would happen so rarely as to be negligible. However, the analogy is not applicable. Animals are not proposed to have evolved in one step, but instead sequentially, through many steps, with a sorting process operating at successive steps along the way." Senapathy would respond to the creationist by saying "Yes! Indeed, it would take a fierce wind and an incredibly huge junkyard to allow for the spontaneous assembly of a 747, thus it must have taken a huge pool of primordial DNA to allow the spontaneous and independent assembly of organisms." The evolutionist solves the apparent paradox by imagining a huge span of time and a set of processes that operate continually throughout that time, allowing an incremental process of change to occur, building up organisms that would be incredibly unlikely to arise spontaneously in one step. The spontaneous-generationist imagines a pool of ur-muck so huge that even complex creatures become likely to rise out of the ur-muck spontaneously.

Senapathy's initial inspiration was the notion that anything was possible given a big enough pool of DNA sequences. This inspiration by itself is easy to refute by, for instance, the non-randomness of gene structures (as I described in Science). It's funny that the power of this inspiration carries Senapathy onward, even as he is emasculating his own theory to satisfy critics. You are doing the same thing. In your hands, this original inspiration has become much harder to refute, because it now includes mechanisms analogous to time-tested evolutionary concepts such as natural selection, common ancestry, descent with modification. For instance:

    1. Senapathy's original inspiration was that genomes were truly random, not shaped by natural selection. What a revolution! But now you see it is necessary to have some process of selection by which only good, functional genomes are "expressed" in terms of "seed cells." One important difference between Senapatheism and evolution is that in evolutionary theory there are population genetic models used to describe the effects of differential reproduction in populations. In Senapathy's theory there is no comparable model to describe how selection operates in the primordial pond. Instead, it is simply asserted that this selection operates to produce effects similar to those expected from population genetics, without admitting the value of pop gen.

    2. Senapathy's original inspiration was that organisms arose independently, but now you see that it is necessary to invoke common ancestry, so that some organisms can be considered more closely related to others (e.g., gorilla and human).

    3. Senapathy's original inspiration was that organisms arose all at once, by the same process at essentially the same period of time. But now you fear that this will prove radically inconsistent with two observations: evolutionary trees of organism features (e.g., gene sequences), which show different degrees of relatedness between organisms (not just two degrees of relationship, near-identical and completely unrelated), and the fossil record, which shows different organisms living at different times. In order to satisfy these problems, you are going to have to propose that descent with modification occurs inside the pond, and that the pond continues to spit out new organisms for billions of years.

You are in danger of molding a theory to be completely consistent with most claims of standard evolutionary theory. Where is the revolution? Give me something that I can sink my teeth into. A prediction that is potentially falsifiable.

Jeff: (reprise) They differed by random substitutions and random mixing with the sole requirement that the result was viable (and could reproduce to live more than one generation).

Arlin: This is what I mean by lack of a mechanism. How does selection (choosing what "works") actually happen in the primordial pond? Consider two random sequences in the primordial pond. One of them, gene A1, looks just like the human alpha hemoglobin gene, with two introns and three exons, encoding 144 amino acids. The other sequence, gene A2, looks like random junk. Now, tell me, how is it that gene A1 ended up in the human genome, while gene A2 did not? I say that they have an equal chance of ending up in a seed cell, and that their function cannot be tested. Please note that hemoglobins of all types carry oxygen in the blood, lymph or tissues. The hemoglobin alpha chain gene is not expressed in zygotes, embryos or fetuses. Therefore, in a seed cell, there is no way to distinguish between gene A1 and gene A2 with regard to whether or not "the result was viable." You can't test the function of a hemoglobin gene without lungs, blood, a heart, etc. Seed cells don't need hemoglobin to survive. Seed cells also don't need neurotransmitters, keratin (used in hair and fingernails), genitals, and many other things that adults need.

You said that you weren't prepared to commit to a falsifiable prediction, but you hoped that I would provide evidence that might raise serious doubts in your mind. Many people have pointed out that the detailed basis of Senapathy's claims does not hold water -- e.g., his know-nothing statements to the effect that vertebrate blood proteins are never seen in invertebrates, his claim that exons show an exponential size distribution, his assertion that intron-less genes could not have arisen spontaneously, etc., etc. These did not raise "serious doubts" in your mind, because (apparently) you thought the fundamental concept was sound. I would like to suggest that the fundamental concept can only stand when it is propped up by standard evolutionary theory in an increasingly twisted way. To put this another way, when an intelligent person tries to defend Senapathy's theory, I suspect they will end up making it look a lot like evolution.

If you continue to ponder trees, you will eventually insist that descent with modification must be an ongoing process in the pool, that genomes must evolve in the pool as separate units (rather than being continually pieced together at random) and that the pool continues to spit out organisms, such that humans arose 2 million years ago and are indeed the relatives (through the pool of genomes) of gorillas, which arose 5 million years ago; and these great apes are (through the pool of genomes) even more distant relatives to the lemur, which arose 50 million years ago; and these primates are (through the pool of genomes) even more distant relatives to a fish, which arose 400 million years ago. All of this and more will be necessary to reconcile the pool concept with phylogenetic trees and with the fossil record.

Continued in Part II and Part III.
