Replies to Senapathy Q & A Number 2


[qa2-1]

From Keith Robison: Senapathy completely ignores Arlin Stoltzfus' argument [about dendrograms allowing us to distinguish common ancestry vs. independent birth]. To summarize the argument:

  1. Sequence families identifiable by computer show variation. Senapathy does not dispute this point: "These normal sequence variations are those that evolutionary molecular biologists now use to 'construct' a misleading phylogenetic tree."

  2. One can build a tree which relates the similarity between sequences (again, this is just math -- not up for dispute).

  3. (The critical point). The two theories being debated here make two different predictions. If Senapathy is right (genes were drawn randomly from a pool), then there should be no consistent relationship between two trees. That is to say, the topology of the tree for rRNA should be useless (no better than guessing) in predicting the topology of the tree for DNA polymerase, and both will be useless in predicting the topology of the tree for methionine tRNA synthetase (and so on).

    On the other hand, if organisms are descended from a common ancestor, then the topology of one tree should be congruent with the topology of every other tree, save for the limits of the technique and low-frequency confounding phenomena (e.g. horizontal transfer).

So, as Stoltzfus noted, a single tree cannot distinguish between the two theories, but multiple trees can. There is, of course, a large number of sequences with which to conduct such an experiment, or you can cheat and use the large number of trees which have been published. Each new independent tree which agrees with the others tips the scales farther towards common ancestry, and the scales are quite tipped in that direction already.
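The congruence test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the taxa and both "gene trees" are made up for the example (they are not real rRNA or polymerase phylogenies), trees are written as nested tuples, and congruence is scored simply by counting shared clades. Random trees stand in for the "drawn from a pool" expectation.

```python
import random

def clades(tree):
    """Return (leaf set, set of clades) for a tree written as nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)  # the group of leaves under this internal node
    return leaves, found

def shared_clades(tree_a, tree_b):
    """Count clades present in both trees, ignoring the all-taxa root clade."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must cover the same taxa")
    return len((clades_a & clades_b) - {leaves_a})

def random_tree(taxa, rng):
    """Build a random rooted binary tree by repeatedly joining two random nodes."""
    nodes = list(taxa)
    while len(nodes) > 1:
        rng.shuffle(nodes)
        nodes.append((nodes.pop(), nodes.pop()))
    return nodes[0]

# Hypothetical taxa and hypothetical gene trees, for illustration only.
TAXA = ["human", "mouse", "chicken", "frog", "fly", "yeast"]
tree_rrna = ((((("human", "mouse"), "chicken"), "frog"), "fly"), "yeast")
tree_pol = (((("human", "mouse"), ("chicken", "frog")), "fly"), "yeast")

rng = random.Random(1)
trials = 2000
mean_random = sum(
    shared_clades(random_tree(TAXA, rng), random_tree(TAXA, rng))
    for _ in range(trials)
) / trials

print("clades shared by the two gene trees:", shared_clades(tree_rrna, tree_pol))
print("mean clades shared by random tree pairs:", round(mean_random, 2))
```

With these invented trees the two gene trees share 3 of a possible 4 non-root clades, while pairs of random trees share far fewer on average. Published analyses use more careful tree-distance measures and statistics, but the logic is the same.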


[qa2-2]

From Keith Robison: Senapathy claims that the taxon "Mammalia" is an artificial grouping of independently derived lines. While he isn't specific about how many independent lines, he explicitly claims that monotremes (egg-layers), marsupials, and eutherian (placental) mammals are independent. If we look at the chromosomes of mammals, we see that there is a great degree of synteny (conserved chromosomal ordering of genes). Furthermore, it is likely that some of this synteny extends beyond the mammals and into other vertebrates.

Why do biologists care about all this? Because it leads to useful hypotheses. For example, Senapathy basically claims that cloning genes by sequence similarity is a fad and not inherently informative. I offer in contrast

      Detecting conserved regulatory elements with the model
      genome of the Japanese puffer fish, Fugu rubripes
      Aparicio et al.  Proc Natl Acad Sci U S A  1684-1688 (1995)

Aparicio et al. cloned a hox-type developmental gene from Fugu, the notorious Japanese delicacy. Comparison of the upstream non-coding regions revealed a few islands of sequence similarity. Transgenes containing these segments were put into mice, and the segments directed specific expression of reporter constructs in mice. In other words, sequence similarity implied functional similarity, which was demonstrated in vivo. The underlying logic behind this is that the entire system (sites, Hox gene, DNA binding protein) in both organisms has a common origin. There is no particular reason to expect these results given Senapathy's hypothesis -- there are thousands of possible recognition sequences for transcription factors, and no particular reason that Mouse Hox and Fugu Hox should use the same ones.

Dr. S: I have discussed at length how homeobox genes can be used as common reagents in the development of distinct and unrelated organisms by the inclusion of such genes in distinct genomes being assembled from a common pool of genes. I have also gone into detail as to how the homeoboxes are short DNA sequences and their probabilities are high, and how sequence similarity leading to functional similarity can arise in totally independent random sequences in the primordial pond. It is very important to note that in the homeobox-containing proteins, only the homeodomains (which are short) are similar, and the other long regions of the proteins (other domains) are very distinct.

Keith: You are avoiding the point of my example. There is an enormous variety of potential DNA sequence which could be bound by the enormous variety of potential homeobox protein domains. In other words, there is no particular reason, under your theory, to expect two homeobox proteins with the same general function to bind functionally equivalent DNA sequences.

Dr. S: Again, as I have said before, my theory does accommodate the molecular synteny among groups of organisms. My theory says that during the independent assembly of genomes from a common pool of genes in the primordial pond, genome mixing can occur, once the genomes started to form. Also, once the genomes were being formed, they could change into slightly changed genomes and could give rise to changed organisms, but with similar genomes. This phenomenon of genomic repatterning and restructuring, and mixing between distinct genomes could lead to distinctly new genomes but with large portions common to the different resulting genomes.

So you are arguing that whole chromosomes are recycled, but somehow those "impossible" levels of mutation occur so that the genes don't look quite identical?

Please note that all these processes could have happened within a small primordial pond, like in a closed vessel. This will lead to organisms that were independently arising from the otherwise independently assembled genomes, but which can have varying degrees of chromosomal synteny.

Exceedingly unlikely. Chromosome-sized DNA fragments are extraordinarily fragile due to the fact that they are basically a very long and thin strand with a chemically-sensitive backbone. DNA is sensitive to UV, acid, and mechanical stresses, as well as nucleases which are essentially omnipresent once any life is around. But that is an objection for another post...


[qa2-3]

From Andrew MacRae: (quoting Dr. Senapathy) "Please note that all these processes could have happened within a small primordial pond, like in a closed vessel. This will lead to organisms that were independently arising from the otherwise independently assembled genomes, but which can have varying degrees of chromosomal synteny."

"This fits well with the fact that all these organisms appear fully formed in the fossil record, and the fact that they have never changed from their original state of appearance in the fossil record until today."

Andrew: You still have not fixed this misrepresentation of the fossil evidence, which I presume mainly refers to the Cambrian explosion. I composed a fairly lengthy post in response to your earlier article explaining why your characterization of the Cambrian explosion and subsequent evolution is factually incorrect.

The pattern you describe is not the pattern in the fossil record. Major groups of organisms (for example the Cnidaria and land animals and plants) appear before or after the Cambrian explosion, and organisms do change significantly before, during, and after the event. The Cambrian explosion is a period of relatively rapid evolution, but it still has a succession of faunas and at least some of the apparent "rapid" evolution may be due to the preservational changes that occurred as organisms developed mineralized skeletons. Despite these problems, you continue to use an incorrect characterization of the fossil evidence as support for your theory. Why did you not respond to the original post and address this problem? I realize your theory deals mainly with genetics, but you do devote significant effort to the presentation of the fossil evidence, and you often mention it as supporting your conclusions.


[qa2-4]

From Keith Robison: (who begins by quoting Senapathy who is quoting Robison quoting Senapathy) "Even molecular evolutionists know full well that partially new genes en route to evolving fully functional new genes (called incipient genes) have no selection value in evolution and are not preserved, so only fully formed genes could be selected. This, as even Bernd-Olaf Kuppers puts it, is an unsolved and unsolvable problem for molecular evolutionists."

Keith: You should look up the Drosophila gene jingwei in Flybase. Jingwei is a new gene, formed by a retrotransposition event. Jingwei is a chimaera between alcohol dehydrogenase and another gene. The molecular function which jingwei has assumed is not known...

Under such circumstances, it is simply improbable to evolve a new gene by tinkering with a duplicated gene within the short time-frame in which the distinct organisms are said to have evolved.

See jingwei. Also, the most likely scenario for a duplicated gene remaining in the genome is if it assumes a different role. There are several plausible mechanisms for this, and I will detail two.

  1. Suppose the parent gene is expressed in two different tissues, and this expression is guided by specific controlling elements for each tissue. In other words, enhancer A drives expression in tissue A and enhancer B drives expression in tissue B. Duplication of the gene results in two copies. If one copy loses the function of one enhancer, then the other copy loses selective pressure on the opposite enhancer. In other words, a duplication event can lead to a splitting of responsibilities between the two duplicated pieces.

Please note that Keith simply proposes a scenario, and then simply assumes it to be correct. From where does Keith derive the enhancers in the first place?

(end of the included quotes)

Keith: They are there!! I am arguing that given a gene with two different tissue-specific enhancers, a duplication event can be followed by reciprocal loss of enhancers in each copy, resulting in two tissue-specific genes. Please tell me which of the underlying assumptions is false:

  1. Eukaryotic genes can have multiple tissue-specific enhancers
  2. Enhancers can be inactivated by point mutation
  3. Genes can be duplicated.

Dr. S: His many "if" statements are simple propositions, but the last sentence is a conclusion based on these propositions and assumptions, without any corroborating validation. Unfortunately in evolutionary discussions we tend to do that a lot.

Keith: What a terrible thing -- to propose a model which suggests experiments to test it. I'm afraid it happens constantly in science.
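The duplication scenario Keith defends here can be played out as a toy simulation. This is a minimal sketch, not a population-genetic model: it assumes (hypothetically) that enhancer-inactivating mutations arrive one at a time, that each is equally likely to hit any enhancer, and that a loss is tolerated only while the other copy still drives expression in that tissue. All names are invented for the example.

```python
import random

def simulate_duplicate_pair(rng):
    """Follow one duplicated gene until no further enhancer loss is tolerated."""
    # Both copies start with working enhancers for tissue A and tissue B.
    intact = {(copy, tissue) for copy in (1, 2) for tissue in "AB"}
    while True:
        # An enhancer may be lost only if the other copy still covers its tissue.
        losable = sorted(
            (copy, tissue) for (copy, tissue) in intact
            if (3 - copy, tissue) in intact
        )
        if not losable:
            break
        intact.remove(rng.choice(losable))
    surviving_copies = {copy for copy, _ in intact}
    # Two surviving copies means each one kept a different tissue's enhancer.
    return "split" if len(surviving_copies) == 2 else "one copy silenced"

rng = random.Random(0)
runs = 10_000
outcomes = [simulate_duplicate_pair(rng) for _ in range(runs)]
split_fraction = outcomes.count("split") / runs

print("fraction ending as two tissue-specific genes:", round(split_fraction, 3))
```

Under these simplifying assumptions roughly half of the runs end with the two copies dividing the tissues between them, which is the "splitting of responsibilities" outcome; the rest end with one copy driving both tissues and the other silent. Real outcomes would also depend on mutation rates in coding regions, population size, and selection strength, none of which are modeled here.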

(more included quotes again)

  1. (above)
  2. Transposons. Senapathy is quite clear in his feelings towards the potential role of transposons in evolution. He titles one section "Analysis of an example organism: Mutations of the fruit fly indicate that transposons can have no evolutionary contribution."

    I contrast this bold statement, and the rhetoric which tries to back it up, with the recent publication "Transposon-induced promoter scrambling: A mechanism for the evolution of new alleles," Kloeckener-Gruissem and Freeling, Proc Natl Acad Sci U S A 92: 1836-1840 (1995).

    The authors describe a complex transposon-induced rearrangement in Maize which leads to alteration of the expression pattern of the alcohol dehydrogenase gene. The gene's expression is increased in some tissues, decreased in others, and remains the same in still others.

In this and jingwei we see two different experimentally-demonstrated routes to genes moving into new developmental pathways, something Senapathy is certain is impossible. In one (jingwei), retrotransposition of a gene has resulted in a hybrid gene with the expression pattern of one parent. In the other (Maize adh), the developmental expression pattern of a gene has been altered by a transposon-induced DNA rearrangement.

Dr. S: The reality is that the set of genes of any organism is essentially constant.

Ms Jingwei would respectfully disagree with you on this.

Here Keith is talking about an outcome whose cause is well known, but which outcome has nothing to do with evolution -- except that he believes it to be so. As I have discussed in my previous posts, many normal DNA mechanisms such as the DNA recombination can sometimes lead to erroneous combinations. Sometimes intragenic recombination can occur due to unequal crossing over, resulting in chimeric genes. Such genes are either deleterious to the system, or are neutral and will be randomized. They are neither caused by any evolutionary mechanism, nor have any evolutionary consequences.

[snip]

The two alleles described by Kloeckener-Gruissem and Freeling have led to only slight variations in essentially the same Maize organism. I have explained many such incidental variations in my book. In the case of jingwei, as Keith has rightly pointed out, we do not know the function of the chimeric gene. It is possible that it is simply the result of a random recombination event, and such random recombinants have no function.

(end of the included quotes)

Keith: Except it is under selection! Population genetics analysis shows that jingwei has an imbalance of synonymous vs. nonsynonymous substitutions in Drosophila isolates. So we can infer that jingwei has a function, but we do not understand it yet. So we have detected the birth of a new gene!
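For readers unfamiliar with the synonymous/nonsynonymous distinction, here is a minimal sketch of the raw counting step. The two short sequences are invented for illustration and are not jingwei data. The function only classifies codons that differ between two aligned coding sequences; a real analysis would also normalize by the number of synonymous and nonsynonymous sites and apply a statistical test.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def count_substitutions(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Both sequences must be aligned, in frame, and of equal length.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1      # DNA changed, protein did not
        else:
            nonsynonymous += 1   # DNA change altered the amino acid
    return synonymous, nonsynonymous

# Hypothetical example: GCT->GCC keeps alanine, AAA->AGA swaps lysine for arginine.
isolate_1 = "ATGGCTCTGAAA"
isolate_2 = "ATGGCCCTGAGA"
print(count_substitutions(isolate_1, isolate_2))  # (1, 1)
```

A sequence drifting without any functional constraint would be expected to accumulate both kinds of change in proportion to how many sites of each kind exist; a marked imbalance between the two is the kind of signal Keith is pointing to.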


[qa2-5]

From Keith Robison: (reprise) Senapathy argues quite forcefully that the mutations required to change one genome into another are simply not possible: that at measured mutation rates not enough time could ever pass to change an extended region of DNA by even a small amount (say 10-20%). Rather than trying to argue how mutations fix or at what rate, we can go to some data. Suppose we look at fruit flies. There are many species of fruit flies, and even Senapathy's theory would claim that they have a common ancestor. Drosophila melanogaster and D. virilis are two fruit flies, and we can compare the DNA sequences between them. Strikingly, what is found is that most sequences which do NOT code for protein or RNA show essentially no resemblance to each other. In other words, an impossible (according to Senapathy) number of mutations has occurred! Since the data is real, there must be a flaw in the logic which declares this impossible.

Dr. S: Yes! I do argue quite forcefully that mutations required to change the genome of one distinct organism into that of another distinct organism are simply not possible. Now, what Keith proposes is that we simply ignore all that we have learned about mutation rates and come to some data, which is in fact pointed by a special case.

No, not ignore "all we know about mutation rates" -- only your multiply-flawed derivations from it (see below).

The example of the Drosophila is probably caused by a transposon element such as the P elements, which are known to cause drastic changes within the genome, but without changing the coding regions of the genes. This phenomenon is known as "Hybrid Dysgenesis," which phenomenon cannot contribute anything to evolution. I have described this phenomenon in my book (pages 116-117 in Chapter 4).

Hybrid dysgenesis (HD) can't save you here. HD is the phenomenon of many mutations occurring if a transposon-bearing line is crossed to a line lacking that transposon. The transposons hop like mad, causing insertional mutations. Of course, the insertions in these mutations are transposons, so if the cross-species variation in question were due to HD, then we should be able to recognize them as transposons. We could do this in several ways -- looking for the characteristic structure of transposons, comparing to cataloged transposons, searching for transposase-like coding regions, or comparing the various intergenic regions against each other. Since your argument is that DNA cannot mutate by 10% over biological time, any transposon should remain preserved. Also, your argument is undermined by the fact that we can find regions of similarity -- and that these correspond to functional motifs. HD clearly doesn't explain the divergence in non-coding sequences between different Drosophila species.

In essence what Keith says here is that "Senapathy says that mutation rate is limited, but see here in this case of fruit flies, there is a great deal of mutation within the non-coding regions."

Indeed, because if theory (your calculation) doesn't agree with observation, then we must re-examine the theory. And your calculation is horribly flawed on multiple levels.

Senapathy's calculation goes like this (pp. 34-38):

Suppose we define two genomes as different if they differ by 10% in sequence. Can point mutation generate such a difference in reasonable time?

"Let us take a DNA sequence, 10 nucleotides long, and let mutations happen in it randomly...The probability that a mutation occurs at a given position, say the 4th, is 1/10. The probability that at that position the A is changed to G is 1/3...Therefore the probability that at the 4th position the A is changed into G in the given 10-nucleotide sequence is 1/10 x 1/3 = 1/30. ... the probability that the A at the 4th position is changed to G, and the C at the 9th position is changed to T is 1/30 x 1/30 = 1/900".

Senapathy has committed his first error. He is claiming to be trying to prove that no genome can change by 10%, but he is basing the calculation on a particular 10% change. Using similar logic, I can prove that any bridge hand will never occur, or using a few assumptions of population genetics, that neither Senapathy nor I can possibly exist. The odds are just astronomical!

And, of course, Senapathy has stacked the deck -- he is implicitly claiming that there is only a single set of changes which will distinguish two genes from each other functionally.

Senapathy's other error (of the two I've discovered in this particular calculation) is much more subtle but in some ways more impressive.

"Extending this computation, the probability of mutating a 100-nucleotide-long gene at a given position is 1/300, The probability of mutating this gene at two given positions is (1/300)^2...at ten given positions is (1/300)^10 ... Therefore, if a gene is 1000 nucleotides long and requires specific nucleotide changes at 100 positions to change this gene into a new gene, then the probability to achieve this is (1/300*)^100 or approximately 10^-350 [* -- the book has 1/3000 here, presumably a typo]. [JM: it's not a typo, see the point mutation debate.]

Mutation rates are supposed to be in the range of 10^-9 to 10^-6 per nucleotide per generation in animals. Even assuming a high mutation rate of 10^-5, a genome of approximately one billion nucleotides would have a maximum of about 10,000 nucleotide changes per generation"

Here Senapathy goes right past an escape door from this awful calculation. It is always a good idea to use a simple but inexact method to estimate the neighborhood your final value should show up in. If a genome mutates at a rate of 10^-5 mutations per nucleotide per generation, then after how many generations would we expect every nucleotide to be hit by a mutation -- 10^5 generations!
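The rough estimate can be written out explicitly. This is a back-of-the-envelope sketch under loud simplifying assumptions: a single lineage, every mutation retained (no selection, drift, or fixation dynamics), each site hit independently at the quoted rate, and a site counted as changed once it has been hit at least once. The function names are invented for the example.

```python
import math

mutation_rate = 1e-5   # per nucleotide per generation (the "high" rate quoted)
genome_size = 1e9      # nucleotides

# Expected number of changed nucleotides per generation across the genome.
changes_per_generation = mutation_rate * genome_size

def fraction_hit(generations, rate):
    """Expected fraction of sites hit at least once: 1 - (1 - rate)^generations."""
    return -math.expm1(generations * math.log1p(-rate))

def generations_for_fraction(fraction, rate):
    """Generations needed until the expected fraction of sites hit is `fraction`."""
    return math.log1p(-fraction) / math.log1p(-rate)

print("changes per generation:", round(changes_per_generation))
print("generations to reach 10% of sites hit:",
      round(generations_for_fraction(0.10, mutation_rate)))
print("fraction of sites hit after 10^5 generations:",
      round(fraction_hit(1e5, mutation_rate), 3))
```

Under these assumptions a 10% difference takes on the order of 10^4 generations, and after 10^5 generations the average site has been hit about once (roughly 63% of sites at least once). The exact figures matter less than the neighborhood they land in.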

"...as we have seen, a typical gene may require 10^350 such mutations before achieving a specified 10% change. Even if each generation only lasts one year, it would still take 10^350 years to achieve this."

Again, our check suggests that it is more in the neighborhood of 10^5.

"In other words assuming that a specific 10% change of the genome would convert one creature into another, it would take about 10^350 years to achieve this ... Compare this to the age of the earth ... less than 5x10^9 years"

And here Senapathy goes for the ultimate freshman chemistry error. Even if we accepted his probability of 10^-350 with all its dubious assumptions, he's bungled again.

Let's go back through the calculation symbolically. The probability of a mutation occurring is, of course, the mutation rate (m). The probability it is the desired mutation is 1/3 m. The probability of two specified mutations occurring is (1/3 m)^2, and the probability of n specified mutations is (1/3 m)^n. And that value is the number of generations required, right?

WRONG!!!! m is in units mutations/generation (g). Every freshman chemistry and physics class teaches the dangers of leaving the units out of the equation. So let's put them back in

      (1/3 m [1/g])^n

  or  (1/3 m)^n / g^n

So in Senapathy's example, you need to divide the value 10^350 by (10^5)^100. How odd -- suddenly this is a probable event.

So Senapathy's "proof" that point mutation cannot create sufficient variation is completely wrong. DNA (and therefore proteins) can mutate by enormous amounts in reasonable time. This suggests one possibility for the proteins which he claims are unique to particular clades -- that they are so mutated in respect to the extant relatives that we can no longer distinguish the similarities (there are, of course, other possibilities -- such as new genes being born from mutation-randomized non-coding DNA).

So on the molecular side* we have some important evidence in favor of a common ancestor. As Arlin Stoltzfus nicely demonstrated, and I reiterated in the face of Senapathy's dodging, molecular phylogenies provide the evidence of a series of divergences between species. The corrected calculations above show that known mutation rates are sufficient to generate the observed molecular diversity.

But can Keith answer this question: After the great many number of mutations did the organism Drosophila change into another organism? No! It remained the very same organism, and changed only to a very similar species. With these many mutations, did the coding regions of genes (i.e. the proteins) change into new proteins?

Since the evidence of the multiple phylogenetic trees suggests that such changes must have occurred, we now know to look for them (just as my rough estimate told me to look for gross errors in your equations). They will be quite hard to find -- but as the example of jingwei showed, it is possible to identify new genes which have arisen during evolution, and the regulatory changes which have resulted in new forms.

I ask you this. Humans and chimps have extremely similar genomes, and I doubt even you would attempt to claim they sprang to life independently. Yet chimps and humans show a number of key differences which must be genetic, and are quite extreme. For example, chimpanzee females develop a large swelling on their buttocks during reproductive heat. Where did those novel genes/pathways come from? Are you going to claim that is "normal" variation?

* -- I am not well versed in paleontology, and wouldn't dare try to make arguments there.


[qa2-6]

From Keith Robison: (reprise) What is this theory? It involves the probability of finding contiguous open reading frames (ORFs) in DNA versus the probability of finding them if one can splice regions together. We can see the gaping flaw in Senapathy's logic quite simply in one of his examples....
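To give a feel for the quantity under dispute, the sketch below measures how long stop-free stretches run in random DNA. It is an illustration only, assuming uniformly random bases, the three standard stop codons, and a single reading frame; it says nothing about splicing or about whether any stretch would encode a working protein. The function name is invented for the example.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}

def stop_free_runs(seq, frame=0):
    """Lengths, in codons, of the stretches between stop codons in one frame."""
    runs, current = [], 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            runs.append(current)  # a stop codon closes the current stretch
            current = 0
        else:
            current += 1
    if current:
        runs.append(current)      # trailing stretch with no closing stop
    return runs

rng = random.Random(0)
random_dna = "".join(rng.choice("ACGT") for _ in range(300_000))
runs = stop_free_runs(random_dna)
mean_run = sum(runs) / len(runs)
longest_run = max(runs)

print("mean stop-free stretch (codons):", round(mean_run, 1))
print("longest stop-free stretch (codons):", longest_run)
```

Since 3 of the 64 codons are stops, a stop is expected about every 21 codons in random sequence, so long contiguous reading frames are rare. That scarcity is why the argument turns on what splicing short pieces together can or cannot rescue.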

Dr. S: "Keith Robison has again shown that he has not read the book fully, and has wholly missed the point about the extremely high probability of split-genes in random primordial sequences. As I have already explained, the probability of fully formed split-genes in random primordial DNA sequences is very high, no matter whether a gene contains only a few exons or as many as 100 or more exons."

Keith: Rather than complaining about my reading habits, why don't you answer my point? You only got "to be or not to be" out of the random scramble because that was what you were looking for -- you ignored all the legitimate English words (some longer than your targets) in order to get what you wanted. This is particularly ironic (to be polite), considering you rake Richard Dawkins over the coals for doing exactly the same thing in The Blind Watchmaker.

(reprise) "Contrary to his belief that I am ignorant of the ORFs and the exons being N, N+1 and N+2 nucleotide long where N is divisible by 3, I have extensively analyzed the ORF phases and have developed algorithms for finding exons of a gene."

For those readers not in the genomics field, Senapathy did publish some important analyses on splice junctions. However, it is very important to note that it is still NOT possible to perfectly identify splice junctions in genomic sequence, and the best-performing exon-identification software uses heuristics which probably have little to do with how splice sites are really chosen. In other words, the statistics of Senapathy (and those of the workers before and after him) are still quite limited in their predictive value.

(reprise) "Keith has missed completely my explanation as to how fully-formed genes can occur within totally random primordial sequences. ... it is completely valid to look for a particular gene's occurrence within the finite random sequence. The example taken in the book is one of many such genes. This is the reason for looking for the given words of the given gene."

NO NO NO NO NO!!! You are showing you can reverse a process, but not demonstrating that the forward route will go! Taking a protein sequence and finding it in the sequence is fine if you already know that protein exists. But under your theory, this is how the proteins were found in the first place. For that to happen, you must be able to get the protein unambiguously out of random DNA sequence by following a set of rules. And your rules fail at that task.

