Senapathy SBE Reply Number 3
(Part I)


    Part I:

  1. Why a new theory?
  2. The occurrence of long DNA molecules in the primordial pond.
  3. Random DNA sequences available in a primordial pond would have inevitably contained millions of complete split-genes (genes of multicellular animals and plants) in them.
  4. Splice-junctions in eukaryotic genes are perfectly explained by the new theory.
  5. Complexity first and simplicity next: Seemingly complex eukaryotic genes and cells are far more probable than the apparently simpler prokaryotic genes and cells.
  6. From split-genes (primordial DNA sequences) directly to genomes for complex organisms: Origin of many kinds of similarities directly from the primordial pond, not by organismal evolution.

    Part II:

  7. Darwin's domain: Arbitrary assumptions and self-imposed limitations.
  8. Fossil record too... truly supports the new theory.
  9. Misleading mutations: Mutations are passive genetic changes, and are incapable of evolving new genes, new body parts or new organisms.
  10. It is the egg that was the first, not the chicken: The validity of the concept of the seed-cells.
  11. If Darwin was alive today......
  12. History of Darwin's theory reveals how it was shoehorned into what it is today...
  13. A castle without foundation: Total lack of an explanation for the origin of the first, ancestral cell -- the foundation from which evolution theory begins.

    Part III:

  14. Negating the evolution theory is unnecessary, since the new theory explains the scenario of life on earth in a better manner. --- OR --- Why don't we discuss the details of the new theory, that explains the scenario of life on earth in a better manner, instead of being hung up with the same old questions of the evolution theory?
  15. Prebiotic richness and new chemical evolution experiments.
  16. Survival of the mammalian baby.
  17. The simplest possible living cell on earth is not really simple.
  18. Shared differences in protein sequences.
  19. Evolution theory or the independent birth theory is falsifiable: Empirical tests for the independent birth theory or the evolution theory.
  20. Conclusion.

From Periannan Senapathy
Date: September 12, 1995

Part I

I should apologize for being inactive in responding to the internet posts regarding my theory for the past several weeks. I want to do justice to that, and it leads to my replies being long. Hope you will bear with me. I would like to thank those who have participated in the very interesting discussions, both positive and negative to my theory. I am happy that Jeff Mattox has read my book thoroughly and has completely understood the theory. Since he came into the discussion, he has answered most of the questions correctly and effectively. I also want to thank the many who sent personal emails and letters supporting my theory.


1. Why a new theory?

First, let me answer a question: Why do we discuss my theory of the independent birth of organisms in the evolution forum, when we know that those who believe in evolution will be vehemently opposed to it? It is only natural that when people who already believe that evolution is an established fact are told that it is not so, and that there is another theory that can better explain the scenario of life on earth, they will be emotionally opposed to it and will even be angry. But it is not to anger them that I began discussing my theory here, but to exchange the scientific details with them and to enable them to see the validity of the basic tenets of the new theory. Many here are staunch supporters of evolution theory, either due to educational training or due to self-conviction. Yet if one can set aside an emotional attachment to evolution theory, and give a few moments of unbiased thinking to the possibility of the new theory, I am confident that one may see the reasonableness of the new theory. When a theory is not ultimately proven, no matter how much it appears to be proven and no matter how convincing it appears to be, there can be another theory that can better explain the scenario. The purpose in discussing my theory here is to show that this new theory, which I certainly do not claim as a proven fact, is able to explain the scenario of life on earth at least as well as the evolution theory, and in fact better. This will at least keep people from going in the wrong direction any longer, and encourage them to take a fresh look at the whole scenario and to find new answers and explanations.

My theory is like evolution theory in its scientific attitude and basic approaches, except that it explains the origins of genes and genomes, and the connections and especially the distinctions among organisms, in a much better manner. Many here are frustrated as to why we need a new theory when the existing evolution theory seems to be adequate to explain the scenario. The existing evolution theory is nowhere near being adequate to solve many of the problems posed in understanding the origins and in explaining the connectivities and unconnectivities among organisms. We certainly need to be able to find answers to many of the unanswered questions regarding the origins, and to be able to explain many of the unexplained scenarios of life on earth. That is precisely what my theory attempts to do. So please plunge in, and even be emotionally angry, but please do give some unbiased moments of thought and analysis to the scientific details that we discuss here. Perhaps some of you will become convinced, at least to some extent, after all this that the new theory is possible and plausible after all!

There have been many newcomers to the discussion on this forum. Perhaps it is worthwhile to also include a historical aspect of my theory, that is, how, being a molecular biologist interested in and convinced of evolution theory, I happened to formulate a new theory that is basically opposed to the evolution theory and which is scientifically in no way inferior to it, and why I claim that it is in fact scientifically able to explain the scenario of life on earth better than the evolution theory. While I try to answer the questions, the historical perspective may perhaps be helpful to remind the reader why such a new theory is needed in the first place, and what its nuances are.


2. The occurrence of long DNA molecules in the primordial pond.

I am a molecular biologist by training, and a scientist by interest. As I have said before, I have been interested in the question of the origin of life since my college days, as many of us have. My interest has been purely out of an instinctive desire for scientific inquiry, and I have no vested interests, religious or otherwise.

As I was researching into the molecular details of the origin of life on earth and pondering over Darwin's theory in order to find an explanation for the origin of the first primitive cell on earth, I found that the existing theories and details really did not have an answer. Just out of my graduate studies and doing post-doc research at the NIH, I found that the theory and works in the field of chemical evolution were good, but were falling short of explaining the origin of the first genes, genome and life of even the most primitive, free living cell possible on earth. These theories were in the right direction, but were far less sufficient to explain the origin of even one complete gene. They proposed and tried to explain things piecemeal, that were far less complete to address the question of the origin of the first free living cell. So I was pondering and researching to find an answer to these questions in order to understand how the genes of the first cell could have been formed, how the genetic code and the genetic machineries could have evolved, if these mechanisms evolved before the first cell was formed or after it was formed, and many other such questions. This was right around the early 1980s, just about when the split structure of the eukaryotic genes had been discovered, and the origin of which structure was a puzzle for all of us working in molecular biology.

While researching and assimilating the details in chemical evolution research, I took a molecular biology approach to understand how the DNA, the genes in the DNA, proteins, genetic code, genetic machineries and the cell itself could have originated in the first place. It was the field of chemical evolution that had taught me to think that all the reactions among the earth's elements and chemicals must have been random in nature, and from which the right combinations could be selected. But it also said that things evolved gradually from simple to complex, from simple biochemicals gradually to complex biochemicals and to the first cell which was most primitive and crude, from which organismal descent with modification, that is Darwinian evolution, took over. The sequence of events that were proposed and the explanations given for the formation, or even if we may call it evolution, of the very first cell were very vague and piecemeal, and very incomplete. There was really a big gap between the proposals, details, and explanations, and the reality of even the most primitive free living cell on earth as we know it. First, based on chemical evolution experiments, people had been given to think that only short oligonucleotides could be formed in the primordial soup, which then had to combine and somehow form genes by means of chemical evolution. It was also vague if the genes were formed fully before the first cell was ever formed, or after that within the cell. And in neither case was there a systematic analysis with regard to the probability, statistics, structure or function of the genes or proteins. The explanations were story-telling type, and were in no way scientifically rigorous. The explanation was that during this process, somehow the genetic code should have been formed, somehow the genetic machineries should have been formed, and somehow the proteins could have been formed, and somehow a primitive cell was formed. 
While we cannot blame anyone for the lack of knowledge then, this was purely vague and was not scientifically satisfying. We needed more concrete science.

While trying to understand these things and researching with random sequences, I asked what if long DNA molecules were formed in the primordial soup. They could be made by many means: short ones could be made by chemical means, and these could then be recombined by chemical catalysts -- proteinaceous or otherwise (e.g., proteinoids or random peptide mixtures that were not gene-coded) -- to which again random nucleotides could be added. In any case, I could see that there was no reason why long DNA strands could not be formed in the primordial soup. In all the prior proposals concerning chemical evolution, what I saw were arbitrary assumptions and self-imposed limitations in the thinking of the people who proposed them. The original authors such as A. I. Oparin who proposed chemical evolution were doing it with the aim of explaining the primordial chemistry for the supposedly very difficult origin of the first cell, which was supposed to be very primitive. Let us not forget that such authors were fully influenced by Darwin's theory, which started its arguments from such an a priori assumed, simple, primitive, single-celled life on earth.

During my graduate studies I had worked in a DNA chemistry laboratory, and my professor, T.M. Jacob, had worked with H. G. Khorana (a Nobel laureate whose group accomplished the first chemical synthesis of a complete gene) for a number of years. My first work was the chemical synthesis of oligonucleotides (well before the automated synthesizers had come along). As a consequence, questions and concepts that constrained the length of the DNA in the primordial soup in the existing field of chemical evolution did not bother me. For me, DNA could be a strong molecule unless there were enzymes to degrade it, and unless it was placed in a hostile environment, and it could be formed in reasonable lengths. Wouldn't any molecule be degraded under conditions conducive to its degradation? So would DNA. But there are many sets of reasonable conditions under which DNA is very stable. Once we have random oligonucleotides that are tens or hundreds of characters long, there is no reason why they should not be linked to form strands thousands, hundreds of thousands, or even millions of nucleotides long. Except for getting emotionally angry at my saying something that is not traditionally accepted, no one can provide a valid scientific reason as to why this could not have happened.

There are millions of species living today, and many millions of individuals for every species. The DNA in each of the chromosomes in each cell is at least tens and up to several hundreds of millions of nucleotides long -- each a very long contiguous molecule indeed. These DNA molecules in each of the trillions of cells in each individual of every organism are perfectly stable. Of course, the reason is that there are other biochemicals, including small molecules and proteins, protecting them. The essence is: Trillions of DNA molecules, each hundreds of millions of nucleotides long, are perfectly strong and stable, just because some molecules are bound to them and protect them in a conducive environment. Not only that, they perform multitudes of coding functions in each cell making almost no mistakes! In the test tube also, cloned DNA, tens of thousands of nucleotides long, is stable for years in just a buffer solution. (If anyone has a doubt, I have worked extensively in a cloning laboratory where I had left cloned genes in solution at room temperature for several months, and they were fully intact after that.) Given these plain facts, why should we not think that some similar kind of DNA protection could have existed in the primordial pond, at least to some extent? If some molecules can protect DNA today to this tremendous extent, why should we not envisage that long DNA molecules could have been protected in the primordial pond -- especially if this concept could lead to a clear understanding of the origin of split-genes and the origin of organisms by a new mechanism? So, let us please not restrict ourselves by the constraining forces of traditional thinking, which were purely based on assumptions in the first place, and let us proceed with the possibility of long DNA molecules in a primordial pond.


3. Random DNA sequences available in a primordial pond would have inevitably contained millions of complete split-genes (genes of multicellular animals and plants) in them.

With this basis, I analyzed the formation of the genes that could code for complete proteins. I did this by using the computer, not the test tube. My primary question was: If the DNA sequences were long and were random in sequence, then how did the genes capable of coding for proteins form? I analyzed the lengths of the proteins of prokaryotes and those of eukaryotes, the lengths of the genes of the prokaryotes and the eukaryotes, the distribution of reading-frames in random DNA sequences that I simulated in the computer and in the DNA sequences of the prokaryotes and the eukaryotes, and the mix and match of all these things in the computer. I used the PIR (Protein Information Resource) from the National Biomedical Research Foundation extensively. GenBank was just getting started then. I wrote many computer programs and also got the help of computer scientists at the Division of Computer Research and Technology at the NIH. I am saying these things only to show that I have not simply looked at the sequences and said what I have said, but have done very extensive computational analysis, both by performing simulations and by comparing the DNA, gene, and protein sequences of actual organisms, both prokaryotes and eukaryotes, with those from simulated random DNA sequences. I must also note that such an analysis had never been carried out before, simply because no one had ever even looked at the possibility of long DNA sequences having fully-formed genes in them, in the manner I have done now. In fact, as we all know now, evolutionary biologists are totally opposed even to the idea of the existence of long random DNA molecules in the primordial pond, which has kept anyone from even taking the route that I have taken now.

In my analysis I found out that given long random DNA sequences, genes capable of coding for complete proteins could indeed simply occur fully-formed not as contiguous genes, but as split-genes, in fact, with quite typical structures that are found in all the eukaryotic genes. I simply asked the question: If DNA sequences were random, then could coding sequences occur contiguously in lengths capable of coding for full-length proteins? I tested the sequences for the distribution of reading-frame lengths in all the three reading frames in random sequences, and found that the reading frames were constrained to an upper length limit of about 600 nucleotides. This I found to be true even if I simulated DNA sequences of length millions of nucleotides. I also found that the reading frame lengths were distributed in a negative exponential manner, which meant that the shortest reading frames were the most frequent and the longer and longer ones became rarer very rapidly, in an exponential manner, and they became almost non-existent after about 600 nucleotides. For every order of magnitude increase in the length of the random DNA sequence that I simulated in the computer, there was an increase of only about 10 nucleotides in the upper length limit. That is, reading-frame lengths up to about 600 nucleotides would be present in a random DNA sequence of about a million characters, up to about 610 would be present in a random sequence of about ten million characters, 620 in 100 million characters, 630 in one billion characters, .... 660 in a random DNA sequence of one trillion characters, 690 in a random DNA sequence of one quadrillion characters, and so on. These are only approximate and statistical and are intended to illustrate the concepts. This means that even if we have very long DNA sequences, long contiguous coding sequences that could code for full length proteins could not simply occur in them. 
Therefore, the long, contiguous genes of prokaryotes that code for long proteins (whose reading-frames go up to many thousands of nucleotides in length) could not have simply occurred even in very long random DNA sequences. However, as I said before, just around 1980 the split structure of the genes of the eukaryotic organisms had been discovered. So, I could correlate this knowledge with the structure of genes that were possible from random DNA sequences. It then became clear that if coding sequences could occur in pieces within the "length-constrained" reading frames of a long random DNA sequence, then they could be chosen in a consecutive manner piecewise, and then could be successively combined together to form a long, contiguous coding sequence.
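
The reading-frame analysis described above can be tried out with a short simulation. What follows is only a minimal sketch in Python, not the programs used for the original analysis. It assumes a uniform base composition (A, C, G and T equally likely), scans only the three forward frames, and treats a reading frame as the stretch between two consecutive stop codons. The function names and the fixed seed are hypothetical choices made for illustration. The figures it prints depend on these assumptions and need not match the approximate values quoted above.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def random_dna(length, rng):
    """Random DNA string; assumes all four bases are equally likely."""
    return "".join(rng.choices("ACGT", k=length))


def reading_frame_lengths(seq):
    """Lengths, in nucleotides, of the stop-free stretches between
    consecutive stop codons in the three forward frames."""
    lengths = []
    for frame in range(3):
        run = None  # stays None until the first stop codon in this frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if run is not None:
                    lengths.append(run)
                run = 0
            elif run is not None:
                run += 3
    return lengths


def histogram(lengths, bin_width=60):
    """Count reading frames per length bin (bin index -> count)."""
    counts = {}
    for n in lengths:
        counts[n // bin_width] = counts.get(n // bin_width, 0) + 1
    return counts


if __name__ == "__main__":
    rng = random.Random(1995)  # fixed seed so that runs are repeatable
    for size in (10_000, 100_000, 1_000_000):
        lengths = reading_frame_lengths(random_dna(size, rng))
        print(size, "nt: longest reading frame =", max(lengths), "nt")
    # Length distribution for the last (largest) simulated sequence
    counts = histogram(lengths)
    for b in sorted(counts):
        print(f"{b * 60:4d}-{b * 60 + 59:4d} nt: {counts[b]}")
```

Running it for several sequence sizes shows how quickly the counts fall off with length, and how the longest observed reading frame changes as the simulated sequence grows.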

Once I figured out this possibility, then I did a number of tests and analyses to confirm this concept. In fact, when I tested the reading-frame lengths of actual eukaryotic genes, they looked exactly as predicted from the random sequences. Furthermore, when I tested the lengths of exons, they were all under the upper length limit of the reading-frames. I even graphically tested all the reading frames, and the exons within them. This prediction became verified. (Please note that this is statistically very true. There are some exons which are much longer, but these can be derived by the loss of some introns. This should answer a question that Keith Robison raised about the presence of longer exons.) The idea is that all the exons should be statistically under a length limit of about 600 nucleotides. They would have a tendency to be chosen from the longer reading-frames, which are still within the constrained length-limits, rather than the shorter ones. It means that the exons will be under the upper length limit, but would be more frequently the longer ones. When we normalize this kind of distribution for the frequencies of the lengths of reading frames, then we would see that the exons will start with a lower length limit of about a few characters and will peak around 100-200 characters. This is because the longer reading frames even within the 600 character upper limit are very rare. I have stated that the exons will be chosen within the available long reading-frames, and, as also stated in a commentary on this work in New Scientist then, will be chosen from the best of the coding-pieces from among these length-constrained reading-frames. While the distribution of the reading-frame lengths is negative exponential, the distribution of the exon lengths will not be negative exponential, but will have a normal distribution under the negative exponential curve. I have described this several times before. 
This is what has been noted in a commentary by Stoltzfus et al in a recent issue of Science magazine. Stoltzfus has not said anything new, or anything that would contradict my concepts or data that I have provided before. He has simply shown graphically what I have said many times in descriptive English. It does not contradict me as Keith Robison has incorrectly said in one of his recent SBE posts.

When these fundamental things became clear, I also analyzed the amount of random DNA needed which could contain the genes with a probability that is close to being one. This amount, ~10^26 nucleotides, is in fact very small in terms of physical quantity. Consider that an individual creature of the size of a dog or human contains about 50 grams of DNA, and about 10^23-10^24 nucleotides. This would clarify that the amount of random DNA required in a primordial pond for the set of all genes to occur in it is not very much indeed. I then conducted extensive computer simulation experiments, wherein I simulated random DNA sequences in which I searched for specific genes. I could not simulate the 10^26 characters since we cannot do this in a reasonable time-frame in today's computers. But, I simulated random DNA totalling to many billion characters on a SUN workstation, and searched for genes that were shorter, but using the principles of long split-genes. In fact, I used portions of actual genes. The results proved the concept that I started with, that far less length of random DNA sequence was needed when the features of eukaryotic genes: namely, split-structure, codon degeneracy in genes and amino acid degeneracy in proteins, were used at precisely the expected extents than that traditionally believed for the same length of contiguous genes. No contiguous genes as those in prokaryotes could be obtained by such searches even in a thousand times longer DNA sequence. Another important thing is that no matter what protein sequence is searched for, the gene coding for that protein will occur within exactly the same random sequence. This answers the question that many posters have asked, including Keith Robison. I am not searching for one specific gene sequence. I can search for any specific gene sequence in the same random sequence, and yet we will obtain that particular gene sequence somewhere within the same random sequence. 
This is the power of this approach that shows that almost any gene coding for any protein specifying any biochemical function will occur in the same random sequence of 10^26 characters. In fact, one more interesting thing about this concept is that any new random sequence, as long as its total length is ~10^26 nucleotides, will contain the set of almost all genes.
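
As a rough cross-check of the quantities mentioned above, the conversion between a mass of DNA and a count of nucleotides can be written out. This is a back-of-the-envelope sketch only: the average mass of about 330 g/mol per nucleotide is an assumed textbook approximation that is not stated in this post, and the function names are hypothetical.

```python
AVOGADRO = 6.022e23        # particles per mole
NT_MASS_G_PER_MOL = 330.0  # ASSUMPTION: average mass of one nucleotide in DNA


def nucleotides_in(grams):
    """Approximate number of nucleotides in a given mass of DNA."""
    return grams / NT_MASS_G_PER_MOL * AVOGADRO


def grams_of(nucleotides):
    """Approximate mass in grams of a given number of nucleotides."""
    return nucleotides / AVOGADRO * NT_MASS_G_PER_MOL


if __name__ == "__main__":
    print(f"50 g of DNA is about {nucleotides_in(50):.2e} nucleotides")
    print(f"1e26 nucleotides weigh about {grams_of(1e26) / 1000:.1f} kg")
```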

Another interesting thing about the random DNA available in a primordial pond is that it can contain many distinct genes that can code for essentially the same protein. Short sequences such as the homeobox sequences or enhancer sequences can occur millions of times independently within this amount of primordial DNA, as they are fairly short and exhibit sequence variation. Many genes coding for multifunctional proteins can exist in independent random sequences with many of their parts being similar purely by chance. All these are consistent with what we see today in the genomes of multicellular organisms.

In one of his SBE posts, Keith Robison has asked if the gene for a cytochrome C protein could occur within the 10^30 nucleotides that could be available in a pond. Taking the whole cytochrome C protein sequence, and the contiguous DNA sequence needed to code for the complete protein, he expects that the probability of finding it is 10^-112, meaning that it would take a random DNA sequence of ~10^112 nucleotides for the gene to occur in it once. This approach is exactly what I say is totally wrong. This kind of assumption is what has been leading the evolutionists in exactly the wrong direction -- in the direction exactly opposite to where the truth exists. One of the major purposes of my book is to show that split genes are tremendously more probable in a primordial pond than has been assumed for any gene by such people. I have posted many of the details in SBE before, so I will not go into them here. The details regarding how genes could exist in a small primordial pond in abundance are described in a full chapter ("The Abundant Occurrence of Genes in the Primordial Pond") in my book. Also, as Jeff Mattox responded to Keith's question recently, it is the exons that should be taken into account and not the whole gene. In addition, codon degeneracy and amino acid degeneracy have to be taken into account. When we do these, the probability of a complete gene, no matter how long it is and no matter what sequence it codes for -- as long as the longest exon is around 600 nucleotides -- is close to being 1. Again, let people like Keith not get emotional and say that they know of eukaryotic genes that have exons longer than 600 nucleotides. 
I have said many times that we are dealing with statistically observable details here on the one hand, and on the other hand, the longer exons can easily be derived by partial gene-processing (that is, by losing one or more introns through the messenger RNA-reverse transcriptase processing), which I have also explained in my previous publications.

Thus by the split-gene method I have delineated, not only a gene for cytochrome C, but almost any gene coding for any protein sequence can occur in its full form within the 10^26 nucleotides of primordial random DNA. I've provided enough simulation experiments in my book that would demonstrate this. Such potential of the primordial pond is truly amazing, yet it is an absolute reality. And it is important for us to understand this potential, for it inevitably enables the multiple origins of genomes and organisms. It is not enough for people to simply say that they cannot believe what I have demonstrated. I have conducted these extensive simulation experiments and they have not! I cannot provide the details of these simulations and graphs here, as they take dozens of pages, but my book is open to anyone who would like to know the details.
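
To give a feel for what a piecewise search of this kind looks like in practice, here is a toy sketch in Python. It is not the simulation from the book, and every name in it is hypothetical. It assumes a uniform random DNA sequence and the standard genetic code, cuts the target protein into very short pieces (far shorter than real exons), and greedily looks for each piece, in order, at its earliest downstream position in any of the three forward frames. Codon degeneracy is covered implicitly because the search is done on translated sequence; amino acid degeneracy and splice signals are not modelled at all.

```python
import random

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the usual TCAG ordering ('*' = stop)
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(seq, frame):
    """Translate an uppercase ACGT string in one forward frame (0, 1 or 2)."""
    return "".join(CODON_TABLE[seq[i:i + 3]]
                   for i in range(frame, len(seq) - 2, 3))


def find_split_gene(dna, protein, piece_len):
    """Greedy toy search: cut `protein` into pieces of `piece_len` residues
    and locate them in order along `dna`, each at its earliest downstream
    position in any forward frame.

    Returns a list of (nucleotide_start, frame, piece), or None if some
    piece cannot be placed."""
    frames = [translate(dna, f) for f in range(3)]
    pieces = [protein[i:i + piece_len]
              for i in range(0, len(protein), piece_len)]
    pos = 0  # the next piece must start at or after this nucleotide
    hits = []
    for piece in pieces:
        best = None
        for f, prot in enumerate(frames):
            first_codon = max(0, -(-(pos - f) // 3))  # ceil((pos - f) / 3)
            idx = prot.find(piece, first_codon)
            if idx != -1:
                start = f + 3 * idx
                if best is None or start < best[0]:
                    best = (start, f, piece)
        if best is None:
            return None
        hits.append(best)
        pos = best[0] + 3 * len(piece)
    return hits


if __name__ == "__main__":
    rng = random.Random(12)
    dna = "".join(rng.choices("ACGT", k=300_000))
    protein = "".join(rng.choices("ACDEFGHIKLMNPQRSTVWY", k=30))
    print("as 3-residue pieces:", find_split_gene(dna, protein, 3))
    print("as one contiguous piece:", find_split_gene(dna, protein, 30))
```

Changing `piece_len` and the length of the simulated sequence lets one compare how much random sequence is needed to place the same protein as many short pieces versus one contiguous stretch, under these simplified assumptions.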

[NOTE: For a demonstration of these concepts, including how easily complete split genes can be found in a random sequence of DNA, try out the interactive Exon/Gene Search Engine. JM]


4. Splice-junctions in eukaryotic genes are perfectly explained by the new theory.

Under my theory on the origin of split-genes, the lengths of exons are constrained due to the random distribution of stop-codons in a random DNA sequence. The stop-codons present at the ends of a reading-frame will occur at the exon-intron junctions in such a manner that they can be "spliced-out" along with the introns, so that the exons spliced-together will have a contiguous reading-frame. When we analyze the actual genes, it is extremely interesting to note this very presence of stop-codons at the ends of exons, at exactly the place where it is predicted, in almost all the genes in today's living organisms. In fact, they are parts of what are called splice-junction sequences that appear at every exon-intron and intron-exon junction, which led me to propose that these splice-junction sequences originated from stop-codons, primarily as a means of avoiding the reading-frame length constraint.
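
The presence of stop codons inside a junction motif is easy to check by hand or with a few lines of code. The sketch below is only an illustration with a hypothetical function name. The two motifs it scans, GTAAGT and GTGAGT, are the commonly cited consensus for the first six bases of an intron at the exon-intron junction; they are supplied here as an assumption and are not taken from this post.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")


def stop_codons_in(motif):
    """Return (offset, codon) for every stop codon found in `motif`,
    scanning every offset (that is, all three phases)."""
    motif = motif.upper()
    return [(i, motif[i:i + 3])
            for i in range(len(motif) - 2)
            if motif[i:i + 3] in STOP_CODONS]


if __name__ == "__main__":
    # ASSUMPTION: widely quoted donor-site consensus variants (intron start)
    for motif in ("GTAAGT", "GTGAGT"):
        print(motif, stop_codons_in(motif))
```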

In understanding the origin of split genes, some people like Keith Robison and Arlin Stoltzfus have raised some questions regarding the phase of the stop codons in the splice-junction sequences. While Stoltzfus et al. have stated that my concept about the origin of splice-junction sequences from the stop-codons is attractive, they say that the problem of "reading-phase" had not been addressed. In fact, I have addressed this in my publications, which neither Arlin Stoltzfus nor Keith Robison seems to have read. The main idea is that initially, when the genes were chosen from random primordial DNA sequences, the stop codons in the genes would have been in one reading frame. Once the mechanism had been chosen in the primordial pond, the sequence of the splice-junction per se took over. It means that, from then on, it did not matter if the stop codons within the splice junctions were in phase with the first exon or not, as long as the spliced exons produced an uninterrupted contiguous coding-sequence. Thus, the stop codons within the splice junctions in the genes that were chosen later will not all be in the first reading-frame. The only explanation for the origin of splice junctions in split-genes is the one that I have provided, and no other theory even comes close to explaining the origin of splice junctions. My theory gives a reason for the reading frames of eukaryotic genes being statistically shorter than about 600 nucleotides, for the exons being statistically shorter than about the same upper length limit of 600 nucleotides, for the splice junctions containing the stop codons precisely where they are expected, for the absence of stop-codons in genes other than those that code for proteins (such as tRNA and rRNA coding genes), and so on. None of the other theories on the origin of split-genes, either introns-early or introns-late, can explain any of these features to even the slightest extent.

An aside here. One may wonder how machinery as complex as the spliceosome could originate in the primordial pond prebiotically. The question is whether a primitive system evolved into this high complexity, or the right system was chosen from thousands of random kinds of molecular machineries and was then fine-tuned. My answer to this question would be the latter. Out of many kinds of machineries, the ones that would have meaning for a living cell would have been selected. Of course, there would have been a considerable amount of fine-tuning of the basic system through molecular evolution, further prebiotically and within the cell. There may have been systems that spliced together only introns, or some other features of DNA sequences, that were not useful to the life of a cell and which were not chosen. I have dealt with this question in the book in considerable detail.


5. Complexity first and simplicity next: Seemingly complex eukaryotic genes and cells are far more probable than the apparently simpler prokaryotic genes and cells.

The above findings are very important not just for understanding the structures of genes of the eukaryotes, but for our understanding of the whole scenario of life on earth. These concepts and results showed that:

  1. The eukaryotic genes could occur fully-formed within a fairly small and reasonable amount of DNA with random sequences in a primordial pond.

  2. The prokaryotic genes could not have occurred fully formed in DNA with random sequences even in an amount that was trillions of times more than that needed for the occurrence of fully-formed eukaryotic genes.

  3. The only way that the prokaryotic genes could have originated on earth was by losing the introns from the freely occurring eukaryotic genes.

  4. What does all this say? That the prokaryotic cells were not the first cells, as has long been assumed in traditional evolutionary biology. It had to be the eukaryotic cells that were the first living organisms that originated from a primordial pond.

  5. It might be counter-intuitive to many people when I say that the complex eukaryotic cells were the first, and not the simple-looking prokaryotic cells. But, as I have shown in my previously published papers, and in my book Independent Birth of Organisms, even the complex nucleus should have originated in these first eukaryotic cells. The reason I say this is the following. The random sequence that contains a gene is typically long, from many thousands up to a million nucleotides, and contains many exons -- a characteristic very similar to genes in living organisms. If the typical gene is transcribed into an RNA and the ribosomes start to translate this RNA, then what would result? The first exon will be translated, after which the protein translation will be terminated due to the presence of many stop codons at the end of the first exon in the RNA. Therefore, the introns should be spliced out before the messenger RNA is even presented to the ribosomes in a living cell. Also, in a living cell, many such genes are simultaneously in the process of being transcribed and translated into their corresponding proteins. This means that, if splicing occurred within a cell that had no nucleus, there would result far more truncated proteins, which are a tremendous waste and burden on the cell, than good complete proteins. What then was the solution? Simply compartmentalize the RNA sequences that are transcribed from the DNA, and the splicing process, within a nuclear boundary, and present only the spliced RNA to the ribosomes, which are segregated outside the nucleus in the cytoplasm. Now, the DNA (all the genes), the primary RNA transcribed from the genes, and the splicing process are all packaged within the nucleus, and the ribosomes are outside the nucleus, thereby avoiding this problem altogether. 
Most of the proteins are in fact used within the cytoplasm, and if they are required within the nucleus, they will then travel into it from the cytoplasm.

  6. What does this show again? That complexity originated straight away in the primordial pond with life itself. Eukaryotic genes, which appear to be complex with all their coding and intervening sequences, their splice junctions, etc., originated directly in the primordial pond. And eukaryotic cells, which with their nucleus are more complex than the prokaryotic cells, could originate directly in the primordial pond. In fact, all this showed that although the prokaryotic cells are morphologically far simpler than the eukaryotic cells, they are genetically far more complex than the eukaryotic genes and cells. Their genes simply could not occur fully-formed even in trillions of times more random DNA material than is required for the occurrence of complete split-genes of eukaryotes. They could be formed only from the first occurring split-genes by losing introns. Loss of introns can easily occur by reverse-transcribing a spliced RNA (the messenger RNA) back into DNA with an enzyme called reverse transcriptase. The genomes of the prokaryotes could be formed by combinations of such genes that had lost their introns, and their cells could be subsequently formed from these genomes. Thus the essence of these findings is that apparent complexity originated straight away in the primordial pond. I agree that this is highly counterintuitive, to the extent that one could become angry, but that is the truth. There is much evidence that attests to this. For instance, the microtubules (special proteins that are necessary for mitosis in eukaryotic cells) are totally absent in the prokaryotes. There is no way that prokaryotes could evolve these totally new proteins to form eukaryotes and the system of mitosis (the process of chromosomal separation in dividing eukaryotic cells), which is an extremely complex system. 
Furthermore, the whole system of spliceosomes, the machinery that splices the introns out of the primary RNA in the eukaryotes, is totally absent in the prokaryotes. This machinery is made up of many RNA molecules and many different protein molecules. There is no way that these could have evolved from the prokaryote to the eukaryote. The genes for this machinery simply occurred in the random primordial DNA sequences and were chosen in making the eukaryotes, and were then lost while forming the prokaryotic genomes. We can see that evolutionary biology has so far been looking at the reverse of the real sequence.

    Some SBE posters here have argued that whether introns came early or late is immaterial to our discussions concerning the origin of life and organisms. To them, I would like to say that it matters the most. Our understanding of the origin of the split structure of eukaryotic genes is most fundamental to our learning about the origin of life and organisms. Among other things, it has shown us how the eukaryotic genes can directly arise (in fact simply occur fully-formed) from random primordial DNA sequences, how the eukaryotic cell can originate directly from the primordial pond, and how the genome of a multicellular creature can arise directly in its seed-cell and develop into the organism. Thus, it avoids the necessity for the long series of assumptions that the bacterial genome and cell had evolved first, and then changed into the eukaryotic cell, and then into a few-celled, supposedly simple multicellular creature, and then into other, more complex multicellular creatures. Each of these steps is simply improbable, and for none of them does any scientific evidence or explanation exist.

  7. As I stated above, I also computed the amount of DNA in which fully-formed split-genes could occur purely by chance. I did this systematically by taking into account the codon degeneracy in genes, amino acid degeneracy in proteins, and the split architecture of the genes. Detailed simulation studies and computational analysis showed that DNA sequences totalling approximately 10^26 nucleotides would be sufficient to contain trillions of distinct genes. These sequences, contrary to what some people in the SBE discussion have incorrectly stated, need not be a single long strand. They could be present in small pieces of thousands to hundreds of thousands to millions of characters. It is the nature of statistics and probability that makes it possible for trillions of genes to exist in these random DNA sequence pieces. I cannot go into the details of my analysis, which takes up about 70 printed pages in my book. Anyone interested in the details of the computer simulations and the rest of the analysis is welcome to read them in the book.
However, I want to point out one thing. What we are uncovering here amounts to an entirely new world view: a realization that the finite quantity of random DNA sequences in a primordial pond contained an abundant number of complete genes. What is even more interesting is that even the genes that code for proteins with internal sequence repetitions, which make them appear to be highly evolved and nonrandom, could have occurred in the finite random primordial DNA sequences as easily as any junk sequence. Genes that direct the building of complex organs and body structures such as the eye, heart and brain could therefore have simply occurred, in their full form, in the primordial pond. No matter whether a gene goes to make a complex organ such as the eye or the brain, or is a gene for a simple metabolic reaction, it will occur with essentially equal probability in the random primordial DNA sequences. (Although all split-genes are equally probable from random DNA sequences in terms of statistics and probability, it is in the eyes of the evolutionary beholder that genes in general appear to be evolved, and that some genes appear to be far more evolved than others, primarily due to the traditional belief in a gene evolution that begins from short oligonucleotides.) Because of the split architecture of genes, the codon degeneracy, the amino acid degeneracy, and a few other structural features, almost any gene, however complex it may appear, however long it may be, however many exons it may contain, and whatever protein it may code for, will occur in its full form with a probability of near 1 within the finite amount of random DNA material that could reasonably be available in a biochemically rich primordial pond. 
Only because of this phenomenon was life itself possible on earth, even the simplest life, and the same phenomenon automatically made possible the formation of complex life with equal probability, not just in one or a few forms but in multitudes. It is this phenomenon of abundant gene occurrence in the random primordial DNA sequences that makes it possible for complexity to originate first and with high probability. If this level of complexity had not originated and occurred first on earth, then no simple life, not even a bacterium, could ever have originated. This amazing phenomenon makes it truly unnecessary to be bogged down by the traditional feeling that simple living things should appear first on earth and that only then could complex things be derived from them. Our concepts of complexity and simplicity have been truly reversed now! However many times I repeat and reiterate the importance of this phenomenon, I do not think that I will have overstated it.
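To make point 5 above concrete, here is a small sketch of what a ribosome would read from an unspliced transcript compared with the spliced one. The sequences are a made-up toy example, not real gene data, and the function names are only for illustration. The intron in the toy transcript begins with GU and ends with AG, as introns typically do, and it is placed so that its UAA falls in the reading frame of the first exon.

```python
# The three stop codons, written in RNA letters.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def codons_before_first_stop(rna):
    """Count the codons read from position 0 before the first in-frame stop codon."""
    count = 0
    for i in range(0, len(rna) - 2, 3):
        if rna[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


def splice(rna, introns):
    """Remove introns, given as (start, end) half-open index pairs, and join the exons."""
    pieces = []
    pos = 0
    for start, end in sorted(introns):
        pieces.append(rna[pos:start])
        pos = end
    pieces.append(rna[pos:])
    return "".join(pieces)


# Toy transcript (hypothetical sequences chosen only for illustration).
exon1 = "AUGGCUGC"            # ends partway through a codon
intron = "GUAAGUCUCUUACAG"    # starts with GU, ends with AG
exon2 = "AGGCAAAUUU"          # first base completes the codon begun in exon1
stop = "UAA"                  # the real end of the coding sequence

primary = exon1 + intron + exon2 + stop
intron_span = (len(exon1), len(exon1) + len(intron))
mature = splice(primary, [intron_span])

if __name__ == "__main__":
    print("unspliced:", codons_before_first_stop(primary), "codons read")
    print("spliced:  ", codons_before_first_stop(mature), "codons read")
```

In this toy case the unspliced transcript is cut short at the stop codon in the splice junction, while the spliced one is read through to the end of the second exon.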


6. From split-genes (primordial DNA sequences) directly to genomes for complex organisms: Origin of many kinds of similarities directly from the primordial pond, not by organismal evolution.

I must say that the knowledge of the split structure of eukaryotic genes that was unraveled around 1980 was pivotal to me not only for asking questions about their origins, but also for finding answers to the larger questions concerning the origin of life and organisms. The answers to the questions concerning the origin of genes also provided answers to many of the questions on the origin of life and organisms. These answers are:

  1. If a sufficient amount of random DNA material was present in a primordial pond, multitudes of split genes would occur fully-formed in it.

  2. Genomes for complex eukaryotic cells could be assembled from these genes directly in the primordial pond, leading to the formation of many distinct eukaryotic single-celled organisms.

  3. Genomes for complex multicellular organisms also could be assembled from the large gene-pool of the primordial pond. The genomes of the single-celled eukaryotes could be used in pieces and in full in the formation of the genomes of multicellular organisms.

  4. While many genomes for different multicellular organisms could be assembled from the same gene-pool, copies of many common genes will be included in the different genomes. Copies of unique genes that are not present in most other organisms will also be included. This will lead to the scenario where anatomically distinct organisms will be formed from the same common gene-pool, which could contain both essentially identical genes and some unique genes. There could also be functionally similar genes with partial sequence similarity that are nevertheless not structurally related. This "mosaic" gene scenario is precisely what we observe in today's living world. However, when one views the scenario through evolutionary eyes, one sees only an evolutionary relationship among the various organisms, under the assumption that one single cell was formed first from inanimate matter, from which all other life sprang -- using only the "similarity" part of the scenario and leaving out the "uniqueness" parts. If one constructs a "phylogenetic tree" based on an assumed evolution, one will certainly get a tree, in fact a number of trees depending upon the methods and assumptions.

  5. This scenario will also lead to basically similar biochemistries, cellular and tissue structures, and common metabolic pathways among the widely distinct organisms. While the distinct organisms are based on essentially the same biochemistries and cellular structures, their anatomical structures, functions, and many genes could be unique and unrelated. The same genes in the different organisms could be mutating independently in the different genomes during the lifetime of the genomes, that is, for as long as the organism is viable, until it becomes extinct. Again, when one views such a scenario of similar basic biochemistries and cellular and tissue structures, one is misled into thinking that evolution must have produced it.

  6. The viability of a primordial pond is limited to a finite period of time. While it is difficult to put an exact figure on the time-frame, a crude estimate can be made. It could be anywhere from just a few hundred years to many thousands of years, depending upon the conditions of the pond. The pond could be physically depleted of the DNA material, or the organisms formed, single-celled or multicellular, could devour the material. However, the organisms that were formed before any of these things happened will already have become viable and will be reproducing their own kind. The problems that people have talked about here concerning this are possible, but they depend very much on the conditions in a pond. If a reasonable time could pass before such a thing happened in a given pond, that is sufficient for millions of genomes to be formed into their seed-cells. There is absolutely no scientific reason why such conditions could not have existed, unless one wants to be hung up on such a constraining thought. In fact, I have shown that there seem to have been many ponds in reality, from each of which life originated on earth. And in each pond, life was formed in multiple forms. It is possible that many other sets of life could not survive from their respective ponds due to such problems. So, we are talking about statistics and probabilities here, and we should not be hung up on such subjective arguments.
