Q J Med 2003; 96: 175-176
© 2003 Association of Physicians
Biologic
Failing dogma and a sloppy code
We have all been brought up, post Watson and Crick, on the mantra 'one gene, one protein', and have probably regarded antibodies as a special case, if we bothered about the exceptions to the rule at all. This starting point has led to one-to-one interpretations of what we might do with a reading of the genetic code; some presentations to the public have led them to suppose that there will be one disease (and one cure) per faulty gene. On this view, all we need is a good reading system that will tell us where to look for a marker, and that in turn will tell us how to attack the disease process. But it really is not like that at all.
The advent of 'A user's guide to the human genome',1 a clear guide around the system, does not help the casual molecular biologist or physician much in the attempt to exploit this rapidly accumulated knowledge clinically, despite the clarity of, and the caveats expressed in, the text. A considerable level of sophistication in understanding the relationships between DNA and disease processes is needed before the available information can be exploited properly. For isolated steps in a metabolic process, for vital membrane channel functions, for regional organization in the embryo and, less definitively, for some syndromes, it is possible to link code-reading outcomes to process. Even if one adds in those diseases with widely disparate manifestations that result from a single mechanism, conformational change in serine protease inhibitors (the serpinopathies),2 there are few cases in which correcting a specific gene would eliminate great swathes of disease. It is simply not the case that there will be a necessary and direct equivalence between the production of any protein and a disease. A better understanding of the widespread existence of somatic mosaicism in Man (in Bloom's syndrome, Fanconi anaemia, mitochondrial diseases, Turner's syndrome and around 1 in 2000 phenotypically normal individuals) and of what might be called post-Mendelian thinking about inheritance should change things. The number of illnesses that can be explained by mutations at a single locus is diminishing: phenylketonuria is being rethought, cystic fibrosis was never clear-cut, and many of the genetic deafnesses, Hirschsprung's disease and Alzheimer's disease are further examples.
Why is there not a better direct 'read-across' from genetic data? Of course, many diseases and malformations are polygenic in origin, but more fundamental truths underlie the non-equivalence. Even if everything goes as it ought to in translation, there is a good deal of cheating by the end-products of the process. The enzyme phosphoglucose isomerase catalyses the interconversion of D-glucose-6-phosphate and D-fructose-6-phosphate, but it is identical to the protein, secreted by T cells, that ensures the survival of some embryonic spinal neurons and sensory nerves (called neuroleukin in this manifestation). It is also the same as a factor involved in metastasis, and it appears to be the factor that controls the differentiation of human myeloid leukaemia (HL-60) cells into mature monocytes. In the same confusing way, a single gene produces a protein with the same amino acid sequence in the eye and the liver: in the eye it appears as τ-crystallin in the lens, and in the liver as α-enolase. The same protein simply does a different job, and a defect in it will produce different outcomes depending on which of its functions is affected.
This plurality helps to explain why we can do so much with so few genes and (partly) why there are not many more genes in what we regard as more complex animals. However, lack of specificity and correlation between disease and genetic failure also tells us something about how we have evolved. The genetic code is a cobbled-up system with an enormous failure rate and complex systems to cover up its deficiencies. The way it has developed makes an interesting story that helps to explain these failures.
The code has clearly not arisen by random association between codons and amino acids. Chemically similar branched-chain aliphatic amino acids have related codons (Leu, Ile and Val), as do the charged basic (Arg and Lys) and acidic amino acids (Asp and Glu). Perhaps the capacity to recognize classes of amino acids preceded individual associations with specific codons. A progressive increase in selectivity is suggested by the fact that the four codons XYN (where X and Y are fixed bases, but N is any of the four) frequently all code for the same amino acid, so that the third base carries no information (glycine, alanine, valine, proline and threonine are examples). It is thus possible that the code became more specific by the addition of a third base.3
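Both patterns can be checked directly against the standard genetic code. The sketch below builds the standard codon table in the RNA alphabet. It then lists every two-base prefix XY whose four codons XYN all mean the same amino acid, and it shows that Leu, Ile and Val share U as the middle base. The helper names (`fourfold_families`, `second_bases`) are illustrative. Variant codes, such as the mitochondrial ones mentioned below, are not modelled.

```python
from itertools import product

# Standard genetic code in the RNA alphabet. The 64-letter string gives the
# amino acid (one-letter code, '*' = stop) for codons ordered by first, second
# and third base, each cycling through U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def fourfold_families(table):
    """Return {XY: amino acid} for each prefix XY whose four codons XYN
    all encode the same amino acid, i.e. the third base is irrelevant."""
    families = {}
    for x, y in product(BASES, repeat=2):
        meanings = {table[x + y + n] for n in BASES}
        if len(meanings) == 1:
            families[x + y] = meanings.pop()
    return families


def second_bases(table, amino_acid):
    """Return the sorted middle bases used by an amino acid's codons."""
    return sorted({codon[1] for codon, aa in table.items() if aa == amino_acid})


if __name__ == "__main__":
    print(fourfold_families(CODON_TABLE))
    # Branched-chain aliphatic amino acids: Leu (L), Ile (I), Val (V)
    for aa in "LIV":
        print(aa, second_bases(CODON_TABLE, aa))
```

The script finds eight such families: Ser (UC), Leu (CU), Pro (CC), Arg (CG), Thr (AC), Val (GU), Ala (GC) and Gly (GG). These include the five examples in the text. It also shows that every Leu, Ile and Val codon has U in the middle position.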
The meaning of each codon is essentially the same in all organisms; this is a strong argument that life has evolved only once. However, there are some differences in the triplet code in mitochondria and in ciliated protozoans (not surprising, in the case of mitochondria, if you believe that they were captured by the eukaryotes). In terms of translation, the binding of tRNA is more specific for the first two bases of a codon than for the third (a single tRNA species can bind to both UUU and UUC codons), supporting the pattern of evolution described above. In general, the amount of a particular tRNA in a cell reflects the availability of the amino acid it handles, and where more than one form of tRNA can handle a given amino acid, the relative proportions of the forms will vary. This offers the basis for one form of post-transcriptional regulation.
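The looser binding at the third base is usually described by Crick's wobble rules. The sketch below applies a simplified version of those rules. The first two codon positions require strict Watson-Crick pairing. The base at the 5' end of the anticodon may pair with more than one third-position base. Real tRNAs carry many other modified bases that are not modelled here, and the function name `codons_read` is illustrative.

```python
# Watson-Crick partners (RNA alphabet), used for the first two codon positions.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Simplified wobble rules: the base at the 5' end of the anticodon may pair
# with more than one base at codon position 3. "I" is inosine, which occurs
# in some tRNA anticodons.
WOBBLE = {
    "G": {"C", "U"},
    "C": {"G"},
    "A": {"U"},
    "U": {"A", "G"},
    "I": {"A", "C", "U"},
}


def codons_read(anticodon):
    """Return the codons (5'->3') that a 5'->3' anticodon can read.

    Codon and anticodon pair antiparallel: anticodon position 1 (the wobble
    base) faces codon position 3, and anticodon position 3 faces codon
    position 1.
    """
    wobble, middle, last = anticodon
    first = WATSON_CRICK[last]
    second = WATSON_CRICK[middle]
    return sorted(first + second + third for third in WOBBLE[wobble])


if __name__ == "__main__":
    print(codons_read("GAA"))  # a phenylalanine anticodon
    print(codons_read("IGC"))  # an alanine anticodon containing inosine
```

With anticodon GAA, one tRNA reads both UUC and UUU, as in the text. Inosine at the wobble position widens this to three codons, GCA, GCC and GCU.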
But this redundancy in the code is only part of what is singular about it. Why are only the letters A, T, G and C used? There are 16 possible nucleotides that could pair up to make DNA, and all sorts of phoney DNAs have been synthesized. Dónall Mac Dónaill thinks that these four were chosen as part of an error-reducing strategy, just as digital communications use what are called parity bits to detect errors. A parity bit is added to the end of a binary number so that its digits add up to an even number. So if you were sending 100110 you would add 1, and to 100001 you would add 0; what would be sent is 100110,1 and 100001,0 respectively. The most likely transmission error is the substitution of a 1 for a 0, or a 0 for a 1, in the number proper. Any single such substitution makes the digits add up to an odd number, so the error can be readily recognized. Mac Dónaill's clever idea was to represent each nucleotide as a four-digit binary number. The first three digits represent the three binding sites that each nucleotide presents to its partner. Each site is either a hydrogen donor or an acceptor. A nucleotide offering donor-acceptor-acceptor sites could be represented as 100, and would bond only with an acceptor-donor-donor partner, or 011. The fourth digit is 1 if the nucleotide is a single-ring pyrimidine and 0 if it is a double-ring purine. A purine bonds readily only with a pyrimidine, and vice versa.
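The parity-bit arithmetic is easy to check with the two numbers used in the text. The sketch below uses even parity, so the appended bit makes the count of 1s even. It confirms that every single substitution is caught. The helper names are illustrative.

```python
def even_parity_bit(bits):
    """Return the bit that makes the total count of 1s even."""
    return "1" if bits.count("1") % 2 else "0"


def parity_ok(word):
    """True if a received word (data + parity bit) has an even count of 1s."""
    return word.count("1") % 2 == 0


def flip(word, i):
    """Simulate a transmission error: substitute 1 for 0 (or 0 for 1) at position i."""
    return word[:i] + ("0" if word[i] == "1" else "1") + word[i + 1:]


if __name__ == "__main__":
    for data in ("100110", "100001"):
        sent = data + even_parity_bit(data)
        print(data, "->", sent, parity_ok(sent))
        # Every single-digit substitution makes the count of 1s odd.
        assert all(not parity_ok(flip(sent, i)) for i in range(len(sent)))
        # Two substitutions cancel out and are not detected.
        print("double error detected?", not parity_ok(flip(flip(sent, 0), 1)))
```

The script reproduces 100110 -> 1001101 and 100001 -> 1000010. The last line shows the limitation: a single parity bit detects one substitution but misses two.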
He suggests that A, T, G and C were chosen because the digits of each add up to an even number; the fourth digit acts as a parity bit. An alphabet mixing even- and odd-parity nucleotides would be very susceptible to error. For example, C (100,1) binds naturally to G (011,0), but it could accidentally bind to the odd-parity nucleotide X (010,0) because there is only a single mismatch (weakly compared with C-G, but effectively). C is, in fact, very unlikely to bind to any other even-parity nucleotide, because there would be two mismatches. The single-mismatch risk is avoided by excluding all odd-parity nucleotides from the DNA code.
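The mismatch argument can be checked by enumerating all eight possible purine codes against C. The sketch below uses only the codes given in the text: C, G and the hypothetical X. It treats facing sites as aligned position by position, and it counts a mismatch wherever two donors or two acceptors face each other. Both are simplifying assumptions, and the helper names are illustrative.

```python
from itertools import product

# Four-digit codes as described in the text: three hydrogen-bonding sites
# (1 = donor, 0 = acceptor) followed by a ring digit (1 = pyrimidine, 0 = purine).
C = "1001"
G = "0110"
X = "0100"  # the hypothetical odd-parity purine used in the text


def parity(code):
    """0 if the digits sum to an even number, 1 if odd."""
    return sum(int(d) for d in code) % 2


def mismatches(a, b):
    """Count facing sites that cannot hydrogen-bond (donor-donor or
    acceptor-acceptor). Sites are compared position by position, and a and b
    are assumed to be of opposite ring type (one purine, one pyrimidine)."""
    return sum(1 for x, y in zip(a[:3], b[:3]) if x == y)


def purine_partners(pyrimidine):
    """List (code, parity, mismatches) for all eight possible purine codes."""
    rows = []
    for sites in product("01", repeat=3):
        code = "".join(sites) + "0"
        rows.append((code, "even" if parity(code) == 0 else "odd",
                     mismatches(pyrimidine, code)))
    return rows


if __name__ == "__main__":
    for code, par, n in purine_partners(C):
        print(code, par, n)
```

Against C, the even-parity purines other than G all give two mismatches. Three of the four odd-parity purines, including X, give only one. This follows from the digit sums. C's ideal partner pattern (011) and every even-parity purine pattern both have an even digit sum, and two three-digit patterns with the same digit-sum parity always differ in an even number of places.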
This exciting idea emphasizes that this is a process with many chances to go wrong; anything that reduces the error rate is likely to have conferred a selective advantage. There are a number of error-reducing systems in DNA copying and repair. All of them seem to be necessary, because their failure almost invariably gives rise to a disease (xeroderma pigmentosum is perhaps the best known). What has been under-appreciated is that post-transcriptional and post-translational changes are not the only mechanisms by which the function of a protein can be changed. Its location within the cell, or the cell type in which it is made, may alter its role, even assuming normal production.
There will never be a day when we will be able to go to a database, look up a gene associated with the symptoms and signs of a disease and reckon to correct the error by direct manipulation. More importantly, these observations emphasize that the genome is not a blueprint for an animal; it is a set of facilitating instructions.
So why is there such excitement about cloning?
References
1. Wolfsberg TG, Wetterstrand KA, Guyer MS, Collins FS, Baxevanis AD. A user's guide to the human genome. Nat Genet 2002; 32(Suppl.): 1-79.
2. Lomas DA, Carrell RW. Human genetics and disease: Serpinopathies and the conformational dementias. Nat Rev Genet 2002; 3: 759-68.
3. Darnell JE, Doolittle WF. Speculations on the early course of evolution. Proc Natl Acad Sci USA 1986; 83: 1271-5.