This week, the ENCODE project released the results of its latest attempt to catalog all the activities associated with the human genome. Although we've had the sequence of bases that make up the genome for over a decade, there were still many questions about what a lot of those bases do when inside a cell. ENCODE is a large consortium of labs dedicated to helping sort that out by identifying everything they can about the genome: what proteins stick to it and where, which pieces interact, what bases pick up chemical modifications, and so on. What the studies can't generally do, however, is figure out the biological consequences of these activities, which will require additional work.
Yet the third sentence of the lead ENCODE paper contains an eye-catching figure that ended up being reported widely: "These data enabled us to assign biochemical functions for 80 percent of the genome." Unfortunately, the significance of that statement hinged on a much less widely reported item: the definition of "biochemical function" used by the authors. This was more than a matter of semantics. Many of the resulting press reports painted an entirely fictitious history of biology's past, along with a misleading picture of its present. As a result, the public that relied on those press reports now has a completely mistaken view of our current state of knowledge (this happens to be the exact opposite of what journalism is intended to accomplish).
But you can't entirely blame the press in this case. They were egged on by the journals and university press offices that promoted the work and, in some cases, by the scientists themselves. To understand why, we'll need a bit of biology and a bit of history before we can turn back to the latest results and the public response to them.
What we know about DNA, and when we knew it
Among other things, DNA has at least two key functions. First, it codes for the proteins that perform most of a cell's functions. Second, it has control sequences that don't encode anything, but determine when and where the coding sequences are active. We've had some indication that non-coding DNA played key regulatory roles since the 1960s, when the Lac operon was described and won its discoverers the Nobel Prize. The Lac operon is present in bacterial genomes, which are under extreme pressure to carry as little DNA as possible. The typical bacterial genome is over 85 percent protein-coding DNA, leaving just a small fraction for regulatory purposes.
But that isn't generally true of vertebrates. The coding portions of vertebrate genes turned out to be interrupted by noncoding regions, called introns. Some of these are huge, roughly a third the size of some of the smaller bacterial genomes. Vertebrate genomes also appeared to be littered with old and disabled viruses and mobile genetic parasites called transposons. Even some of the coding portions seemed a bit useless: near-exact duplicates of genes were common, as were mutated and disabled copies. Many of these apparently useless pieces of DNA continued to carry sites for regulatory DNA binding proteins and continued to make RNA. (To give you an idea of how mainstream all this was, I spent some time working on a mouse gene that was thought to be superfluous because it was a near-exact copy of a gene used by the immune system. But the copy was only expressed in males because a mobile genetic element's regulatory sequences had been inserted nearby. And I knew all this as an undergrad in the late 1980s.)
By the time we sequenced the human genome, we discovered that this seemingly useless stuff was the majority. Over half the genome was built from the remains of viruses and transposons. Introns accounted for another large fraction. And all of it seemed to be an evolutionary accident. One fish, the fugu, lacks a lot of this DNA, and seems to get along fine, while many salamanders have ten times the DNA per cell that humans do. And if you looked at the DNA of different mammals, the vast majority of it (about 95 percent) wasn't shared by different species.
These findings seemed to support a model that was first proposed back in the 1970s, which picked up the (possibly unfortunate) moniker junk DNA. Genomic accidents (duplicating genes, picking up a virus) happen at a steady rate. Individually, these don't cause an appreciable cost in terms of fitness, so species aren't under a strong selective pressure to get rid of them, and pieces can linger in the genome for millions of years. But the typical bit of junk doesn't do anything positive for the animals that carry it. Regulatory, junk, and non-coding DNA are all partly overlapping categories, which helps foster confusion.