Massive project reveals shortcomings of modern genome analysis
The sequencing and comparison of 12 fruit fly genomes -- the result of a massive collaboration of hundreds of scientists from more than 100 institutions in 16 countries -- has thrust forward researchers' understanding of fruit flies, a popular animal model in science. But even human genome biologists may want to take note: The project also has revealed considerable flaws in the way they identify genes.
"We've made huge progress in recent years with many genomes, including humans, but a lot of the problems can't be solved by simply dumping data into a computer and having truth and light come out the other end," said Indiana University Bloomington biologist Thomas Kaufman, who co-led the project. "One of the things we've learned from this project is that when you compare a lot of different but related genomes, you are more likely to see the genes that are buried in all that A-C-T-G mush."
Two papers in this week's Nature separately report the results of the four-year genome project and use the data to draw some conclusions about the fruit fly genus Drosophila, particularly its star species, the human nuisance Drosophila melanogaster. Among the papers' conclusions is that any individual species' genome can be resolved far more accurately when related genomes are compared with it. The project was primarily funded by the National Human Genome Research Institute (NHGRI), part of the National Institutes of Health.
More than 40 "companion" manuscripts are being published or are in press, each of which examines a different aspect of the data produced by the Drosophila 12 Genomes Consortium.
"This remarkable scientific achievement underscores the value of sequencing and comparing many closely-related species, especially those with great potential to enhance our understanding of fundamental biological processes," said Francis S. Collins, director of NHGRI. "Thanks to the consortium's hard work, scientists around the world now have a rich new source of genomic data that can be mined in many different ways and applied to other important model systems as well as humans."
The consortium purposely chose a wide variety of fruit flies for study, guessing correctly that a diverse set would make both gene similarities and differences among the 12 species easier to identify. Some of the Drosophila species the scientists studied are closely related to D. melanogaster; others are not. Some of the flies occupy very specialized ecological niches, such as D. sechellia, which has evolved a unique ability to detoxify the fruit of the Seychelles' noni tree. The other 10 species the consortium examined were D. pseudoobscura, D. simulans, D. yakuba, D. erecta, D. ananassae, D. persimilis, D. willistoni, D. virilis, D. grimshawi, and the cactus-loving D. mojavensis. D. melanogaster's genome was published in 2000 and D. pseudoobscura's genome was published in 2005. The other genomes are newly published.
In comparing the 12 genomes, the scientists found 1,193 new protein-coding genes and hundreds of new functional elements, including regulatory sequences that determine how quickly genes are expressed, and genes that encode functional RNAs such as small nuclear RNAs. They also learned that certain genes, such as those associated with smell and taste, sex and reproduction, and defenses against pathogens, appear to be evolving faster than others.
The Drosophila 12 Genomes Consortium found that D. melanogaster shares about 77 percent of its genes with the other 11 species it studied. The scientists also found errors in about 3 percent of previously sequenced D. melanogaster protein-coding genes, correcting 414 gene sequences on record.
A vexing problem for genomicists is finding genes and other important DNA sequences in heterochromatin, the tightly packed regions of chromosomes presumed to show little gene expression. Heterochromatin is common in animal genomes.
"The heterochromatin is very hard to analyze," Kaufman said. "Studies show heterochromatin changes the most. It's full of intermediate- and full-repeat sequences. And there are genes buried in this stuff."