Judging from the day-to-day questions we ask, the experiments we do, the papers we read, and what we think about in my lab, things can often seem dull. The breakthroughs and insights that members of my lab and I make offset this and make the work exciting, inspiring, and fun. Sometimes, though, I read a paper that fascinates me, usually because of the new information it contains. Occasionally I find a paper that fascinates me because of the cleverness of the scientists who did the study. These tend to be the papers I read and think, "Why didn't I think of that!"
The November 30th issue of Science had just such an article.
The paper by Sorek et al. discusses the issue of horizontal gene transfer in bacteria. Horizontal gene transfer (HGT), the acquisition of new genes from unrelated sources, is widespread throughout the bacteria and is found within eukaryotes as well. HGT is behind the epidemic of antibiotic drug resistance in previously susceptible organisms. We have known for quite some time that genes encoding diverse functions can be transferred horizontally, but the limits of what can and cannot be moved have not been definitively studied. Sorek et al. set out to answer this question.
Here is the clever bit... The authors used currently available genomic sequence information to get an idea about what was going on. To understand this, you need to understand how genomes are sequenced.
In short, you purify genomic DNA from the organism you are interested in studying. This tremendously long genomic DNA is broken into much smaller fragments containing on average 1-3 distinct genes. This fragmented DNA is cloned into a vector (a DNA backbone that can be stably moved into one or more organisms), and the resulting vector + fragmented DNA (plasmid) is recovered in the bacterium Escherichia coli. Any given E. coli cell will get one plasmid, so tens to hundreds of thousands of E. coli cells need to be recovered to get enough individual fragmented DNA molecules to ensure that the entire genome of the organism of interest is represented. These plasmids are then recovered and sequenced. In general terms, researchers want to have 10x coverage; that is, every piece of DNA is sequenced 10 times on average. Despite this 10x coverage, there are always regions of the genome that are not recovered.
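To get a feel for why the missing regions are interesting, here is a back-of-the-envelope sketch in Python. It assumes an idealized model in which sequenced fragments land uniformly at random on the genome, so the number of times a given base is sequenced is roughly Poisson-distributed and the chance a base is missed entirely is about e^(-coverage). The function names and the 4.6 Mb genome size are my own illustrative assumptions, not numbers from the paper.

```python
import math


def expected_uncovered_fraction(coverage: float) -> float:
    """Fraction of bases expected to be missed by chance alone.

    Assumes fragments land uniformly at random (an idealization), so the
    per-base depth is approximately Poisson with mean `coverage` and the
    probability of zero hits is exp(-coverage).
    """
    return math.exp(-coverage)


def expected_uncovered_bases(genome_length: int, coverage: float) -> float:
    """Expected number of bases never sequenced under the same idealized model."""
    return genome_length * expected_uncovered_fraction(coverage)


if __name__ == "__main__":
    genome_length = 4_600_000  # hypothetical bacterial genome size, in bases
    for coverage in (1, 5, 10):
        missed = expected_uncovered_bases(genome_length, coverage)
        print(f"{coverage}x coverage: about {missed:.0f} bases missed by chance")
```

Under these assumptions, chance alone would leave only a few hundred bases unsequenced at 10x coverage, so regions that persistently fail to show up are probably missing for some other reason.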
Back to the clever bit: the authors realized that putting these plasmids into E. coli represents HGT. These vectors are in fact derived from natural E. coli strains and are transferred naturally between strains. So, through the process of obtaining genome sequence for a variety of bacterial species, Sorek et al. realized that the scientific community had inadvertently set up an experiment to determine the limits of HGT. They simply (and by "simply" I mean anyone with a computer and knowledge of these systems could have done it; it is not meant to diminish the work or insights of the authors) took available genome sequence information from 79 distinct species and looked to see what was not sequenced using the process described above. Again, the idea is that if a region was not sequenced, it must not have been propagated in E. coli (the gaps in a genome sequence are filled in using other, more labor-intensive methods). Indeed, the authors found regions from these species that could not be propagated in E. coli. Interestingly, these regions were not random, but contained genes encoding specific types of proteins. However, the authors noted that a given gene could be recovered in E. coli from at least some of the 79 species, so it seems that no specific protein-encoding gene always fails to be transferred into E. coli.
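The gap-hunting idea can be sketched as a toy Python example. To be clear, this is my own illustration and not the authors' actual pipeline: the coordinates, gene names, and the functions `find_gaps` and `genes_in_gaps` are all hypothetical. Given where each cloned fragment maps on a finished genome, it reports the stretches no clone covers and the genes that overlap them.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in genome coordinates


def find_gaps(genome_length: int, clones: List[Interval]) -> List[Interval]:
    """Return the regions of the genome not covered by any cloned fragment."""
    gaps: List[Interval] = []
    covered_to = 0  # everything before this position is known to be covered
    for start, end in sorted(clones):
        if start > covered_to:
            gaps.append((covered_to, start))
        covered_to = max(covered_to, end)
    if covered_to < genome_length:
        gaps.append((covered_to, genome_length))
    return gaps


def genes_in_gaps(genes: Dict[str, Interval], gaps: List[Interval]) -> List[str]:
    """Return the names of genes that overlap at least one uncovered region."""
    return sorted(
        name
        for name, (gene_start, gene_end) in genes.items()
        if any(gene_start < gap_end and gap_start < gene_end
               for gap_start, gap_end in gaps)
    )


if __name__ == "__main__":
    # Hypothetical 1,000-base genome with three mapped clones.
    clones = [(0, 300), (250, 500), (700, 1000)]
    genes = {"geneA": (100, 200), "geneB": (550, 650), "geneC": (690, 720)}

    gaps = find_gaps(1000, clones)
    print(gaps)                        # [(500, 700)]
    print(genes_in_gaps(genes, gaps))  # ['geneB', 'geneC']
```

Repeating something like this across many genomes and asking which kinds of genes keep landing in the gaps is the spirit of the analysis, at least as I understand it.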
This in and of itself is interesting and important information, but the authors did not stop here. They actually took this bioinformatic data and conducted some biological studies (something done too little in the bioinformatics field, in my opinion). The authors wanted to know why some genes were not readily transferred to E. coli. They went on to show that these underrepresented genes are toxic in E. coli. Interestingly, the toxicity was an innate property of the gene product's activity, because they observed the same toxicity when they used an extra copy of the E. coli version of the gene.
So here we have a paper in which the authors learn some interesting biology primarily because they were smart enough to come up with and follow up on a good idea.