Friday, January 03, 2014

What is a gene?

A gene is all of the DNA elements required in cis for the properly regulated production of a set of RNAs whose sequences overlap in the genome.   
I formulated that definition c. 1990, when I started teaching genetics to graduate students. I think that the course I actually taught was quite different from the plans leading to that formulation, but I remember sitting for several hours in a coffee shop in Newark airport and coming up with that definition. This was after the discovery of splicing, transposable elements, remote enhancers, overlapping genes, nested genes, long noncoding RNAs and many short noncoding RNAs, and I imagined discussing literature on each of these topics and its implications for how a gene might be defined. 1990 was before “tweet-length” could be applied, before the discovery of microRNAs and (most significantly) before complete genome sequences and high-throughput data in the style of ENCODE.

I believe this definition has stood the test of time, and that it will continue to provide a useful understanding of what is meant by a gene. 
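
One clause of the definition, "a set of RNAs whose sequences overlap in the genome," can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the `Transcript` class, the `group_overlapping` function and the coordinates are all hypothetical, overlap is treated as transitive (if A overlaps B and B overlaps C, all three fall in one set), and strand is ignored. It says nothing about the other half of the definition, the DNA elements required in cis for regulated production, which cannot be read off from coordinates alone.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Transcript:
    """A hypothetical RNA mapped to the genome (half-open interval)."""
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def group_overlapping(transcripts: List[Transcript]) -> List[List[Transcript]]:
    """Group transcripts whose genomic spans overlap, directly or through a
    chain of overlapping transcripts, on the same chromosome.

    Assumption for illustration: strand is ignored.
    """
    ordered = sorted(transcripts, key=lambda t: (t.chrom, t.start, t.end))
    groups: List[List[Transcript]] = []
    current: List[Transcript] = []
    current_chrom = None
    current_end = 0

    for t in ordered:
        if current and t.chrom == current_chrom and t.start < current_end:
            # Overlaps the span covered so far: same set of RNAs.
            current.append(t)
            current_end = max(current_end, t.end)
        else:
            # No overlap (or a new chromosome): start a new set.
            if current:
                groups.append(current)
            current = [t]
            current_chrom = t.chrom
            current_end = t.end

    if current:
        groups.append(current)
    return groups


# Made-up coordinates, purely for illustration.
example = [
    Transcript("rna_a", "chr1", 100, 500),
    Transcript("rna_b", "chr1", 400, 900),
    Transcript("rna_c", "chr1", 1000, 1200),
    Transcript("rna_d", "chr2", 100, 500),
]
example_groups = [[t.name for t in g] for g in group_overlapping(example)]
```

With these made-up coordinates, `rna_a` and `rna_b` end up in one set and the other two transcripts each form their own. Whether overlapping RNAs on opposite strands should count as one set is a choice the sketch makes by assumption; the definition itself does not settle it.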

The definition was written to accommodate work that predates complete genome sequences and ChIP-seq, and that will just as surely predate whatever methods are developed in the coming years. That should be kept in mind as we face hype about new discoveries changing our view of the gene. I predict that later this year some new work will be described as overturning the idea of junk DNA, or the idea of genes as beads on a string, or the notion that genes are merely their coding information, or perhaps all of these. These discoveries will be said to account for the dark matter of the genome and other deep mysteries that were unsolved until now. Faced with that hype, I will link to this post.

In 2014, as part of my plan to write more but shorter posts, I will also report the history of my own understanding of several of the issues that make defining “a gene” problematic.
Mark Gerstein almost immediately pointed out that he had published a very similar definition in 2007:

The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.
See PubMed (PubMed ID 17567988), the Gerstein lab, or Genome Biology.
