Monday, January 12, 2009

Does the rs7901695 C variant predispose to diabetes by creating a cryptic exon?

The discovery of disease-SNP associations through genome-wide association studies continues at a remarkable pace, but a recent review of common variants implicated in type 2 diabetes (T2D) suggests that, at least for this disease, current methods are unlikely to find many additional susceptibility loci (Prokopenko et al. 2008). We are now at the stage where "additional investigation is needed to define the causal variants, ... to understand disease mechanisms and to effect clinical translation." I continue to be interested in the (often underappreciated) contribution of pre-mRNA splicing to variation in gene activity, and I was especially intrigued by the statement that the variant with greatest effect size (the rs7901695 C variant in TCF7L2) lies in an intron but its mechanism of action is not understood. In order to investigate this I submitted the sequence surrounding this variant to SplicePort, our splice site predictor and analysis tool (see Dogan et al. 2007). Sure enough, the rs7901695 C variant alters the predicted strength of nearby splice sites. Because this sequence is deep in an intron, these would be potential cryptic splice sites, sites at which splicing occurs only in the case of a mutation.

SplicePort analysis of rs7901695
Most striking is the activation of a splice acceptor site 68 nucleotides upstream of the variant SNP (position 688 in the submitted sequence or 114,744,012 on chromosome 10). The SplicePort score, which is -0.41 for the T allele, but -0.02 for the C allele, can be understood by noting that while 95.66% of splice acceptors have a score greater than -0.41, 89.01% of splice acceptors score above -0.02. Thus, the C allele acceptor site, although still relatively weak, is clearly better and well within the range of variation for real splice sites (99% of acceptors score above -0.86 and the median score is 0.923).
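To make the percentages easier to read, here is a minimal sketch of the arithmetic, using only the numbers quoted above. It is not part of SplicePort, and the names `alleles` and `percentile_rank` are just illustrative. It turns "percent of real acceptors scoring above this site" into a percentile rank for each allele.

```python
# Values quoted in the post; "percent_above" is the share of real acceptor
# sites that score higher than this site.
alleles = {
    "T": {"score": -0.41, "percent_above": 95.66},
    "C": {"score": -0.02, "percent_above": 89.01},
}

def percentile_rank(percent_above):
    """Share of real acceptors scoring at or below the site, in percent."""
    return round(100.0 - percent_above, 2)

# Percentile rank of the candidate acceptor site for each allele.
ranks = {a: percentile_rank(v["percent_above"]) for a, v in alleles.items()}

# Change in SplicePort score going from the T allele to the C allele.
score_gain = round(alleles["C"]["score"] - alleles["T"]["score"], 2)

print(ranks)
print(score_gain)
```

In other words, the site moves from roughly the 4th percentile of real acceptors with the T allele to roughly the 11th percentile with the C allele.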

How might a C to T change affect an acceptor splice site upstream? SplicePort provides a feature browser that lists the features used for scoring any site. In this case, the following "downstream features" contribute to the score of the C variant but not the T variant: cgg (0.112), ctac (0.083), cg (0.072), ctacg (0.06), tacg (0.059), acg (0.043) and acggg (0.035). An independent approach, ESEfinder, similarly identifies this sequence context (CTACGGG but not CTATGGG) as an exonic splicing enhancer potentially recognized by ASF/SF2, SRp40 or SRp55. Thus, the rs7901695 C variant might activate the upstream acceptor site by functioning as part of an exonic splicing enhancer that is activated by one or more SR proteins.
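As a quick sanity check on that list, the sketch below enumerates the short subsequences (k-mers) of the two seven-nucleotide contexts and confirms that every feature named above occurs in the C context and not in the T context. This is only a string-level illustration. SplicePort's real features are also tied to position relative to the candidate site, which is ignored here, and the k-mer length range (2 to 5) and the helper name `kmers` are my own assumptions.

```python
def kmers(seq, kmin=2, kmax=5):
    """All substrings of seq with length between kmin and kmax."""
    return {
        seq[i:i + k]
        for k in range(kmin, kmax + 1)
        for i in range(len(seq) - k + 1)
    }

c_context = "ctacggg"  # C allele
t_context = "ctatggg"  # T allele

# k-mers that exist only when the C allele is present.
c_only = kmers(c_context) - kmers(t_context)

# Feature weights exactly as listed in the post.
reported = {
    "cgg": 0.112, "ctac": 0.083, "cg": 0.072, "ctacg": 0.06,
    "tacg": 0.059, "acg": 0.043, "acggg": 0.035,
}

# Any reported feature that is NOT specific to the C context (expected: none).
not_c_specific = [f for f in reported if f not in c_only]

# Combined weight of the listed C-only features.
total = round(sum(reported.values()), 3)

print(not_c_specific)
print(total)
```

The listed weights add up to about 0.46. That need not equal the difference between the two allele scores, because features that contribute only to the T allele are not listed here.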

The next question is how the activation of an acceptor splice site deep within an intron would affect gene expression. A splice site that is activated by mutation is known as a cryptic splice site, and an exon that is used only in mutant alleles is referred to as a cryptic exon. Intron mutations that affect gene expression by creating cryptic exons have been known for some time. In fact, I wrote a commentary on several such mutations in the human beta-globin gene over 25 years ago while still a graduate student ("Lessons from mutant globins," Mount and Steitz 1983). Mutations that activate cryptic exons are often overlooked because they lie away from splice sites, and because the resulting RNA is often unstable due to nonsense-mediated decay. Nevertheless, there are now hundreds of papers describing such mutations. The case here is especially tricky because the SNP does not directly create a cryptic splice site, but may activate one at a distance.

cryptic exon
Thus, activation of a cryptic exon is a reasonable hypothesis for the effect of the rs7901695 variant on TCF7L2. In this model transcripts from the C allele are more likely than transcripts from the T allele to be aberrantly spliced and ultimately degraded. The lack of EST data supporting a cryptic exon in this region can be explained by nonsense-mediated decay. This proposed mechanism is similar to regulated unproductive splicing and translation ("Evidence for the widespread coupling of alternative splicing and nonsense-mediated mRNA decay in humans." Lewis et al. 2003), an important difference being that cryptic exons are generated by mutation rather than being regulated alternative exons.

How likely is this hypothesis? I could not find any papers that have investigated the effect of this variant on splicing. Clearly, the next step is to look for evidence of the cryptic exon and verify that the C variant does indeed introduce an exonic splicing enhancer. There is also the possibility that other SNPs associated with the risk variant haplotype ("HapBT2D"), particularly rs7903146, are more likely to be causative (Helgason et al. 2007). I could not find a direct comparison of the relative risk for these two variants, but it's possible that association data alone will rule out rs7901695, or even that they already have. Colleagues have suggested that I pursue this in my own lab, but I work on other things, and there are people in the diabetes field who can do this quickly. I only ask that they cite this post (here's how).

Although additional investigation is needed, the rs7901695 variant is certainly capable of explaining an effect on the expression of TCF7L2 through activation of a cryptic exon. This case is an example of how SplicePort can be used to evaluate the potential of variants to alter splicing. We plan to systematically evaluate the possible effect of all human SNPs on splicing. In the meantime, I strongly encourage investigators to use SplicePort to evaluate variants of interest on their own.

Saturday, November 08, 2008

Remembering C.C. Tan

I read this morning (link) that Tan Jiazhen, better known in the U.S. as C. C. Tan, passed away Nov. 1, at age 99. I suspect that his influence on genetics was much greater than most Americans appreciate. He worked with the first generation of Drosophila geneticists, and he was Dobzhansky's first Ph.D. student at Caltech, yet his career extended into the modern era, and many of the young Chinese scientists coming to the United States now have met him. It's impossible for me to evaluate how much he is responsible for the intellectual "silk road" that contributes so much vitality to twenty-first-century genetics, but I suspect that without C. C. Tan it would be much less traveled. Interested readers should consult Jim Crow's commentary in Genetics (Vol. 164, pg. 1 *) to see how he managed to bring Chinese genetics into the modern era, past the Lysenko years and the Cultural Revolution.

* This page, like most at genetics.org, does not load properly in Firefox on Windows. I'm sure that the GSA will fix that. For now, I just use another browser when I visit the GSA.

Sunday, July 13, 2008

Do I have the right to know my own genetic makeup?

I took a little time yesterday (July 12) to attend a panel discussion on direct-to-consumer genetic testing at the Genetic Alliance annual conference. Panelists were Sue Friedman, from FORCE; Trish Brown, from DNA Direct; Joanna Mountain, from 23andMe; and Sean George, from Navigenics. Francis Collins, Director of the National Human Genome Research Institute, moderated. Once each of the panelists had made their opening remarks, Collins started the discussion by asking why, if personalized genetics is so wonderful, the states of California and New York have issued cease and desist orders to several personalized genetics companies (story). The response was conciliatory, echoing statements on the 23andMe blog (the spittoon):
We agree that this evolving field of personal genomics is in need of proper regulatory oversight. While our mission to provide accurate and contextual information to our customers about their genetic information is aligned with the regulatory mandate to protect the public health, we also want to ensure that efforts to rein in our industry do not hamper the potential benefit of genetic knowledge to our health.
Many relevant issues were brought up, both by panel members and by those in the audience. Can people deal appropriately with uncertainty? Do they understand the relationship between genotype, the environment and phenotype? What about genetic information with predictive value, but about which the consumer can do nothing? The case of APOE was discussed at length.

I do feel that we have a right to obtain information about our own genetic makeup without having to justify ourselves, to a physician, to an insurance company, or to the state of California. I am also skeptical of the perception that "most people are incapable of grasping the relevance of provisional, statistical information." In any case, an enterprise that feeds users with 500,000 bits of information, most of which have no significance, seems more likely to help people understand that genotype is not fate than to have the opposite effect.

Giving people genetic information can be separated from giving them advice, and it seems to me that providing information about genotype should be regulated only to the extent that technical standards are met. This is analogous to surveyors giving me information about the elevation of my house. That information, by itself, is not advice about flood risk, and I would be surprised if surveyors were required to provide accurate assessments of that risk in order to operate, or forbidden from providing consumers with data that a third party judged to be of little value.

The panel helped me to understand the risk of consumer fraud, but, ultimately, I feel strongly that I have a right to know my own genetic makeup. Furthermore, I find it insulting to say that consumers are incapable of understanding uncertainty. There is certainly room for regulation, but I hope that my right to pay someone to tell me about my own genes is not infringed. Perhaps it is most important to prevent companies from taking money for tests without providing portable genotype data whose implications can be evaluated by a third party in the light of new information. That could be new findings about the implications of that particular genotype, other genetic information that might influence how it is interpreted, or information about the interactions between that bit of genotype and other factors such as one's diet or medical history.

Links for this article:
  • NHGRI (National Human Genome Research Institute), with news, links to research, funding opportunities, fact sheets, and career opportunities, including such tidbits as a Catalog of genome-wide association studies.
  • "Should Personal Genomics Be Regulated" Tim O'Reilly's blog on the subject (with interesting comments and discussion).
  • DNA Direct. DNA Direct's services focus on personalized test result interpretation and supportive materials and services.
  • Genetic Alliance. The Genetic Alliance works to eliminate obstacles and limitations within the genetics community through novel partnerships among stakeholders and integration of individual, family, and community perspectives to improve health systems and inform decisions.
  • "Getting up close and personal with your genome," a news summary for the scientist written by Laura Bonetta and published in Cell (2008 May 30;133(5):753-6)

Sunday, January 20, 2008

Plant genomes, animal genomes, more and more genomes!

I recently returned from the Plant and Animal Genome Conference (XVI, Jan. 12-16, in San Diego). This conference is much more applied than what I'm used to, but I came because it seemed a good place to see comparative genomics in full bloom, and that turned out to be true. I was struck by the extent to which the meeting was a showcase for vendors (Agilent, Sequenom, BioTrove, Illumina, Roche (now incorporating 454 and Nimblegen), Affymetrix, Applied Biosystems, Keygene, etc.), many of whom literally wined and dined conferees at their workshops.

However, I was also struck by the extent to which new high-throughput sequencing technologies are already in widespread use. Ronan O'Malley (Ecker lab) described the sequencing of Cvi, a strain of Arabidopsis distinct from the Columbia accession already sequenced; in the process he compared 454 and Solexa sequencing. Steve Jacobson (UCLA) described the repeated re-sequencing of (bisulfite-modified) Columbia for the purpose of studying cytosine methylation. Several more plant genomes are in the pipeline, and a sense of the pace is conveyed by the fact that plenary speaker Eddy Rubin (JGI) "announced" the completion of the soybean genome almost in passing.

Other plenary talks were uniformly excellent. I missed the initial talk, by Jerry Caulder, which was apparently quite controversial. David Baulcombe referred to it by saying that the European perspective on genetically modified foods is different and that "by shying away from the hazards we don't gain credibility." Another notable aside was Michael Ashburner's statement that "there is no point in funding biomedical research unless you also fund informatics."


Sunday, November 25, 2007

This week, it's ancestry

Last weekend there was a lot of buzz about personal genomics (see Genome Technology Daily, "It was a Helluva Weekend for Personal Genomics"; or Eye on DNA, "DNA Network Members Discuss Personal Genomics Service Providers 23andMe, deCODEme, and Navigenics"; or my previous post). This weekend, it's ancestry. Today's papers had two interesting features on ancestry testing, both of which nicely echoed my own post about caution regarding ancestry testing ("On Genes"). First, the New York Times business section ("DNA Tests Find Branches but Few Roots") discusses the business of ancestry testing. The article is nice in that it compares the cost of ancestry testing by various companies, shows that results differ, and quotes Henry Louis Gates Jr. making reasoned assessments of the role that DNA testing can play. Second, the Washington Post reviews "The Genetic Strand: Exploring a Family History Through DNA" by Edward Ball ("Blue Blood, Black Genes").

The theme is clear. You can only learn so much about your ancestors from DNA.

Saturday, November 17, 2007

Ready or not, personalized genetics is here.

Yesterday's announcement by deCODE genetics that they would be launching a personalized genetics service, deCODEme (news release), means that a major player in gene discovery has just joined the growing field of companies offering personalized genetic services. As I wrote in my Nature Network blog, "On Genes," in "The Scientist Blogger and the Personal Genome," information about susceptibility to disease, potential for health or accomplishment and responsiveness to therapies is found in our genes, and it is going to be made available to people who want it. A lot of people are going to want it. Most are not going to be prepared to understand it. Even Jim Watson and J. Craig Venter aren't entirely sure what to make of their genomes. Genetic counseling may morph into a profession that serves everyone, not just those who are faced with clear cases of genetic disease.

Journalists and scientists also have a role to play. Let me highlight three useful responses.

The New York Times has an excellent series called "The DNA Age." These articles (all by Amy Harmon, at least so far) "explore the impact of new genetic technology on American life." One published today, "My Genome, Myself: Seeking Clues in DNA," describes her use of the 23andMe service.

Bertalan Meskó, a blogger at "ScienceRoll," presents coverage of Personalized Medicine, including a summary of breaking news (today) and a review of services offered by Navigenics, 23andMe and Helix Health (last week, before the deCODE announcement).

I have started "Information on Genes," (ongenes), a web site that is intended to be a place where answers to questions on genes, genetics and genomics are provided by experts in the field. Questions will be posted anonymously but answers will not. I plan to solicit answers from people in the know. My hope is that ongenes will provide useful information to anyone trying to understand genetic tests, including professionals in the field.