Occasional comments on genetics, genomics and gene expression by Steve Mount. For news and shorter, more timely, posts see "News on Genetics."
Wednesday, March 08, 2006
From HapMap to selection map
On the other hand, the uncritical acceptance of results that are statistical in nature (and have a real possibility of being wrong) is disturbing. A recent visitor to Sarah Tishkoff's lab (Jeff Jensen, from Cornell, where he works with Aquadro and Bustamante) gave a talk about the statistical problem of distinguishing selection from certain demographic phenomena, and it made me think that the interpretation of selection maps is going to be extremely uncertain. It is surprising that none of those issues were addressed in Wade's article, especially because the New York Times typically fills its science articles with quotes from others in the field. I felt the same unease a few weeks ago while watching the PBS documentary "African American Lives," in which famous African Americans were given overly specific information about their ancestry without appropriate statistical disclaimers.
I suppose that I will be talking a lot more about selection and race with friends who are not geneticists, and putting a lot more population genetics into my graduate genetics course. Clearly, the idea that population genetics is passé is now passé.
Wednesday, August 24, 2005
Alternative splicing and host defense in flies and plants
What kind of adaptation does this make possible? Certainly, extreme variability allows rapid adaptation at the population level. Furthermore, the presence of membrane-bound and secreted forms of the same molecule raises the possibility of adaptive immunity through clonal selection of hemocytes that encounter antigen. Louisa Wu pointed me to an article in Nature Immunology (Little, Hultmark and Read 2005) making the point that neither memory nor specificity has been ruled out in invertebrate immunity. True adaptive immunity in insects would be very exciting, but we're a long way from that. How could differences among hemocytes in the production of Dscam isoforms be heritable? Through epigenetic silencing of splicing factors? We're just at the beginning of this story.
The authors say this:
"...broad conservation of receptor diversity strongly suggests important functions and future studies will have to further address whether the presence of diverse immune receptors in invertebrates increases the effectiveness of immune responses of individual animals. Alternatively, given the relative short life span of many invertebrates, it may be that immune receptor diversity is less important ontogenetically but rather enhances the adaptive potential of animal populations to changing environmental and pathogenic threats."
Tuesday, August 16, 2005
Nature Genetics and the Mid-Atlantic Plant Molecular Biology Society Conference
There is always something interesting in Nature Genetics, but the July issue seems especially rich.
Postdocs
I appreciate this editorial. The advice here (e.g. that postdocs and advisors make a formal plan, and that postdocs ask themselves such questions as "is this the most important scientific question I can ask") is excellent. Anyone considering a postdoc, or taking on a postdoc, should read this.
Race
I am often asked (especially by educated non-scientists in my acquaintance) about genetics and race. This is an old debate, and there are excellent sources of information and opinion (including a Social Science Research Council forum, a special issue of Nature Genetics and an edited volume by Jefferson Fish; a more complete listing is on Anthropology.net). The bottom line is that race is indeed a social construct. (At the least, it is very poorly defined within biology, and whatever biological definitions might be partially valid differ significantly from the way the concept is normally used in our society.) The licensing of BiDil specifically for African Americans is therefore troubling. It seems to me that if a drug differs in either safety or efficacy between one "race" and another, then the underlying basis is probably either a genetic difference or a cultural difference. In the first case, the relevant genetic difference itself, or a related biomarker, would be much more reliable than popular notions of race. If, on the other hand, the basis is cultural, the relevant practice (such as lifestyle or diet) should be identified. I was therefore gratified to see Nature Genetics publish this letter from Jonathan Kahn making the case against the misuse of race, as well as a sidebar showing how the media have misrepresented the journal's own statements.
Transcriptional Gene Silencing, RNA polymerase IV and siRNAs
The association of small interfering RNAs (siRNAs) with silenced chromosomes presents something of a paradox, since the siRNAs themselves must be transcribed. This paradox is elegantly resolved by the discovery of "RNA polymerase IV," which is presumed to transcribe otherwise silent regions, at least in Arabidopsis (Kanno et al.: "Atypical RNA polymerase subunits required for RNA-directed DNA methylation" Nature Genetics; PubMed; other recent papers cited therein and a News and Views by Vaucheret). In other species RNA polymerase II is implicated (e.g. Schramke et al.), but there may be less siRNA corresponding to silenced loci in those species. On a related note, I was impressed by the massive amount of MPSS data on Arabidopsis siRNAs presented by Pam Green at the MAPMBS meeting last week. These data include over 75,000 different siRNA sequences and will soon be online at http://mpss.dbi.udel.edu/ in browsable form.
Structural genomic variation within species
One of the insights I took away from last year's MAPMBS meeting was the idea (Rafalski, MAPMBS2004, PubMed) that "races" of maize show significant variation in gene content due to small (sub-megabase) structural differences: insertions, deletions and inversions. Although a speaker at this year's meeting expressed the opinion (based on sequence data) that the case in maize may have been overstated, another paper in Nature Genetics (Tuzun et al., Fine-scale structural variation of the human genome; NG; PubMed) reports 297 cases of "intermediate-scale" structural variation in a single human individual! It will be interesting to see how this plays out over time, but SNPs may well be displaced by presence/absence variation as the focus of attention in human genetics. As Charles Lee notes in his News and Views piece, what we see depends on our technology for looking, and I am reminded that a lot of early work in population genetics was based on inversions visible on polytene chromosomes.
Friday, August 12, 2005
Global regulation of alternative splicing: starting with Nova
The global analysis of alternative splicing is complicated by the fact that standard microarrays, and even tiling arrays without junction oligos, do a poor job of reporting on the ratio between alternatively spliced mRNA isoforms that share most of their nucleotides. Only in the past few years has alternative splicing data from arrays been reported (see Clark et al. 2002; Johnson et al. 2003; Stolc et al. 2004; and Pan et al. 2004). It was therefore exciting to see the paper by Ule et al. in the new issue of Nature Genetics reporting the effect of Nova2 knockouts on global patterns of alternative splicing in the mouse brain.
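A toy calculation shows why probes spread across shared exons dilute the signal of an isoform switch. This Python sketch assumes an idealized array with linear, noise-free probe response and a single cassette exon; the probe counts (38 shared-exon probes, 2 cassette-exon probes) and the function names are hypothetical and are not taken from Ule et al. or any real chip design.

```python
def gene_level_signal(n_shared: int, n_alt: int, inclusion: float,
                      total: float = 1.0) -> float:
    """Mean signal across every probe of an idealized gene-level probe set.

    Probes in shared exons detect every transcript; probes in the cassette
    exon detect only the exon-inclusion isoform. Response is assumed linear
    and noise-free (a hypothetical simplification).
    """
    signal = n_shared * total + n_alt * total * inclusion
    return signal / (n_shared + n_alt)


def junction_inclusion(inclusion_junction: float, skipping_junction: float) -> float:
    """Inclusion fraction estimated from exon-exon junction probe signals.

    Each junction probe is assumed to detect exactly one isoform.
    """
    return inclusion_junction / (inclusion_junction + skipping_junction)


# Hypothetical design: 38 probes in shared exons, 2 in the cassette exon.
for inclusion in (1.0, 0.5, 0.0):
    total = 1.0
    junctions = (total * inclusion, total * (1.0 - inclusion))
    print(inclusion,
          gene_level_signal(38, 2, inclusion, total),
          junction_inclusion(*junctions))
```

Under these assumptions a complete switch from exon inclusion to exon skipping lowers the gene-level average only from 1.0 to 0.95, while the junction probes, each specific to one isoform, recover the inclusion fraction directly.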
A custom microarray from Affymetrix was used for this study. Although I applaud the efforts of Hui Wang and John Blume to bring alternative splicing to the Affymetrix platform (and [full disclosure] I own some Affymetrix stock), custom chips are extremely expensive. I have my eyes on the Agilent platform used by Pan et al. and what I would really like to see is the widespread use of a common (inexpensive) platform so that publicly available data can be mined for unexpected associations.
Another notable aspect of this study is the truly remarkable degree of functional connection between proteins whose isoforms appear to be regulated by Nova2. Figure 5 in this paper makes a compelling case for the idea that while transcriptional regulators can turn gene sets on and off, splicing regulators can fine-tune an entire module for a specific task.
Finally, it is important to note that these experiments are facilitated by the fact that Nova2 knockout mice are viable, which rendered tissue-specific ablation (as practiced by Ding et al. and Xu et al. for similar studies on SR proteins) unnecessary. That is why we consider it a good thing that so many of the Arabidopsis SR proteins we work with are not essential. This does not mean they are not important!
Monday, July 25, 2005
PLoS Genetics
Friday, July 22, 2005
Parameters for using blastn with noncoding queries
If one wants to look for a conserved noncoding RNA in a new genome using the best possible tools, then one should use sophisticated structure-based methods such as Klein and Eddy's RSEARCH (BMC Bioinformatics 4:44; PubMed), and should consult the RNA database Rfam (Griffiths-Jones et al., 2003: "Rfam: an RNA family database" and 2005: "Rfam: annotating non-coding RNAs in complete genomes"; PubMed). However, alignment tools such as blast or fasta are more readily available, so it is often expedient to use alignment even when other tools would do better. If you do that, you must adjust the parameters – you will never find noncoding RNAs using the default parameters for blast. I confronted this problem at the Drosophila genome jamboree in 1999 and published the parameters I used there in the paper I wrote with Helen Salz (J. Biol. Chem.; PubMed).
Now I've posted a discussion and "how-to" guide (Posting 4 on SteveMount.com) based on work that Chau Nguyen (a University of Maryland Computer Science and Biology double major) did with me a few years ago. The instructions there are written for use on the NCBI blastn server, but are easily adapted to running blast locally. Briefly, my advice is to use the parameters -r 5 -q -4 -G 10 -E 4 -W 7. These values not only find mammalian U6atac using plant U6atac as the query but also provide an alignment across the entire snRNA. If you don't find what you want, you may want to make adjustments based on the more thorough discussion in the posting, where I describe several parameter sets that will correctly identify plant snRNA genes using animal snRNA queries.
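It may help to read those numbers as a scoring scheme. In legacy blastall, -r and -q set the match reward and mismatch penalty, -G and -E the gap open and gap extension costs, and -W the seed word size. The Python sketch below is not blast: it is a plain Smith-Waterman local alignment with affine gaps (Gotoh's method), used only to illustrate how the +5/-4 scheme scores a diverged match compared with the stricter +1/-3 reward/penalty that was blastall's blastn default. It assumes NCBI's convention that a gap of length k costs G + kE, and it ignores -W, since word size only controls blast's seeding heuristic. The function name is hypothetical.

```python
NEG_INF = float("-inf")


def local_alignment_score(a: str, b: str, reward: int = 5, penalty: int = -4,
                          gap_open: int = 10, gap_extend: int = 4) -> float:
    """Best Smith-Waterman local alignment score with affine gaps (Gotoh).

    A gap of length k costs gap_open + k * gap_extend (NCBI-style costs).
    Defaults mirror -r 5 -q -4 -G 10 -E 4; word size (-W) is not modeled.
    """
    m = len(b)
    prev_h = [0.0] * (m + 1)           # H row for the previous prefix of a
    prev_gap_b = [NEG_INF] * (m + 1)   # previous row: alignments ending with a residue of a opposite '-'
    best = 0.0
    for i in range(1, len(a) + 1):
        h = [0.0] * (m + 1)
        gap_b = [NEG_INF] * (m + 1)
        gap_a = NEG_INF  # this row: alignments ending with a residue of b opposite '-'
        for j in range(1, m + 1):
            pair = reward if a[i - 1] == b[j - 1] else penalty
            # Open a new gap (pay open + one extension) or extend an existing one.
            gap_b[j] = max(prev_h[j] - gap_open - gap_extend, prev_gap_b[j] - gap_extend)
            gap_a = max(h[j - 1] - gap_open - gap_extend, gap_a - gap_extend)
            # Local alignment: never let the score drop below zero.
            h[j] = max(0.0, prev_h[j - 1] + pair, gap_b[j], gap_a)
            best = max(best, h[j])
        prev_h, prev_gap_b = h, gap_b
    return best


# A match/match/mismatch pattern, scored two ways (gap costs 10/4 in both).
print(local_alignment_score("ACG" * 4, "ACT" * 4))         # 28.0 with +5/-4
print(local_alignment_score("ACG" * 4, "ACT" * 4, 1, -3))  # 2.0 with +1/-3
```

With a mismatch at every third position, the +5/-4 scheme carries the alignment through the mismatches (score 28), while +1/-3 never gets past two consecutive matches (score 2). That tolerance of scattered mismatches is what lets a diverged snRNA register as a single, full-length hit.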
Bear in mind two caveats: limit the size of your query, and be prepared to use independent criteria for identifying correct hits. These searches require more computing resources than standard blast searches, and your results will generally take longer than the estimated time to come back. For related reasons, you should not attempt to use these parameters for queries longer than about 500 bp (if you are using a noncoding RNA as the query, do not include nontranscribed flanking sequence; you may even want to remove poorly conserved parts of the RNA itself from your query). Also, because the assumptions that go into calculating E values are violated by these parameters, the E values reported in your output will be meaningless (except as relative numbers; better matches will still have lower E values). Do not pay attention to the E values (except when comparing results obtained with the same parameter set) and do not report them. However, the lack of reliable E values is not license to believe nonsense; your results should be validated by external criteria such as secondary structure and conservation of known functional regions.
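Why E values still rank hits correctly can be seen from the Karlin-Altschul formula, E = K·m·n·exp(-λS): for a fixed scoring system and search space, E falls monotonically as the raw score S rises, even if λ and K are off. The Python sketch below solves for the ungapped λ of a match/mismatch scheme, assuming equal base frequencies (so a random pair of bases matches with probability 1/4). The K value, query length and database size are placeholders, and gapped searches like these need λ and K estimated separately, so this illustrates the form of the statistics rather than reproducing blast's numbers. Function names are hypothetical.

```python
import math


def ungapped_lambda(reward: float, penalty: float, p_match: float = 0.25,
                    tol: float = 1e-12) -> float:
    """Positive root of p*exp(lam*reward) + (1 - p)*exp(lam*penalty) = 1.

    Karlin-Altschul lambda for ungapped DNA scoring, assuming every aligned
    pair matches with probability p_match (0.25 for equal base frequencies).
    """
    q = 1.0 - p_match
    if reward <= 0 or p_match * reward + q * penalty >= 0:
        raise ValueError("need a positive reward and a negative expected score")

    def f(lam: float) -> float:
        return p_match * math.exp(lam * reward) + q * math.exp(lam * penalty) - 1.0

    # f is negative just above 0 and grows without bound: bracket, then bisect.
    lo, hi = 1e-9, 1.0
    while f(hi) < 0:
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def expect_value(score: float, lam: float, k: float, m: int, n: int) -> float:
    """Karlin-Altschul expected number of chance hits scoring at least `score`."""
    return k * m * n * math.exp(-lam * score)


# Placeholder search space: a 300-nt query against 10**8 nt of database,
# with a made-up K. Only the ordering of the printed E values is meaningful.
lam = ungapped_lambda(5, -4)
for s in (40, 60, 80):
    print(s, expect_value(s, lam, k=0.1, m=300, n=10**8))
```

If λ or K is wrong, every E value is wrong by the same rule, so a higher-scoring hit still gets a lower E value; that is exactly why the numbers can be used for comparison within one parameter set but should not be reported.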
Good luck! If you have experience that bears on this, or can cite relevant literature, please let me know and I'll update the posting.