On Genetics

No simple genetics for thedress - but does color vision affect compensation for lighting?

2015-03-16T16:19:00.000-04:00

In my previous post, I asked whether the way that one sees the now famous dress might have a genetic influence and invited people to send me family data. I got data from 28 families (thank you very much!), and have some conclusions.

First, this cannot be strictly genetic.

There are examples of monozygotic twins that see the dress differently, and there is a significant minority of people who see it differently from one time to another. These observations are inconsistent with a purely genetic basis. In my own data, I have four families where both parents see blue and black; four of the 12 children in these families see white and gold. I also have nine families where both parents see white and gold; here seven of 20 children see white and gold. Thus, neither trait breeds true.

However, there are some hints.

I noticed that 64% of sibling pairs, and 8 of 9 sister pairs, see the dress the same way. I also noticed that in families where the parents differ, 10/11 daughters see the dress as their father does, which is suggestive of an X-linked partially dominant factor. This led me to ask whether daughters preferentially show the paternal phenotype in families where the parents have the same phenotype. Of cases where I knew the gender of the child, daughters saw the dress as their father did 4/6 times (evenly divided between the two phenotypes). So, overall 14/17 daughters agree with their father. This is significantly different from expected (a simple two-tailed chi-square test with one degree of freedom yields a p value of 0.008).
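For readers who want to check that figure, here is a minimal sketch of the calculation. It assumes the null expectation is an even split (8.5 of 17 daughters agreeing with their father), which is my reading of "expected" and is not spelled out above; the function name is only illustrative.

```python
import math

def chi_square_1df(observed, expected):
    """Return (statistic, p_value) for a chi-square test with 1 degree of freedom."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With one degree of freedom, the chi-square upper-tail probability
    # reduces to erfc(sqrt(x / 2)), so no statistics library is needed.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# 14 of 17 daughters agreed with their father.
# Assumed null hypothesis: agreement and disagreement are equally likely.
agree, total = 14, 17
stat, p = chi_square_1df([agree, total - agree], [total / 2, total / 2])
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

With only 17 observations an exact binomial test would arguably be a better choice; the chi-square version is shown because it is the test named in the text.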

Since the X chromosome carries the highly polymorphic cone opsin genes that are known to affect color vision, I’m wondering if how one perceives the lighting on the dress (dark vs. light) is affected by these genes. Mechanisms by which women always use color to correct for lighting as their fathers do because of X-linked opsin genes are nearly ruled out by the observation that mutations affecting these genes (red/green color blindness) are recessive. If women used only their father's cone opsin genes, then they would inherit color blindness from their fathers. However, I say "nearly ruled out" because I can think of (admittedly unlikely) scenarios whereby a specific subset of cone cells that plays a role in compensation for lighting also preferentially inactivates the maternal X. Of course, the limited data here are also consistent with partial dominance, with some other X-linked gene, or, indeed, with no genetic influence at all.

Does normal variation in color vision affect compensation for lighting?

Since the dress illusion is understood to involve compensation for lighting, I am drawn to the question of whether or not variation in color vision affects this compensation. To address this, I’ve come up with a second form, which involves two things:

1) reporting how you see the same image without color

and

2) taking a vision test (the PANTONE® Online Color Challenge).

I'm asking people to say
-- How they see the dress without color
-- How they see the dress with color and
-- How they score on the online color test (both the numerical score, and, if possible, a screenshot of the results showing the pattern of color discrimination across the spectrum).

The form is available at ongen.us/DressColorForm

This second survey is about the limit of what I can or should do informally through social media. I’m pleased to hear that 23andMe is asking people about how they see the dress. If you have an account with them, then you can contribute at 23andme.com/you/quick_questions/

Note (March 23): 23andMe has posted results from 25,000 responses. They find no strong genetic associations, but an effect of age and an association with whether one lived as a child in a rural (blue and black) or urban (white and gold) setting. They did not look at transmission within families.

blog.23andme.com/23andme-research/genetics-and-that-striped-dress/

Data Summary March 16

I collected data from 28 families.

The total frequency was approximately half and half. 44 saw blue and black while 54 saw white and gold (six were some sort of intermediate or other; three went back and forth).

In 4 families both parents saw dark colors (blue and black): 8/12 of their children saw dark colors; 4/12 saw light colors.

In 9 families both parents saw light colors: 13/21 children saw dark colors, 7/21 saw light colors, and one child saw the dress differently over time.

In 3 families the mother saw dark colors while the father saw light colors. In this case, 6/6 children saw light colors. All were daughters.

In 6 families the mother saw light while the father saw dark. In this case, 8/11 children (four daughters and four sons) saw dark colors and 3/11 (two sons and one daughter) saw light colors. 4 of 5 daughters saw as their father did.

Mom	Dad	Families	Dark	Light	Other
Dark	Dark	4	8	4	0
Light	Light	9	13	7	0
Dark	Light	3	0	6	0
Light	Dark	6	8	3	0

Could the dress illusion be genetic?

2015-03-01T14:28:00.002-05:00

I am very curious about whether the dress illusion might have a genetic basis. I'm referring to differences in the way that people see the dress in this photo:

Most explanations of the fact that people see this differently (e.g. Steven Pinker, writing in Forbes) have to do with unconscious compensation for lighting. I'm sure that those explanations are generally correct, but which way you see it (whether and how much you compensate) may still have a genetic basis. The fact that very few people report a change in how they see it is consistent with a genetic (or at least biological) basis.

So, I'm trying to find out if how one sees "the dress" is inherited in a Mendelian manner. This is an informal poll (not a proper scientific study) to get a rough idea of inheritance. (Is the trait inherited in a Mendelian way? Is either way of seeing the dress dominant?).

Please respond if (and only if) you belong to a family and have data for both parents and one or more full biological children.

Thanks! I'll post results here.

To respond, please visit ongen.us/DressGenes and fill out the form.

Nicholas Wade’s troubling ideas

2014-08-10T11:48:00.002-04:00

Among the popular myths about human genetics left over from the era of eugenics, social Darwinism and racism, two are especially relevant to Nicholas Wade’s recent book, “A Troublesome Inheritance: Genes, Race and Human History.” The first is that natural selection has stopped due to advances in health and medicine, and that, as a result, the unfit are now contributing more to each succeeding generation. Early in his book, Wade disagrees, stating that “human evolution has been recent, copious and regional”, and much of the first part of the book is devoted to this claim. I think this statement is well-supported by modern genetics. Wade goes further, arguing that in fact, selection favors those who are economically successful. Here, demography and historical records have more to say than genetics, and Wade relies heavily on the work of Gregory Clark, an economic historian at the University of California, Davis, especially the book “A Farewell to Alms” which he reviewed favorably for the New York Times in 2007. I am skeptical about the connection between affluence and Darwinian fitness; I don’t think there are genetic data either way.

Wade gets into trouble when he tries to find support in modern human genetics for a second major myth, which is that humanity can be meaningfully divided into a small number of types (races), and that these types have biologically meaningful differences in things such as intelligence and moral character. Virtually all practicing human population geneticists, including those whose work he cites, are in agreement that this speculation is unsupported, and today’s New York Times carries a succinct statement signed by many of them, featuring a simple message:

We are in full agreement that there is no support from the field of population genetics for Wade’s conjectures.

The letter is here. The list of signatories, here, contains 139 names, including every prominent human geneticist that I thought to look for.

Why the outcry? People who devote their scientific lives to the study of human genetic variation think about race and popular misconceptions all of the time. They care that their work is accurately represented.

For those who wish to read a more detailed rebuttal of Wade’s arguments, I recommend Jeremy Yoder in the Los Angeles Review of Books, but there are many other good ones.
The original New York Times book review, by David Dobbs, is here.

For those who want to read less, I leave you with one very brief quote.

He’s claiming to be a spokesperson for the science and, no, he’s not.

- Sarah Tishkoff (David and Lyn Silfen University Professor in the Departments of Genetics and Biology at the University of Pennsylvania, quoted in a Nature News Blog)

----------------------------------------
Postscript (additional commentary):
- Nicholas Wade's reply (New York Times, Aug. 22)
- Marcus W. Feldman in the Computational, Evolutionary and Human Genomics at Stanford blog.
"Echoes of the Past: Hereditarianism and A Troublesome Inheritance" Marcus W. Feldman is the Burnet C. and Mildred Finley Wohlford Professor in the School of Humanities and Sciences at Stanford and a Founding Director of CEHG.

What is a gene?

2014-01-03T21:49:00.000-05:00

A gene is all of the DNA elements required in cis for the properly regulated production of a set of RNAs whose sequences overlap in the genome.

I formulated that definition c. 1990, when I started teaching genetics to graduate students. I think that the course I actually taught was quite different from the plans leading to that formulation, but I remember sitting for several hours in a coffee shop in Newark airport and coming up with that definition. This was after the discovery of splicing, transposable elements, remote enhancers, overlapping genes, nested genes, long noncoding RNAs and many short noncoding RNAs, and I imagined discussing literature on each of these topics and its implications for how a gene might be defined. 1990 was before “tweet-length” could be applied, before the discovery of microRNAs and (most significantly) before complete genome sequences and high-throughput data in the style of ENCODE.

I believe this definition has stood the test of time, and that it will continue to provide a useful understanding of what is meant by a gene.

The fact that it was written to accommodate work that predates complete genome sequences, ChIPseq and whatever methods are developed in the coming years, should be kept in mind as we face hype about new discoveries changing our view of the gene. I predict that later this year some new work will be described as overturning the idea of junk DNA, or the idea of genes as beads on a string, or the notion that genes are merely their coding information, or perhaps all of these. These discoveries will be said to account for the dark matter of the genome and other deep mysteries that were unsolved until now. Faced with that hype, I will link to this post.

In 2014, as part of my plan to write more but shorter posts, I will also report the history of my own understanding of several of the issues that make defining “a gene” problematic.
--------------------
Mark Gerstein almost immediately pointed out that he had published a very similar definition in 2007:

The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.

See PubMed: Pubmed ID 17567988 or
Gerstein lab: http://archive.gersteinlab.org/papers/e-print/grgenerev/preprint.pdf or
Genome Research http://genome.cshlp.org/content/17/6/669.long

Michael Pollan on plant behavior, good and bad

2014-01-02T22:17:00.002-05:00

A friend asked my view, so I read the recent article by Michael Pollan in the New Yorker, "The Intelligent Plant."

Michael Pollan is a very good writer and he picked an interesting topic. Plant behavior is indeed fascinating and he does a good job of fascinating his readers without obviously going far beyond what can be supported. I also think he does justice to the community of plant biologists by presenting people's views in their own words. However, I fear that he may have incited enthusiasm for bad science. A critical point in the article occurs when he points out that the argument is about language.

Many of the scientists in [Gagliano's] audience were just getting used to the ideas of plant “behavior” and “memory” (terms that even Fred Sack said he was willing to accept); using words like “learning” and “intelligence” in plants struck them, in Sack’s words, as “inappropriate” and “just weird.” When I described the experiment to Lincoln Taiz, he suggested the words “habituation” or “desensitization” would be more appropriate than “learning.” Gagliano said that her mimosa paper had been rejected by ten journals: “None of the reviewers had problems with the data.” Instead, they balked at the language she used to describe the data. But she didn’t want to change it. “Unless we use the same language to describe the same behavior”—exhibited by plants and animals—“we can’t compare it,” she said.

I agree that we should use the same language to describe the same behavior, and applying the words 'behavior' and 'learning' to plants makes sense to me. That we use these terms (appropriately, I think) for robots and computers points out that they are neutral with respect to mechanism. However, I don't think that 'intelligence' or 'consciousness' would be appropriate for anything described in this article. The prefix 'neuro' refers to neurons or the nervous system and we know for a fact that plants have nothing like neurons. It's pretty clear that multicellularity evolved independently in plants and animals, and there are important differences, so I find it highly unlikely that plant and animal behavior shares underlying mechanisms. Thus I very much doubt that there is “some unifying mechanism across living systems that can process information and learn.” While fundamental processes common to all life are no doubt shared, more sophisticated signaling is unlikely to be the same. Cell walls make it hard to see how information could possibly be transmitted through synapses, which are specialized points of contact between neurons. On the other hand, plasmodesmata, channels that allow direct but regulated transport between cells, provide plant cells with the potential for mechanisms unavailable to animal cells. Thus, while communication between the parts of a plant is likely to be as sophisticated as, if not more sophisticated than, comparable mechanisms in animals, it is very different, and much less well understood. We would do better to appreciate plants on their own terms. I hope that this article leads more young people into the exciting field of plant signaling. I fear that it may do so for the wrong reasons.

Time-Lapse HD Plants following light

Links:

“The Intelligent Plant,” by Michael Pollan in the New Yorker. Dec. 23, 2013.

Cleve Backster, an obituary in the New York Times Magazine. The best-selling book, “The Secret Life of Plants,” was inspired by Backster’s research.

ENCODE: Data, Junk and Hype

2012-09-08T18:32:00.000-04:00

This week saw the publication of dozens of papers in Nature, Science and Genome Research that report an initial analysis of data from the Encyclopedia of DNA Elements (ENCODE) project on RNA, transcription initiation, transcription factor association, chromatin structure and histone modification. The scale of this data is staggering, and it will change how human molecular genetics is done. Imagine how the field of climatology would be changed if they suddenly had hundreds of years of complete weather data from thousands of weather stations. This is comparable.

ENCODE data, visualized with the UCSC genome browser.

What ENCODE does not do is fundamentally change our view of what the genome looks like.

The third and fourth sentences of the main article in Nature are these:

These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation.

This "result" has been emphasized in the popular press.

Hype: This lead article in Thursday's copy of the Washington Post Express (a publication of the Washington Post distributed on DC's Metro) is typical of how the story was covered.

In particular, the conclusion that this study "overturns theory of 'junk DNA' in the genome," which was the title of the article in The Guardian and which was echoed by many who should know better (e.g. Science) is, well, junk. What the ENCODE project has done is locate the sites on human DNA that are represented in RNA, and the sites at which numerous factors bind. That 80% of the genome has some biochemical "function" of this sort does not mean that 80% of the genome has some effect on gene expression (although these data will help us immensely in the task of figuring out which noncoding nucleotides do indeed affect gene expression). We can still be quite sure that most of that 80% does not have any biological function in the usual sense of the word, which is that if you delete it or alter it, something that matters biologically or medically will change. We still know that most of the millions of single nucleotide polymorphisms that distinguish any two copies of the genome don't matter very much. It is simply not the case that the vast majority of the human genome has some (biological) functional importance.

Conversely, we have known for a long time that a lot of noncoding DNA does have a function. Most of the sequence that does matter is not coding. One measure of that is conservation, and the earliest complete mammalian genomes, in 2005, showed that about 5.3% is conserved among mammals (vs. only about 1% that is coding). A direct attempt to use ENCODE (and 1000 genomes) data to estimate the fraction of the genome under purifying selection (Ward and Kellis, this week) finds "an additional 4% of the human genome subject to lineage-specific constraints." While this is a big increase in the estimated fraction of the genome subject to purifying selection, the total is still only about 10%, leaving 90% as neutral.

We have also known for a long time that most RNA transcripts do not result in cytoplasmic messenger RNAs (Salditt-Georgieff and Darnell JE Jr. published a paper in 1981 with the title "Further evidence that the majority of primary nuclear RNA transcripts in mammalian cells do not contribute to mRNA.") and specific transcripts in noncoding regions were described by the end of the 1980s.

The science blogosphere has been aflame for the last two days as scientists attempt to debunk this hype. Those bloggers (many of whom are authors on the ENCODE papers) have provided excellent summaries of the issues surrounding the notion of junk DNA. I have bookmarked several on delicious (tag: ongenetics/ENCODE) and some (mostly the same ones) are listed below.

To my mind, the biggest problem is that what is not news (that not all noncoding DNA is junk) has been allowed to eclipse what is news (that we have a vast trove of data that allows us to assess possible functions for all nucleotides).

Links:
http://genome.ucsc.edu/ENCODE/
The gateway to ENCODE data (through the UC Santa Cruz genome browser)

http://www.genome.gov/10005107
The ENCODE project web site.

http://www.nature.com/encode/
This is Nature's gateway to the literature. It's a little (OK, a lot) gimmicky, so you probably want to just visit the tables of contents: Nature, Science, Genome Research.

The Finch and the Pea: ENCODE Media Fail
This blog post by Mike White is a survey of media hype documenting numerous errors resulting from the hype (or misplaced focus).

Encode (2012) vs. Comings (1972)
This blog post by T. Ryan Gregory presents a serious review of the concept of "junk DNA."

ENCODE: My [Ewan Birney's] Own Thoughts
Ewan Birney on his own blog.

A Neutral Theory of Molecular Function
This blog post by Michael Eisen "wrestles" with the idea of junk DNA.

I want to end by pointing out that there are lots of people (me and my group included) who have already been wrestling with this issue, with lots of interesting ideas and results already out there. From an intellectual standpoint I’d like to particularly point out the influence the writings of Mike Lynch have had on me – see especially this.

ENCODE: The Rough Guide to the Human Genome
Ed Yong's post (at Discover Magazine) has been revised in the last day or so to be more cautious about the hype.

Cryptogenomicon: ENCODE says what?
This post by Sean Eddy makes the points that "The human genome has a lot of junk DNA," that "Noncoding DNA is part junk, part regulatory, part unknown," that "ENCODE’s definition of 'functional' includes junk" and that "Evolution works on junk." His post has dozens of comments, mostly from experts in the field.

Finally, a few screen shots from Twitter in the last few days:

Reaction to ENCODE media hype on Twitter ranged from blind propagation to harsh criticism.

Genetic Genealogy and the Single Segment

2011-02-19T12:33:00.011-05:00

Last year, my wife Janet and I sent our DNA off to 23andMe for analysis. Among the tools that they provide is a "Relative Finder," which lists other people on the site who share regions of DNA that appear to be identical by descent. In my case, there are 476 people listed, each sharing between 0.07% and 0.46% of my genome, almost always as a single segment (there are 18 people with whom I share two segments). These people are generally anonymous, but you have an opportunity to make contact and invite them to "share genomes," which means only that you can see which regions are shared.

There are a lot of people on 23andMe who are quite interested in this tool, and who use it for genetic genealogy. Many of these same people also use Family Tree DNA and ancestry.com. As a result of my interactions with these 23andMe relatives, and following the discussions on the 23andMe community forums, I have been thinking about, and researching, what it means to share one segment of DNA by descent with someone. In the process, I have realized some things that are not fully appreciated by most of the genealogy buffs on 23andMe.

I am presenting these insights here, and will consider them one at a time.

Distant relatives often share no genetic material at all.
It is possible to share a segment with very distant relatives.
Sometimes, more distant relationships are more likely.
Most of your relatives may be descended from a small fraction of your ancestors.

Distant relatives (fourth cousins and beyond) often share no genetic material.

The chance of not sharing any DNA at all becomes appreciable with fourth cousins and rises to approximately half with fifth cousins. This is based on my own simplified calculations and those of Donnelly (1983), who opines that "proof of descent from William Shakespeare does little to increase the probability that the claimant has genes in common with him." There are limits to what can be accomplished by genetic genealogy that are imposed by the real chance that you simply do not share any DNA at all with distant relatives. The more distant the relationship, the more likely it is that no DNA is shared.

On the other hand, you have to inherit your DNA from somebody, so there are some blocks of identity by descent that have been transmitted many generations.

It is possible to share a segment with very distant relatives.

"The probability that fourth cousins share at least one IBD [identical by descent] segment is 77%, and the expected length of this segment is 10 cM." Now consider the next step. There is a 50% chance that that one shared segment will not be transmitted at all, but a 90% chance that if it is transmitted it will be just as big as it was (the same 10 cM.). What this means for genealogy on 23andMe is that for two people sharing one segment identical by descent there is no way to reliably estimate how far back the common ancestor was. Furthermore, no improvement in software can possibly change that, because the limitation is imposed by the genetics itself.

No matter how far back you go, every nucleotide of one's genome is derived from some ancestor, and even going back 20 generations, the chance that the bit which has been inherited is part of a block 5 cM. or greater is still appreciable. In fact, even for 19th cousins, there is a real chance (13%) that any segment of DNA they have inherited in common will be 5 cM. or greater. This number is based on the term (1 - P(rec))ⁿ, where P(rec) is the probability that the segment will be broken up by recombination in one generation (size/100, where size is in cM.) and n is the number of generations of transmission separating the two cousins. For 19th cousins sharing a single ancestor, n is 40.
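The 13% figure can be reproduced with a few lines of code. This is a minimal sketch that takes P(rec) to be size/100 (that is, 5% for a 5 cM. segment) and counts 2n+2 transmissions between nth cousins; the function names are only illustrative.

```python
def meioses_between_cousins(degree):
    """Transmissions separating two nth cousins: n+1 up to the ancestor, n+1 back down."""
    return 2 * degree + 2

def prob_segment_intact(size_cm, meioses):
    """Chance a segment survives every transmission unbroken, given that it is transmitted."""
    p_rec = size_cm / 100  # simplification: 1 cM. treated as a 1% chance of recombination
    return (1 - p_rec) ** meioses

n = meioses_between_cousins(19)          # 40 for 19th cousins
intact = prob_segment_intact(5, n)       # (0.95)^40
print(f"n = {n}, chance a 5 cM. segment is still intact = {intact:.1%}")
```

Note that this is conditional on the segment being transmitted at all; it says nothing about how likely two 19th cousins are to share anything.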

Of course, as mentioned above, there is very little chance that two 19th cousins will share any IBD segments at all, but this is offset if one has many 19th cousins, which is often the case.

Sometimes, more distant relationships are more likely.

23andMe reports a "predicted relationship" (e.g. "4th cousin") and a "relationship range" (e.g. "3rd to 7th cousin"). However, these ranges are likely to be wildly inaccurate, because the likely distance to a common ancestor, given only the information that two people share a single IBD segment, can vary enormously, based largely on how many relatives one has.

Here is my estimate of these values. You can skip this paragraph if you're not interested in the details.

The probability that a segment, if transmitted, will not be broken up by recombination is 1 minus the probability of recombination, which is 5% for a 5 cM. segment, 10% for a 10 cM. segment and so on. (If you are moving up a pedigree, this is the probability the segment was transmitted rather than created by recombination, but the value is the same.)

The probability that a segment is transmitted at all is one-half per generation.

Thus, for an nth cousin sharing a single ancestor, the probability is ((1-P(rec))/2)^(2n+2).

For an nth cousin sharing two ancestors (the usual case), the probability is

2(((1-P(rec))/2)^(2n+2)). For example, the probability of two 4th cousins sharing a specific 5 cM. segment is 2((0.95/2)^10) = 0.00117. If one has more than 855 4th cousins, then the expected number of 4th cousins sharing this segment will be greater than 1. Because every 4th cousin has the same chance of inheriting the segment, the expected number of 4th cousins who do share the segment will be directly proportional to the number of 4th cousins one has. In the case of 5th cousins, the probability of sharing a specific segment is 2((0.95/2)^12) = 0.00026, which would require 3,790 cousins for the expected number sharing the segment to exceed 1.0. In general, the number of cousins of a specific degree who should be expected to share a segment is given by

2(((1-P(rec))/2)^(2n+2)) x N

where N is the number of relatives of that degree. For a 5 cM. segment, if the number of cousins of degree n+1 that you have is 4.43 times the number of cousins of degree n that you have, then you expect more cousins of degree n+1 than cousins of degree n to share the segment. For a 10 cM. segment, this ratio is 4.94.
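The numbers in this section can be checked with a short script. This is a sketch of the formulas exactly as written above, with the same simplifications (P(rec) = size/100, two shared ancestors by default); the function names are my own shorthand.

```python
def prob_share_segment(size_cm, degree, shared_ancestors=2):
    """Probability that an nth cousin shares one specific segment, unbroken."""
    p_rec = size_cm / 100
    per_transmission = (1 - p_rec) / 2   # passed on (1/2) and not recombined (1 - P(rec))
    return shared_ancestors * per_transmission ** (2 * degree + 2)

def cousins_needed(size_cm, degree):
    """Number of nth cousins at which the expected number sharing the segment reaches 1."""
    return 1 / prob_share_segment(size_cm, degree)

def breakeven_ratio(size_cm):
    """How many times more (n+1)th cousins than nth cousins are needed to break even."""
    # The ratio is the same for every degree, because each extra degree
    # adds exactly two more transmissions.
    return prob_share_segment(size_cm, 1) / prob_share_segment(size_cm, 2)

for degree in (4, 5):
    p = prob_share_segment(5, degree)
    print(f"{degree}th cousins: p = {p:.5f}, cousins needed = {cousins_needed(5, degree):.0f}")

for size in (5, 10):
    print(f"{size} cM.: break-even ratio = {breakeven_ratio(size):.2f}")
```

The break-even ratio is just (2/(1-P(rec)))^2, which is why it does not depend on the degree of cousinship.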

Thus, if you have many more distant cousins, as would be expected if your ancestors had large families, then someone who shares a single IBD segment is more likely to be a distant cousin, because you have so many more distant cousins. The point where the increase in the number of cousins outweighs the loss of shared segments is five children per family. This is not extremely uncommon.

As an alternative to the math, consider the case of my (hypothetical) great-great-great-grandfather Joe. Let’s say that I have inherited a 5 cM. segment of DNA from him. (It’s likely that I have inherited at least one segment from him.) Our concern is whether a distant relative that shares this segment is more likely to be a fourth cousin also descended from Joe or a fifth cousin descended from Joe’s father Jacob. The chance that the 5 cM. segment was inherited by Joe, from Jacob, is slightly less than half (because of the possibility of recombination in that generation). Jacob had 12 children, so I can expect to have 12 times as many fifth cousins descended from Jacob as fourth cousins descended from Joe. That fact ends up being more significant than the chance of recombination, so I will share the segment in question with more fifth cousins than fourth cousins. This same logic applies to fifth vs. sixth cousins and so on.

Thus, my 23andMe relatives sharing one IBD segment might be fourth cousins, as predicted, or they might be distant cousins connected by prolific ancestors. There is no way to know.

The world population has increased perhaps 20-fold in the last millennium, but that works out to significantly less growth than the sustained doubling required to predict distant ancestry for people who share one IBD segment. Nevertheless, there are well-documented cases of rapid demographic expansion.

Most of your relatives may be descended from a small fraction of your ancestors.

Given that family size varies a great deal, it is no doubt common to have some ancestors who have left many more descendants than others. We all have 64 great-great-great-great-grandparents, typically in 32 couples. If one family among the 32 had five children and their descendants did as well, while others in the family reproduced at replacement rates (two children per family), then your more prolific ancestors (the parents of just one of your 32 great-great-great-grandparents) would account for over 3/4 of your fourth cousins.
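To get a feel for how lopsided this can be, here is a rough sketch under a deliberately simple model that is my own assumption, not a calculation from the post: every family in a given line has the same number of children, lines never intermarry, and nth cousins are counted through each ancestral couple separately. It tries the claim both for fourth cousins (counted through 16 great-great-great-grandparent couples) and for fifth cousins (counted through the 32 couples one generation further back).

```python
def cousins_via_couple(children_per_family, degree):
    """nth cousins descended from one ancestral couple, in a uniform-family-size model."""
    k = children_per_family
    # (k - 1) siblings of your own ancestor, each with k**degree descendants
    # in your generation.
    return (k - 1) * k ** degree

def prolific_share(degree, couples, prolific_size=5, other_size=2):
    """Fraction of nth cousins who come from the single prolific line."""
    prolific = cousins_via_couple(prolific_size, degree)
    others = (couples - 1) * cousins_via_couple(other_size, degree)
    return prolific / (prolific + others)

fourth = prolific_share(degree=4, couples=16)
fifth = prolific_share(degree=5, couples=32)
print(f"fourth cousins from the prolific line: {fourth:.0%}")
print(f"fifth cousins from the prolific line:  {fifth:.0%}")
```

Under either reading the prolific line supplies well over three quarters of the cousins, which is the point of the paragraph above; real families are of course far less uniform than this model.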

In summary, it is impossible to know the relationship one has to relatives who are discovered by virtue of their sharing a single autosomal segment of DNA. The "predicted relationship" is uncertain, and even the range is hard to be sure of. The extensive information provided by 23andMe is a very useful tool for genealogy, but it cannot tell you about relatives with whom you do not share any genetic material by descent. On the other hand, relatives with whom you do share genetic material by descent can be quite distant.

Defending science blogs

2010-08-01T12:51:00.006-04:00

Although I am not on ScienceBlogs, I am a science blogger, so Virginia Heffernan's article on science bloggers in today's New York Times Magazine ("Unnatural Science: The uses and abuses of science blogging") got my attention. Her position that science blogs are given to "trivia, name-calling, saber rattling" and "gratuitous contempt" compelled me to reply.

The frequency with which I update my blogs is probably best described by a professional journalist as "never," but I do take blogging somewhat seriously, and I try to be professional about it. My affiliation is on the side bar, and I have read (and re-read parts of) such books as "Am I Making Myself Clear?: A Scientist's Guide to Talking to the Public," by Ms. Heffernan's more temperate colleague, Cornelia Dean.

The article starts out with an appeal to deconstructionism:

Deconstructing science is a fool’s game. In the ’90s, literary critics used to try. They’d argue that science is a system of metaphors, complete with a style and an ideology, rather than the royal road to the truth. They were laughed at as cultural relativists, posers high on Gauloises and nut jobs who didn’t believe in gravity.

Although amusing and partly true, this is a misrepresentation. Science does have a style and an ideology and some of us acknowledge that. In fact, my own reading of science is informed by an awareness of the differing styles and ideologies that dominate different fields and traditions within science, an awareness that has been made more acute by my own personal exposure (primarily through marriage) to literary criticism, postmodernism and social science. What scientists object to is the notion that science is nothing but a system of metaphors. Scientists uniformly believe that there are truths about nature that exist quite apart from ourselves, and that science provides a tool for learning those truths. I will also admit that some of us think that, within academia, posers and nut jobs have a much easier time succeeding in fields outside of science.

Last month ... 20 or so high-placed science bloggers angrily parted ways with an extremely popular and award-winning online collective called ScienceBlogs because it started running Food Frontiers, a nutrition blog that PepsiCo paid to have on the site.

I missed this. What can I say? I don't find enough time to blog, or even to read other blogs, although I keep thinking I should start doing it more.

ScienceBlogs has become preoccupied with trivia, name-calling and saber rattling. Maybe that’s why the ScienceBlogs ship started to sink.
...
does everyone take for granted now that science sites are where graduate students, researchers, doctors and the “skeptical community” go not to interpret data or review experiments but to chip off one-liners, promote their books and jeer at smokers, fat people and churchgoers?

Perhaps, but the ones I read this morning (those on genetics, including personal genetics) have "interesting stuff." Some of it is a bit pedantic and perhaps not that interesting to the general public, but most of the posts I looked at stuck to the science or discussed policy, and those that discussed policy were perfectly civil.

By the way, I'd recommend "Genomes Unzipped" to readers interested in a diversity of opinion about the week's events surrounding regulation of personal genetics services. Genomes Unzipped is "a group blog providing expert, independent commentary on the personal genomics industry." It is not part of ScienceBlogs, but some individual bloggers post to both.

Under cover of intellectual rigor, the science bloggers — or many of the most visible ones, anyway — prosecute agendas so charged with bigotry that it doesn’t take a pun-happy French critic or a rapier-witted Cambridge atheist to call this whole ScienceBlogs enterprise what it is, or has become: class-war claptrap.

Is she jeering?

Science blogs (including those on ScienceBlogs) are a mixed bag, just like most of the internet, and the New York Times. Readers have to exercise judgment.

Finally, there is a sidebar with recommendations, which I have to applaud.
[Update: Actually, it was a mistake to applaud this. See comments.]

SEMPER SCI
For science that’s accessible but credible, steer clear of polarizing hatefests like atheist or eco-apocalypse blogs. Instead, check out scientificamerican.com, discovermagazine.com and Anthony Watts’s blog, Watts Up With That?

SCIASPORA
David Dobbs, who quit ScienceBlogs, has written well about the consequences of “unbundling” the ScienceBlogs bloggers. See his blog at its new location at neuronculture.com.
(SCI)ENCE
Stanford’s Presidential Lectures in the Humanities are archived — and helpfully linked — at prelectur.stanford.edu. Don’t miss Jacques Derrida’s from the spring of 1999. You will think. You finally almost know. What deconstruction. Is.

Can we not speak of fish?

2010-05-29T07:58:00.019-04:00

I would like to defend the use of paraphyletic groups in scientific discourse and literature. Paraphyletic groups can be well-defined in terms of monophyletic units (as relative complements), and defining paraphyletic groups in terms of monophyletic groups is preferable to treating them as invalid.

Let me start with a story. Wednesday evening (May 26th) I checked my Twitter feed, and saw a number of tweets from Jonathan Eisen (phylogenomics), who was at the ASM meeting.
Jonathan is in the department of Ecology and Evolution at UC Davis, the author of a popular textbook on Evolution and a frequent blogger ("Tree of Life"). For those of you not used to reading Twitter feeds, note that the most recent tweets are at the top.

I know both Norm Pace and Jonathan Eisen. Thanks to Norm's personal style and Jonathan's excellent selection of quotes, reading this was like being in the room with Norm. I love hearing him talk. However, I do not entirely agree with him. I have spent my life studying gene expression in eukaryotes, and my perspective is that the differences between eukaryotes and other species ("prokaryotes") are fundamental. In prokaryotes, coupled transcription and translation (which is impossible when there is a nucleus) allows the widespread use of polycistronic mRNAs, which allow operons, which in turn contribute to many important features, including the ease with which biologically useful bits of genetic information can be horizontally transferred. The argument, repeated here by Norm Pace, that "no one can say what a prokaryote is, only what it is not" was addressed by Martin and Koonin, who proposed a "positive definition of prokaryotes" based on coupled transcription and translation. This, however, is not the point. The point is that the nucleus is a derived feature and prokaryotes are a paraphyletic group, meaning that the last common ancestor of all prokaryotes has descendants that are not prokaryotes. Nevertheless, the group is well-defined (as all life other than eukaryotes) and useful, so I commented:

A bit later, I commented again.
Prokaryotes are a paraphyletic group. That means that the last common ancestor of all prokaryotes has eukaryotic descendants. Most taxonomists today prefer not to talk about paraphyletic groups at all, but to speak only of monophyletic groups, or clades (which consist entirely of species with a common ancestor). However, there are many paraphyletic groups that "make sense" and are commonly used. Examples include prokaryotes, fish, reptiles and dicots.

My point is that defining a paraphyletic group as the relative complement of one clade with respect to another makes it well-defined, and such a definition more closely suits what people have in mind.

In the hypothetical example shown here, most taxonomists would want to list "natural taxa" (by which they would mean monophyletic groups, or clades), and would say something like "Q, R and S are slithy." To say "G other than C are slithy" is more compact because it makes reference to fewer taxa. To say "P are slithy" is exactly the same, and is the most compact way of making the statement, but requires reference to a paraphyletic group.
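The relative-complement idea can be sketched in a few lines of Python. Because the figure is not reproduced here, the tree topology below is an assumption made purely for illustration, as are the internal node names n1 and n2 and the members C1 and C2 of clade C; only the labels G, C, P, Q, R and S come from the text.

```python
# Hypothetical tree, written as parent -> children.
# The topology, the internal nodes "n1"/"n2" and the tips "C1"/"C2"
# are assumptions for illustration only.
TREE = {
    "G": ["Q", "n1"],
    "n1": ["R", "n2"],
    "n2": ["S", "C"],
    "C": ["C1", "C2"],
}


def leaves(node):
    """Return the set of tips descended from a node (a clade's members)."""
    children = TREE.get(node)
    if children is None:  # a tip has no entry in TREE
        return {node}
    members = set()
    for child in children:
        members |= leaves(child)
    return members


def is_monophyletic(group):
    """True if the group is exactly the full set of tips under some node."""
    all_nodes = set(TREE) | {c for kids in TREE.values() for c in kids}
    return any(leaves(node) == set(group) for node in all_nodes)


# P is defined as the relative complement of clade C within clade G.
P = leaves("G") - leaves("C")

print(sorted(P))                     # ['Q', 'R', 'S']
print(is_monophyletic(P))            # False: P is paraphyletic
print(is_monophyletic(leaves("C")))  # True: C is a clade
```

In this sketch P is not a clade, yet it is fully specified by two clades, which is all the definition requires.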

To pursue this further, I asked my colleagues what they thought:

My dear friends in systematics,

I have a question about systematics that I would like your opinion on. It seems a sufficiently central question that I suspect you have already formed an opinion. The issue is a practical one, regarding how biologists should use terms. It is also philosophical (but in the rigorous sense, relating to the idea that without a proper philosophical basis one cannot do science at all).

Consider a monophyletic group of organisms, G, and another phylogenetic group within it, C (for clade). Let us suppose that C is characterized by some fundamental innovation, such that organisms within this clade have a long list of features not found in the other species within G. Furthermore, species within G but not C share a long list of features that have been lost by all species in C. As a result, there is a need to talk about another grouping, W (for wrong), of those species within G but not C. There is no doubt about the phylogeny. C and G are monophyletic but W is not. Molecules and morphology agree. However, all species within W share many features lacking in all species within C, and this is true both morphologically and molecularly.

Is it ever right for a scientist to talk about W as a group?
You know the list (reptiles, fish, dicots, prokaryotes).

Back story.
This came up last night as an argument between Jonathan Eisen and myself, on Twitter. You can see most of it by looking at feeds for
phylogenomics, ongenetics and smount, but given the volatile and perspective-based nature of Twitter feeds I've pasted the relevant tweets into the attached word document (it reads from most recent to earliest so you might want to start at the bottom and work up). Jonathan is at the ASM meeting. He is a Twitter addict who has generated over 4500 tweets in the last year or so (a day with only 10 would be unusual for him). I find it useful and interesting to follow him. I am both ongenetics and smount (I didn't mean to switch but I changed computers and forgot to switch).

I find it useful to refer to prokaryotes (and to fish). Jonathan says "grouping together bacteria/archaea is inappropriate; I note in my evolution textbook we use "bacteria & archaea" a lot". Wouldn't it be simpler if he just used "prokaryote."? I'm looking for advice here.

Thanks,

Steve

I received a thoughtful reply from Chuck Delwiche:

Well, I'm basically with Jonathan on this, although I think I'm slightly more moderate. "Fish," "prokaryotes," "reptiles," "dicots," etc. are really form-classes -- they describe the appearance of the organism, but not its evolutionary relationships. Naming paraphyletic groups is somewhat less objectionable than naming grossly polyphyletic ones, so I don't object to naming the North American Drosophila in a way that ignores the Hawaiian species that are derived from within it (this is an example of your C/G case). But it really is confusing to refer to prokaryotes. Although they have coupled transcription and translation, there are other aspects of DNA replication, transcription, and translation that show striking similarities between Archaea and Eukarya. If you talk about "prokaryotes" as if the term represented a lineage rather than a morphology then it tends to obscure both diversity within them and similarities between Archaea and Eukarya.

The reason this is important is that it hides the predictive value that a natural classification can provide. Within your group G there would be some taxa that are more closely related to C than others, and they will share properties with C despite the long branch and loss of characters you describe. If you treat "fish" as a group it is confusing that Teleosts have immune systems that more nearly resemble those of tetrapods than do those of lampreys or hagfish. I don't know anything about lung- or lobe-finned fish immunology, but I'll bet they are even more tetrapod-like than those of Teleosts. Much the same statements could be made for skeletal structure, tooth anatomy, ventilation mechanisms, and I don't know what all else.

This is why we Must Never Speak of Fish Again.

Chuck

This is pretty much what I expected him to say, but there are two things I'd like to note. The first is this: "Within your group G there would be some taxa that are more closely related to C than others, and they will share properties with C despite the long branch and loss of characters you describe. If you treat "fish" as a group it is confusing that Teleosts have immune systems that more nearly resemble those of tetrapods than do those of lampreys or hagfish." This is a very good point that anyone who refers to paraphyletic groups must bear in mind.

The second thing that struck me is that he wrote "'Fish,' 'prokaryotes,' 'reptiles,' 'dicots,' etc. are really form-classes -- they describe the appearance of the organism, but not its evolutionary relationships" despite the fact that I had provided a rigorous definition in terms of evolutionary relationships. Defining "fish" as vertebrates other than tetrapods makes it something other than a form class, and also eliminates any confusion about whales. Defining prokaryotes as organisms other than eukaryotes makes saying "prokaryotes" synonymous with saying "bacteria and archaea." It strikes me that this is a natural group in the sense that if a new domain of life were to be discovered (perhaps on Mars, or deep within the earth) with coupled transcription and translation and no nucleus or mitochondrion, people would want to group it with bacteria and archaea, even if phylogenetic analysis showed that it shared a most recent common ancestor with the eukaryotic nucleus.

In summary, I fully support the definition of taxa as monophyletic groups, and I would like to see them used to more rigorously define paraphyletic groups. Scientists will continue to refer to paraphyletic groups, and for good reasons. When they do, it would be useful if those groups were understood to be the relative complements of monophyletic taxa rather than informal categories, form classes or sloppy and unscientific categories.

I will continue to speak of fish. When I do, I will be referring to vertebrates that are not tetrapods. While I respect and understand colleagues who will never speak of fish, they must understand that this group is well-defined in terms of groups they recognize. I hope that all scientists move towards a more precise taxonomic basis for the groups that they will continue to talk about.

----------------------------------------
I thank Jonathan Eisen, Chuck Delwiche and Charlie Mitter for their contributions to this post. It goes without saying that the opinions expressed are, however, mine. The complete email thread (with Delwiche and Mitter) is available here.

----------------------------------------
Postscript (June 13).
Charlie Mitter forwarded Farris 1979 (Systematic Zoology 28:483-519), which describes the state of systematics at that time as a debate between pheneticists, phylogeneticists and evolutionists about the principles that should underlie a general reference system for biology. I believe that this debate has been fully resolved in favor of the phylogeneticists, and I am fully persuaded that the business of systematics is the definition of monophyletic groups. My points here are that 1) biologists sometimes have good reasons to refer to paraphyletic groups and 2) when they do, it is better, where possible, to understand those groups in terms of monophyletic groups. It is precisely because I agree with the arguments of Farris in favor of phylogenetics that I think that the paraphyletic groups to which scientists will inevitably refer should be defined in terms of phylogenetic taxa (clades) and not thought of as elemental taxonomic units.

Does the rs7901695 C variant predispose to diabetes by creating a cryptic exon?

2009-01-12T23:00:00.016-05:00

The discovery of disease-SNP associations through genome-wide association studies continues at a remarkable pace, but a recent review of common variants implicated in type 2 diabetes (T2D) suggests that, at least for this disease, current methods are unlikely to find many additional susceptibility loci (Prokopenko et al. 2008). We are now at the stage where "additional investigation is needed to define the causal variants, ... to understand disease mechanisms and to effect clinical translation." I continue to be interested in the (often underappreciated) contribution of pre-mRNA splicing to variation in gene activity, and I was especially intrigued by the statement that the variant with greatest effect size (the rs7901695 C variant in TCF7L2) lies in an intron but its mechanism of action is not understood. In order to investigate this I submitted the sequence surrounding this variant to SplicePort, our splice site predictor and analysis tool (see Dogan et al. 2007). Sure enough, the rs7901695 C variant alters the predicted strength of nearby splice sites. Because this sequence is deep in an intron, these would be potential cryptic splice sites, sites at which splicing occurs only in the case of a mutation.

Most striking is the activation of a splice acceptor site 68 nucleotides upstream of the variant SNP (position 688 in the submitted sequence or 114,744,012 on chromosome 10). The SplicePort score, which is -0.41 for the T allele, but -0.02 for the C allele, can be understood by noting that while 95.66% of splice acceptors have a score greater than -0.41, 89.01% of splice acceptors score above -0.02. Thus, the C allele acceptor site, although still relatively weak, is clearly better and well within the range of variation for real splice sites (99% of acceptors score above -0.86 and the median score is 0.923).
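To make the arithmetic explicit, here is a small sketch that turns the "percent of real acceptors scoring above" figures quoted in this paragraph into percentile ranks. The function and variable names are mine for illustration and are not part of SplicePort.

```python
def percentile_rank(percent_scoring_above):
    """Percent of real acceptor sites scoring at or below a given site."""
    return round(100.0 - percent_scoring_above, 2)


# Figures quoted in the text for the two alleles of rs7901695.
t_allele = {"score": -0.41, "percent_above": 95.66}
c_allele = {"score": -0.02, "percent_above": 89.01}

t_rank = percentile_rank(t_allele["percent_above"])
c_rank = percentile_rank(c_allele["percent_above"])
score_gain = round(c_allele["score"] - t_allele["score"], 2)

print(t_rank)      # 4.34
print(c_rank)      # 10.99
print(score_gain)  # 0.39
```

Read this way, the predicted site moves from roughly the 4th to roughly the 11th percentile of real acceptors: still weak, but no longer in the extreme tail.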

How might a C to T change affect an acceptor splice site upstream? SplicePort provides a feature browser that lists the features used for scoring any site. In this case, the following "downstream features" contribute to the score of the C variant but not the T variant: cgg (0.112), ctac (0.083), cg (0.072), ctacg (0.06), tacg (0.059), acg (0.043) and acggg (0.035). An independent approach, ESEfinder, similarly identifies this sequence context (CTACGGG but not CTATGGG) as an exonic splicing enhancer potentially recognized by ASF/SF2, SRp40 or SRp55. Thus, the rs7901695 C variant might activate the upstream acceptor site by functioning as part of an exonic splicing enhancer that is activated by one or more SR proteins.
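As a sanity check on the feature list, the sketch below enumerates the short subsequences present in the C context (CTACGGG) but absent from the T context (CTATGGG) and confirms that every reported feature is among them. This is plain string comparison under my own assumptions (k-mers of length 2 to 5, no positional information); it is not how SplicePort computes or weights its features.

```python
C_CONTEXT = "ctacggg"  # sequence context with the C variant
T_CONTEXT = "ctatggg"  # same context with the T variant

# Feature weights as listed in the text for the C variant.
REPORTED_WEIGHTS = {
    "cgg": 0.112,
    "ctac": 0.083,
    "cg": 0.072,
    "ctacg": 0.06,
    "tacg": 0.059,
    "acg": 0.043,
    "acggg": 0.035,
}


def kmers(seq, k_min=2, k_max=5):
    """All substrings of seq with length between k_min and k_max."""
    return {
        seq[i:i + k]
        for k in range(k_min, k_max + 1)
        for i in range(len(seq) - k + 1)
    }


# Substrings that exist only when the variant base is C.
c_only = kmers(C_CONTEXT) - kmers(T_CONTEXT)

# Reported features that are NOT explained by the single-base change.
unexplained = set(REPORTED_WEIGHTS) - c_only

total_weight = round(sum(REPORTED_WEIGHTS.values()), 3)

print(sorted(unexplained))  # []
print(total_weight)         # 0.464
```

Every listed feature spans the variant base, as expected. The summed weight is shown only for reference; it is not meant to reproduce the difference between the two allele scores, since the lists above cover only features gained by the C variant.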

The next question is how the activation of an acceptor splice site deep within an intron would affect gene expression. A splice site that is activated by mutation is known as a cryptic splice site, and an exon that is used only in mutant alleles is referred to as a cryptic exon. Intron mutations that affect gene expression by creating cryptic exons have been known for some time. In fact, I wrote a commentary on several such mutations in the human beta-globin gene over 25 years ago while still a graduate student ("Lessons from mutant globins," Mount and Steitz 1983). Mutations that activate cryptic exons are often overlooked because they lie away from splice sites, and because the resulting RNA is often unstable due to nonsense-mediated decay. Nevertheless, there are now hundreds of papers describing such mutations. The case here is especially tricky because the SNP does not directly create a cryptic splice site, but may activate one at a distance.

Thus, activation of a cryptic exon is a reasonable hypothesis for the effect of the rs7901695 variant on TCF7L2. In this model transcripts from the C allele are more likely than transcripts from the T allele to be aberrantly spliced and ultimately degraded. The lack of EST data supporting a cryptic exon in this region can be explained by nonsense-mediated decay. This proposed mechanism is similar to regulated unproductive splicing and translation ("Evidence for the widespread coupling of alternative splicing and nonsense-mediated mRNA decay in humans." Lewis et al. 2003), an important difference being that cryptic exons are generated by mutation rather than being regulated alternative exons.

How likely is this hypothesis? I could not find any papers that have investigated the effect of this variant on splicing. Clearly, the next step is to look for evidence of the cryptic exon and verify that the C variant does indeed introduce an exonic splicing enhancer. There is also the possibility that other SNPs associated with the risk variant haplotype ("HapB_T2D"), particularly rs7903146, are more likely to be causative (Helgason et al. 2007). I could not find a direct comparison of the relative risk for these two variants, but it's possible that association data alone will rule out rs7901695, or even that they already have. Colleagues have suggested that I pursue this in my own lab, but I work on other things, and there are people in the diabetes field who can do this quickly. I only ask that they cite this post (here's how).

Although additional investigation is needed, the rs7901695 variant is certainly capable of explaining an effect on the expression of TCF7L2 through activation of a cryptic exon. This case is an example of how SplicePort can be used to evaluate the potential of variants to alter splicing. We plan to systematically evaluate the possible effect of all human SNPs on splicing. In the meantime, I strongly encourage investigators to use SplicePort to evaluate variants of interest on their own.

Remembering C.C. Tan

2008-11-08T13:02:00.009-05:00

I read this morning (link) that Tan Jiazhen, better known in the U.S. as C. C. Tan, passed away Nov. 1, at age 99. I suspect that his influence on genetics is much greater than most Americans appreciate. He worked with the first generation of Drosophila geneticists, and he was Dobzhansky's first Ph.D. student at Caltech, yet his career extends into the modern era, and many of the young Chinese scientists coming to the United States now have met him. It's impossible for me to evaluate how much he is responsible for the intellectual "silk road" that contributes so much vitality to twenty-first-century genetics, but I suspect that without C.C. Tan it would be much less traveled. Interested readers should consult Jim Crow's commentary in Genetics (Vol. 164, pg. 1 *) to see how he managed to bring Chinese genetics into the modern era, past the Lysenko years and the Cultural Revolution.

* This page, like most at genetics.org, does not load properly in Firefox on Windows. I'm sure that the GSA will fix that. For now, I just use another browser when I visit the GSA.

Do I have the right to know my own genetic makeup?

2008-07-13T22:40:00.007-04:00

I took a little time yesterday (July 12) to attend a panel discussion on direct to consumer genetic testing at the Genetic Alliance annual conference. Panelists were Sue Friedman, from FORCE; Trish Brown, from DNA Direct; Joanna Mountain, from 23andMe and Sean George, from Navigenics. Francis Collins, Director of the National Human Genome Research Institute, moderated. Once each of the panelists had made their opening remarks, Collins started the discussion by asking why, if personalized genetics is so wonderful, the states of California and New York have issued cease and desist orders to several personalized genetics companies (story). The response was conciliatory, echoing statements on the 23andMe blog (the spittoon):

We agree that this evolving field of personal genomics is in need of proper regulatory oversight. While our mission to provide accurate and contextual information to our customers about their genetic information is aligned with the regulatory mandate to protect the public health, we also want to ensure that efforts to rein in our industry do not hamper the potential benefit of genetic knowledge to our health.

Many relevant issues were brought up, by panel members, or by those in the audience. Can people deal appropriately with uncertainty? Do they understand the relationship between genotype, the environment and phenotype? What about genetic information with predictive value, but about which the consumer can do nothing? The case of APOE was discussed at length.

I do feel that we have a right to obtain information about our own genetic makeup without having to justify ourselves, to a physician, to an insurance company, or to the state of California. I am also skeptical of the perception that "most people are incapable of grasping the relevance of provisional, statistical information." In any case, an enterprise that feeds users with 500,000 bits of information, most of which have no significance, seems more likely to help people understand that genotype is not fate than to have the opposite effect.

Giving people genetic information can be separated from giving them advice, and it seems to me that providing information about genotype should be regulated only to the extent that technical standards are met. This is analogous to surveyors giving me information about the elevation of my house. That information, by itself, is not advice about flood risk, and I would be surprised if surveyors were required to provide accurate assessments of that risk in order to operate, or forbidden from providing consumers with data that a third party judged to be of little value.

The panel helped me to understand the risk of consumer fraud, but, ultimately, I feel, strongly, that I have a right to know my own genetic makeup. Furthermore, I find it insulting to say that consumers are incapable of understanding uncertainty. There is certainly room for regulation, but I hope that my right to pay someone to tell me about my own genes is not infringed. Perhaps it is most important to prevent companies from taking money for tests without providing portable genotype data whose implications can be evaluated by a third party in the light of new information, which could be information about the implications of that specific information, other genetic information that might influence how it is interpreted, or information about the interactions between that bit of genotype and other factors such as one's diet or medical history.

Links for this article:

NHGRI (National Human Genome Research Institute), with news, links to research, funding opportunities, fact sheets, and career opportunities, including such tidbits as a Catalog of genome-wide association studies.

"Should Personal Genomics Be Regulated" Tim O'Reilly's blog on the subject (with interesting comments and discussion).

DNA Direct. DNA Direct's services focus on personalized test result interpretation and supportive materials and services.

Genetic Alliance. The Genetic Alliance works to eliminate obstacles and limitations within the genetics community through novel partnerships among stakeholders and integration of individual, family, and community perspectives to improve health systems and inform decisions.

"Getting up close and personal with your genome," a news summary for the scientist written by Laura Bonetta and published in Cell (2008 May 30;133(5):753-6)

Plant genomes, animal genomes, more and more genomes!

2008-01-20T17:28:00.001-05:00

I recently returned from the Plant and Animal Genome Conference (XVI, Jan. 12-16, in San Diego). This conference is much more applied than what I'm used to, but I came because it seemed a good place to see comparative genomics in full bloom, and that turned out to be true. I was struck by the extent to which the meeting was a showcase for vendors (Agilent, Sequenom, BioTrove, Illumina, Roche (now incorporating 454 and Nimblegen), Affymetrix, Applied Biosystems, Keygene, etc.), many of whom literally wined and dined conferees at their workshops.

However, I was also struck by the extent to which new high-throughput sequencing technologies are already in widespread use. Ronan O'Malley (Ecker lab) described the sequencing of Cvi, a strain of Arabidopsis distinct from the Columbia accession already determined; in the process he compared 454 and Solexa sequencing. Steve Jacobson (UCLA) described the repeated re-sequencing of (bisulfite-modified) Columbia for the purpose of studying cytosine methylation. Several more plant genomes are in the pipeline, and a sense of the pace is conveyed by the fact that plenary speaker Eddy Rubin (JGI) "announced" the completion of the soybean genome almost in passing.

Other plenary talks were uniformly excellent. I missed the initial talk, by Jerry Caulder, which was apparently quite controversial. David Baulcombe referred to it by saying that the European perspective on genetically modified foods is different and that "by shying away from the hazards we don't gain credibility." Another notable aside was Michael Ashburner's statement that "there is no point in funding biomedical research unless you also fund informatics."

This week, it's ancestry

2007-11-25T19:45:00.000-05:00

Last weekend there was a lot of buzz about personal genomics (see Genome Technology Daily, "It was a Helluva Weekend for Personal Genomics"; or Eye on DNA, "DNA Network Members Discuss Personal Genomics Service Providers 23andMe, deCODEme, and Navigenics"; or my previous post). This weekend, it's ancestry. Today's papers had two interesting features on ancestry testing, both of which nicely echoed my own post about caution regarding ancestry testing ("On Genes"). First, the New York Times business section ("DNA Tests Find Branches but Few Roots") discusses the business of ancestry testing. The article is nice in that it compares the cost of ancestry testing by various companies, shows that results differ, and quotes Henry Louis Gates Jr. making reasoned assessments of the role that DNA testing can play. Second, the Washington Post reviews "The Genetic Strand: Exploring a Family History Through DNA" by Edward Ball ("Blue Blood, Black Genes").

The theme is clear. You can only learn so much about your ancestors from DNA.

Ready or not, personalized genetics is here.

2007-11-17T09:56:00.001-05:00

Yesterday's announcement by deCODE genetics that they would be launching a personalized genetics service, deCODEme (news release), means that a major player in gene discovery has just joined the growing field of companies offering personalized genetic services. As I wrote in my Nature Network blog, "On Genes" in "The Scientist Blogger and the Personal Genome," information about susceptibility to disease, potential for health or accomplishment and responsiveness to therapies is found in our genes, and it is going to be made available to people who want it. A lot of people are going to want it. Most are not going to be prepared to understand it. Even Jim Watson and J. Craig Venter aren't entirely sure what to make of their genomes. Genetic counseling may morph into a profession that serves everyone, not just those who are faced with clear cases of genetic disease.

Journalists and scientists also have a role to play. Let me highlight three useful responses.

The New York Times has an excellent series called "The DNA age." These articles (all by Amy Harmon, at least so far), "explore the impact of new genetic technology on American life." One published today, "My Genome, Myself: Seeking Clues in DNA" describes her use of the 23andMe service.

Bertalan Meskó, a blogger at "ScienceRoll," presents coverage of Personalized Medicine, including a summary of breaking news (today) and a review of services offered by Navigenics, 23andMe and Helix Health (last week, before the deCODE announcement).

I have started "Information on Genes," (ongenes), a web site that is intended to be a place where answers to questions on genes, genetics and genomics are provided by experts in the field. Questions will be posted anonymously but answers will not. I plan to solicit answers from people in the know. My hope is that ongenes will provide useful information to anyone trying to understand genetic tests, including professionals in the field.

Simons Foundation funds research on sporadic autism mutations

2007-09-06T12:57:00.000-04:00

Because I've dealt with the issue of sporadic autism linked to paternal age before (links) it seems worthwhile noting here that the Rutgers University Cell and DNA Repository will use a $7.8 million grant from the Simons Foundation to establish a new collection of DNA samples to help autism researchers study sporadic germ-line mutations. This story is covered by GenomeWeb today.

"On Genes," my blog on Nature Network

2007-08-28T23:33:00.000-04:00

After commenting on Nature Network ("What's up with Nature?"), I ended up creating a new blog over there. It's "On Genes," and the URL is network.nature.com/blogs/user/smount. It's not clear what I'll put there as opposed to here. Perhaps one of the two blogs will die. Right now, the plan is to put more substantial scientific posts here and more news-oriented posts there.

Along those lines, my first real post on the Nature Network blog, "PRISM distorts our view of the open access debate" was in response to Jonathan Eisen's blog entry (“PRISM – Partnership for Research Integrity in Science and Medicine – Seems like a spoof but it is real, and sad”). It makes me angry to see issues that concern me be taken up by a public relations firm that is so thoroughly dishonest. But I won't repeat that here. You can read about it there.

Plants, Animals and the Ancient RNA Toolkit

2007-07-29T15:44:00.000-04:00

Multicellularity has arisen independently several times, but most famously twice, in the two lineages giving rise to plants and animals. In fact, the last unicellular ancestors of these two lineages were not particularly closely related, and the last common ancestor of both plants and animals also gave rise to an enormous number of extant unicellular progeny, including all of the fungi. When I began serious work on the regulation of pre-mRNA splicing in plants in 2001 I did so with an awareness of how very similar the process is to pre-mRNA splicing in animals. This is all the more striking because so many species have lost this complexity. In fact, plants and animals share many processes that must have been present in the last common ancestor, but have been lost in many unicellular eukaryotes derived from that same ancestor. RNA figures heavily in the list, which includes microRNAs, U12 introns, the exon junction complex and complex alternative splicing.

Although the last common ancestor of plants and animals was almost certainly much more complex than most modern unicellular eukaryotes (at least in terms of its genome), it was probably not multicellular. The signals that control development in animals (wnts, hedgehog, FGFs, TGF-betas, etc.) are completely missing in plants. Likewise, the genes involved in meristem maintenance, ethylene-signaling, auxin-signaling and so on are missing in animals. It's also worth pointing out that the opisthokont clade (which includes animals and the fungi) is well-established (see the figure, which is from the Tree of Life Web Project).

Perhaps most convincing are the exceptions: the processes shared by animals and plants but missing from most unicellular eukaryotes are not missing from all. U12 introns were recently found in distantly related protists and in a fungus (see my comment). MicroRNAs were recently described in Chlamydomonas reinhardtii, a unicellular green alga (Zhao et al., 2007). There is even a miRNA family that appears to be conserved between plants and animals and targets a homologous family of splicing regulators (Arteaga-Vazquez et al. 2006).

It is therefore frustrating to read commentaries that are written as though genomic complexity is new. For example, Ram and Ast (2007) mistakenly generalize from S. cerevisiae to S. pombe (which retains more genomic complexity of several sorts, including alternative splicing) and talk about "before and after" incorrectly. Their conclusion, that "SR proteins had already facilitated the splicing of weak introns before the evolution of alternative splicing" may be correct, but complex alternative splicing was almost certainly present in the last common ancestor of plants and animals. I say this based on the fact that it had many genes whose products function in the regulation of alternative splicing, and which have been lost in unicellular descendants lacking complex alternative splicing (among these is a repertoire of at least four SR proteins).

What is most interesting to me is the correlation between developmental complexity and retention of genomic complexity, including alternative splicing and miRNAs. It might not have evolved with multicellularity, but the ancient RNA toolkit might be very useful when it comes to building a complex organism.

What's up with Nature? Nature network, screwy renewals, more.

2007-07-21T11:32:00.000-04:00

Nature (Nature Publishing Group, to be precise) has been aggressively embracing the internet in new and interesting ways. Their main page at nature.com no longer has a list of journals. Instead, journals is just one of many choices (it's at the top, to be sure, and they now have no less than 77), including podcasts, gateways (which aggregate related content across journals), feeds, blogs, jobs, society partners, conferences, regional websites and miscellany (which is creatively titled "launch pad"). It's all a bit unfocused, but much of it is very useful. For example, Connotea is the shared bookmarks site that I settled on after some deliberation and experimentation, and that decision reflects the quality of the site. Nature provides useful tools oriented towards literature citations, the most important of which allow easy capture of bibliographic information (with one annoying bug that involves authors with multiple names) and export of libraries.

Nature's newest venture is "Nature Network," which is social networking for the scientist. I quickly found and joined groups for people working on bioinformatics, Drosophila and Arabidopsis. I didn't find a group working on splicing, so I created one. Nature Network could be quite useful, but I wonder if it is going to succeed. To do so, it must "catch on," a phenomenon that is hard to predict and depends very much on the site providing useful tools not available elsewhere. Right now, most of what it offers seems redundant, but at the very least it provides a professional alternative to Yahoo groups for ad hoc groups of scientists who want to create an online forum for exchange on a particular topic. One especially interesting choice is the elevation of London and Boston to a special status. I'm sure that Nature Network San Francisco will come soon, but I see lots of problems with this. Would the East Bay have their own Nature Network? I can't wait to find out if New York or Washington will be added first. Where will it end? Recalling the desperate enthusiasm with which I have often seen local politicians embrace biotechnology, I fear that this could get competitive and ugly, even before Nature Network runs out of space on their local menu toolbar.

Of course, the weekly journal is still the keystone of Nature Publishing Group. I have had a personal subscription for over 20 years and I read the journal, in print, every week, bringing it along with me to meals and whatnot. This year, they are being very aggressive about renewals and they're getting it very wrong. My annual renewal expires in September. About a month ago I received a phone call in my office, inviting me to renew. Yes, they called me. Promised a 30% discount, I did so. I renewed online in an attempt to be sure that I generated a renewal of my existing subscription, following the instructions of the person who called me. The result was an entirely new subscription, which expired not in September of 2008, but in July of 2008. I also found that I had three or four customer IDs associated with my account (for only two journals, the other being Nature Genetics). After many rounds of email with their customer service my subscriptions were simplified under a single subscriber number with the proper expiration date. I should emphasize that the replies were prompt, cordial and helpful; the problem is with their system. I assumed that everything was fine, despite the numerous entries on "/myaccount/show/subs," shown here for your amusement.

Then, I received two copies of Nature in the mail. Inquiry generated a response that came down to this:

So this is the reason you are receiving two copies of the same journal Nature but they are two different volumes and issues.so you will be receiving two copies of Nature till Sep 2007.

I decided to leave well enough alone.

Today, I received an email, from Sarah Greaves, PhD, Publisher, Nature, herself, that read in part

Your current subscription to Nature is now up for renewal. To ensure you don’t miss a single issue, I am pleased to offer you a 30% discount from our normal subscription rate.

This offer expires on SEPTEMBER 27th and is only available online through this email, so act now to ensure you don’t miss out on this exclusive rate.

I wanted to run screaming from the room, but I opted instead for writing this post.

Michael Crichton weighs in on patenting and the Genomic Research and Accessibility Act

2007-02-14T14:01:00.000-05:00

In yesterday's New York Times, Michael Crichton (author of "Jurassic Park" and "Next") wrote in favor of the Genomic Research and Accessibility Act, which would ban the patenting of genes found in nature. He correctly points out that genes are not inventions and attributes the fact that they can be patented to "a mistake by an underfinanced and understaffed government agency, The United States Patent Office." I note that the bill, as described by co-sponsor Xavier Becerra, is not retroactive, so, while it's a no-brainer that the patent office should not be granting patents for the discovery of natural phenomena, this bill won't do much to facilitate the promise of personalized medicine (because most of the genes that matter have already been patented). I have discussed the possibility that we might find relief in the courts before (regarding EBay Inc. vs. MercExchange, LLC and Labcorp vs. Metabolite Laboratories). Certainly, gene patents disserve the public interest, but that is not enough, and the prospect of understanding the law in these cases is daunting.

Science Blogs listed on the OMMBID blog

2006-12-25T14:40:00.000-05:00

The OMMBID (Online Metabolic and Molecular Bases of Inherited Diseases) blog has published a list of science blogs and related websites. It's nice to have been included. When I get some time I will browse that list and update my own lists of favorites on Connotea and my summary page.

Shared bookmarks for the literature -- what to do?

2006-07-11T11:24:00.000-04:00

After some initial skepticism, I agree that "social bookmarking" is nice. It's very useful to put bookmarks to the literature online and see what articles others have cited, and the old approach of journal-specific browsing (associated with hard-copy volumes, but also including eTOCs) is certainly outmoded. I'm experimenting now with del.icio.us (as both ongenetics and RNAinfo) and Connotea. An RNA Society survey generated a few votes for CiteULike, which looks great, although it is not easy to get it to accept an article from PubMed (in order to get the correct URL active you have to select the article you want from a list; if it is the only result of your search, PubMed continues to display the URL for the search). A user named Cortel has lots of relevant citations, so I may continue to keep track of him, whether I settle on CiteULike or not. I also created my parallel blog, "Quick Notes on Genetics," primarily with the idea of citing articles, and I keep a list of especially relevant articles tied to my lab's web page (the Mount lab reading room). Finally, it's worth mentioning the Faculty of 1000 in this context. This is all way too much. It's not clear what I will settle on, but most of these will be forgotten once I stop exploring and develop a routine for finding and sharing the articles that interest me.

The genetic architecture of complex traits: significant differences can involve noncoding DNA and can be epistatic

2006-05-12T14:46:00.000-04:00

Clark et al., in "A distant upstream enhancer at the maize domestication gene tb1 has pleiotropic effects on plant and inflorescent architecture," describe a significant QTL residing in noncoding (and largely repetitive) DNA far upstream of the affected gene (PubMed, Nature Genetics). I am reminded of the graduate genetics course given by Michael Freeling that I sat in on while a postdoc at Berkeley (in 1984, while studying the effects of transposable element insertions on gene expression in Drosophila in Gerry Rubin's lab). He emphasized epigenetic phenomena and truly expected that novel molecular mechanisms (such as transposon instability) would reveal "a molecular clock that really ticks." From his web page it looks like he's continuing on the same tack today.

A paper in the previous month's issue of Nature Genetics (Carlborg et al., "Epistasis and the release of genetic variation during long-term selection," PubMed, Nature Genetics) reported a genetic network of four interacting loci affecting chicken growth ("Growth4, Growth6 and Growth12 had a significantly larger effect on growth in homozygous Growth9 individuals than [others]"). This kind of genetic interaction is precisely what any developmental geneticist would expect, yet breeders and population geneticists often cling to simple linear models. This paper will certainly help, not least of all because it involves a method for the detection of epistatic QTLs.

From HapMap to selection map

2006-03-08T10:00:00.000-05:00

It was the article by Nicholas Wade in the New York Times ("Still Evolving, Human Genes Tell New Story") that alerted me to the new article in PLoS Biology by Voight et al. ("A Map of Recent Positive Selection in the Human Genome", from Jonathan Pritchard's group at the University of Chicago). I've been anticipating a list of human genes under selection for some time, and it's exciting to see this published. This paper, perhaps more than any other, marks the transition to a new and controversial era in genetics. On the positive side, we're going to learn a lot very quickly about the genetics of human differences. This will provide many benefits and engage curiosity in satisfying and useful ways.

On the other hand, the uncritical acceptance of results that are statistical in nature (and have a real possibility of being wrong) is disturbing. A recent visitor to Sarah Tishkoff's lab (Jeff Jensen, from Cornell, where he works with Aquadro and Bustamante) gave a talk about the statistical problem of distinguishing selection from certain demographic phenomena that made me think the interpretation of selection maps is going to be extremely uncertain. It is surprising that none of those issues were addressed in Wade's article, especially so because the New York Times typically fills their science articles with quotes from others in the field. I felt the same unease a few weeks ago when watching a PBS documentary "African American Lives," in which famous African-Americans were given overly specific information about their ancestry without appropriate statistical disclaimers.

I suppose that I will be talking a lot more about selection and race with my friends who are not geneticists, and putting a lot more population genetics into my graduate genetics course. Clearly, the idea that population genetics is passé is now passé.

Alternative splicing and host defense in flies and plants

2005-08-24T23:41:00.000-04:00

In an article appearing online in Science this week, and discussed in The Scientist, Watson et al. (PubMed) implicate the Drosophila Dscam gene in host defense. They detect secreted forms of the protein in hemolymph and show that the gene enhances phagocytosis of bacteria by hemocytes. They also demonstrate conservation of the potential for extreme isoform diversity across insect taxa, an extension of earlier work from the Graveley lab (Graveley et al. 2004; PubMed, RNA journal). Isoform diversity due to alternative splicing is therefore implicated in the generation of adaptive variation in host defense molecules. It is interesting that isoform diversity due to alternative splicing of Toll-like proteins has likewise been implicated in plant defense (reviewed by Kazan 2003 and Jordan et al. 2002; an example is Zhang & Gassmann 2003).

What kind of adaptation does this make possible? Certainly, extreme variability allows rapid adaptation on a population level. Furthermore, the presence of membrane bound and secreted forms of the same molecule presents the possibility of adaptive immunity through clonal selection of hemocytes that see antigen. Louisa Wu pointed me to an article in Nature Immunology (Little, Hultmark and Read 2005) making the point that neither memory nor specificity has been ruled out in invertebrate immunity. True adaptive immunity in insects would be very exciting, but we're a long way from that. How could variation among hemocytes in Dscam isoform production be heritable? Through epigenetic silencing of splicing factors? We're just at the beginning of this story.

The authors say this:

broad conservation of receptor diversity strongly suggests important functions and future studies will have to further address whether the presence of diverse immune receptors in invertebrates increases the effectiveness of immune responses of individual animals. Alternatively, given the relative short life span of many invertebrates, it may be that immune receptor diversity is less important ontogenetically but rather enhances the adaptive potential of animal populations to changing environmental and pathogenic threats.