Sunday, August 10, 2014

Nicholas Wade’s troubling ideas

Among the popular myths about human genetics left over from the era of eugenics, social Darwinism and racism, two are especially relevant to Nicholas Wade’s recent book, “A Troublesome Inheritance: Genes, Race and Human History.”  The first is that natural selection has stopped due to advances in health and medicine, and that, as a result, the unfit are now contributing more to each succeeding generation. Early in his book, Wade disagrees, stating that “human evolution has been recent, copious and regional”, and much of the first part of the book is devoted to this claim. I think this statement is well supported by modern genetics. Wade goes further, arguing that, in fact, selection favors those who are economically successful. Here, demography and historical records have more to say than genetics, and Wade relies heavily on the work of Gregory Clark, an economic historian at the University of California, Davis, especially the book “A Farewell to Alms,” which Wade reviewed favorably for the New York Times in 2007. I am skeptical about the connection between affluence and Darwinian fitness; I don’t think there are genetic data either way.


Wade gets into trouble when he tries to find support in modern human genetics for a second major myth, which is that humanity can be meaningfully divided into a small number of types (races), and that these types have biologically meaningful differences in things such as intelligence and moral character. Virtually all practicing human population geneticists, including those whose work he cites, are in agreement that this speculation is unsupported, and today’s New York Times carries a succinct statement signed by many of them, featuring a simple message:


We are in full agreement that there is no support from the field of population genetics for Wade’s conjectures.


The letter is here.  The list of signatories, here, contains 139 names, including every prominent human geneticist that I thought to look for.


Why the outcry? People who devote their scientific lives to the study of human genetic variation think about race and popular misconceptions all of the time. They care that their work is accurately represented.


For those who wish to read a more detailed rebuttal of Wade’s arguments, I recommend Jeremy Yoder in the Los Angeles Review of Books, but there are many other good ones. 
The original New York Times book review, by David Dobbs, is here.


For those who want to read less, I leave you with one very brief quote.

He’s claiming to be a spokesperson for the science and, no, he’s not.
- Sarah Tishkoff (David and Lyn Silfen University Professor in the Departments of Genetics and Biology at the University of Pennsylvania, quoted in a Nature News Blog)

----------------------------------------
Postscript (additional commentary):
- Nicholas Wade's reply (New York Times, Aug. 22)
- Marcus W. Feldman in the Computational, Evolutionary and Human Genomics at Stanford blog.
"Echoes of the Past: Hereditarianism and A Troublesome Inheritance" Marcus W. Feldman is the Burnet C. and Mildred Finley Wohlford Professor in the School of Humanities and Sciences at Stanford and a Founding Director of CEHG.

Friday, January 03, 2014

What is a gene?

A gene is all of the DNA elements required in cis for the properly regulated production of a set of RNAs whose sequences overlap in the genome.   
I formulated that definition c. 1990, when I started teaching genetics to graduate students. I think that the course I actually taught was quite different from the plans leading to that formulation, but I remember sitting for several hours in a coffee shop in Newark airport and coming up with that definition. This was after the discovery of splicing, transposable elements, remote enhancers, overlapping genes, nested genes, long noncoding RNAs and many short noncoding RNAs, and I imagined discussing literature on each of these topics and its implications for how a gene might be defined. 1990 was before “tweet-length” could be applied, before the discovery of microRNAs and (most significantly) before complete genome sequences and high-throughput data in the style of ENCODE.


I believe this definition has stood the test of time, and that it will continue to provide a useful understanding of what is meant by a gene. 

The fact that it was written to accommodate work that predates complete genome sequences, ChIPseq and whatever methods are developed in the coming years, should be kept in mind as we face hype about new discoveries changing our view of the gene. I predict that later this year some new work will be described as overturning the idea of junk DNA, or the idea of genes as beads on a string, or the notion that genes are merely their coding information, or perhaps all of these. These discoveries will be said to account for the dark matter of the genome and other deep mysteries that were unsolved until now. Faced with that hype, I will link to this post.

In 2014, as part of my plan to write more but shorter posts, I will also report the history of my own understanding of several of the issues that make defining “a gene” problematic.
--------------------
Mark Gerstein almost immediately pointed out that he had published a very similar definition in 2007:

The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.
See PubMed: Pubmed ID 17567988 or 
Gerstein lab: http://archive.gersteinlab.org/papers/e-print/grgenerev/preprint.pdf or 
Genome Research: http://genome.cshlp.org/content/17/6/669.long

Thursday, January 02, 2014

Michael Pollan on plant behavior, good and bad

A friend asked my view, so I read the recent article by Michael Pollan in the New Yorker, "The Intelligent Plant."

Michael Pollan is a very good writer and he picked an interesting topic. Plant behavior is indeed fascinating and he does a good job of fascinating his readers without obviously going far beyond what can be supported. I also think he does justice to the community of plant biologists by presenting people's views in their own words. However, I fear that he may have incited enthusiasm for bad science. A critical point in the article occurs when he points out that the argument is about language.
Many of the scientists in [Gagliano's] audience were just getting used to the ideas of plant “behavior” and “memory” (terms that even Fred Sack said he was willing to accept); using words like “learning” and “intelligence” in plants struck them, in Sack’s words, as “inappropriate” and “just weird.” When I described the experiment to Lincoln Taiz, he suggested the words “habituation” or “desensitization” would be more appropriate than “learning.” Gagliano said that her mimosa paper had been rejected by ten journals: “None of the reviewers had problems with the data.” Instead, they balked at the language she used to describe the data. But she didn’t want to change it. “Unless we use the same language to describe the same behavior”—exhibited by plants and animals—“we can’t compare it,” she said.
I agree that we should use the same language to describe the same behavior, and applying the words 'behavior' and 'learning' to plants makes sense to me. That we use these terms (appropriately, I think) for robots and computers points out that they are neutral with respect to mechanism. However, I don't think that 'intelligence' or 'consciousness' would be appropriate for anything described in this article. The prefix 'neuro' refers to neurons or the nervous system and we know for a fact that plants have nothing like neurons. It's pretty clear that multicellularity evolved independently in plants and animals, and there are important differences, so I find it highly unlikely that plant and animal behavior share underlying mechanisms. Thus I very much doubt that there is “some unifying mechanism across living systems that can process information and learn.” While fundamental processes common to all life are no doubt shared, more sophisticated signaling is unlikely to be the same. Cell walls make it hard to see how information could possibly be transmitted through synapses, which are specialized points of contact between neurons. On the other hand, plasmodesmata, channels that allow direct but regulated transport between cells, provide plant cells with the potential for mechanisms unavailable to animal cells. Thus, while communication between the parts of a plant is likely to be as sophisticated as, if not more sophisticated than, comparable mechanisms in animals, it is very different, and much less well understood. We would do better to appreciate plants on their own terms. I hope that this article leads more young people into the exciting field of plant signaling. I fear that it may do so for the wrong reasons.


Time-Lapse HD Plants following light

Links:

“The Intelligent Plant,” by Michael Pollan in the New Yorker. Dec. 23, 2013.
Cleve Backster, an obituary in the New York Times Magazine. The best-selling book, “The Secret Life of Plants,” was inspired by Backster’s research.

Saturday, September 08, 2012

ENCODE: Data, Junk and Hype

This week saw the publication of dozens of papers in Nature, Science and Genome Research that report an initial analysis of data from the Encyclopedia of DNA Elements (ENCODE) project on RNA, transcription initiation, transcription factor association, chromatin structure and histone modification.  The scale of this data is staggering, and it will change how human molecular genetics is done.  Imagine how the field of climatology would be changed if they suddenly had hundreds of years of complete weather data from thousands of weather stations.  This is comparable.
ENCODE data, visualized with the UCSC genome browser.
What ENCODE does not do is fundamentally change our view of what the genome looks like.

The third and fourth sentences of the main article in Nature are these:
These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation.
This "result" has been emphasized in the popular press.


Hype: This lead article in Thursday's copy of the Washington Post Express (a publication of the Washington Post distributed on DC's Metro) is typical of how the story was covered. 
In particular, the conclusion that this study "overturns theory of 'junk DNA' in the genome," which was the title of the article in The Guardian and which was echoed by many who should know better (e.g. Science) is, well, junk. What the ENCODE project has done is locate the sites on human DNA that are represented in RNA, and the sites at which numerous factors bind.  The fact that 80% of the genome has some biochemical "function" of this sort does not mean that 80% of the genome has some effect on gene expression (although these data will help us immensely in the task of figuring out which noncoding nucleotides do indeed affect gene expression). We can still be quite sure that most of that 80% does not have any biological function in the usual sense of the word, which is that if you delete it or alter it, something that matters biologically or medically will change.  We still know that most of the millions of single nucleotide polymorphisms that distinguish any two copies of the genome don't matter very much.  It is simply not the case that the vast majority of the human genome has some (biological) functional importance.

Conversely, we have known for a long time that a lot of noncoding DNA does have a function.  Most of the sequence that does matter is not coding.  One measure of that is conservation, and the earliest complete mammalian genomes, in 2005, showed that about 5.3% is conserved among mammals (vs. only about 1% that is coding).  A direct attempt to use ENCODE (and 1000 genomes) data to estimate the fraction of the genome under purifying selection (Ward and Kellis, this week) finds "an additional 4% of the human genome subject to lineage-specific constraints."  While this is a big increase in the estimated fraction of the genome subject to purifying selection, the total is still only about 10%, leaving 90% as neutral.

We have also known for a long time that most RNA transcripts do not result in cytoplasmic messenger RNAs (Salditt-Georgieff and Darnell JE Jr. published a paper in 1981 with the title "Further evidence that the majority of primary nuclear RNA transcripts in mammalian cells do not contribute to mRNA.") and specific transcripts in noncoding regions were described by the end of the 1980s.

The science blogosphere has been aflame for the last two days as scientists attempt to debunk this hype.  Those bloggers (many of whom are authors on the ENCODE papers) have provided excellent summaries of the issues surrounding the notion of junk DNA.  I have bookmarked several on delicious (tag: ongenetics/ENCODE) and some (mostly the same ones) are listed below.

To my mind, the biggest problem is that what is not news (that not all noncoding DNA is junk) has been allowed to eclipse what is news (that we have a vast trove of data that allows us to assess possible functions for all nucleotides).

Links:
http://genome.ucsc.edu/ENCODE/
The gateway to ENCODE data (through the UC Santa Cruz genome browser)

http://www.genome.gov/10005107
The ENCODE project web site.

http://www.nature.com/encode/
This is Nature's gateway to the literature.  It's a little (OK, a lot) gimmicky, so you probably want to just visit the tables of contents: Nature, Science, Genome Research.

The Finch and the Pea: ENCODE Media Fail
This blog post by Mike White is a survey of media hype documenting numerous errors resulting from the hype (or misplaced focus).

Encode (2012) vs. Comings (1972)
This blog post by T. Ryan Gregory presents a serious review of the concept of "junk DNA."

ENCODE: My [Ewan Birney's] Own Thoughts
Ewan Birney on his own blog.

A Neutral Theory of Molecular Function
This blog post by Michael Eisen "wrestles" with the idea of junk DNA.
“I want to end by pointing out that there are lots of people (me and my group included) who have already been wrestling with this issue, with lots of interesting ideas and results already out there. From an intellectual standpoint I’d like to particularly point out the influence the writings of Mike Lynch have had on me – see especially this.”

ENCODE: The Rough Guide to the Human Genome
Ed Yong's post (at Discover Magazine) has been revised in the last day or so to be more cautious about the hype.

Cryptogenomicon: ENCODE says what?
This post by Sean Eddy makes the points that "The human genome has a lot of junk DNA," that "Noncoding DNA is part junk, part regulatory, part unknown," that "ENCODE’s definition of 'functional' includes junk" and that "Evolution works on junk."  His post has dozens of comments, mostly from experts in the field.

Finally, a few screen shots from Twitter in the last few days:
Reaction to ENCODE media hype on Twitter ranged from blind propagation to harsh criticism.

Saturday, February 19, 2011

Genetic Genealogy and the Single Segment

Last year, my wife Janet and I sent our DNA off to 23andMe for analysis. Among the tools that they provide is a "Relative Finder," which lists other people on the site who share regions of DNA that appear to be identical by descent. In my case, there are 476 people listed, each sharing between 0.07% and 0.46% of my genome, almost always as a single segment (there are 18 people with whom I share two segments). These people are generally anonymous, but you have an opportunity to make contact and invite them to "share genomes," which means only that you can see which regions are shared.

There are a lot of people on 23andMe who are quite interested in this tool, and who use it for genetic genealogy. Many of these same people also use Family Tree DNA and ancestry.com. As a result of my interactions with these 23andMe relatives, and following the discussions on the 23andMe community forums, I have been thinking about, and researching, what it means to share one segment of DNA by descent with someone. In the process, I have realized some things that are not fully appreciated by most of the genealogy buffs on 23andMe.

I am presenting these insights here, and will consider them one at a time.
  • Distant relatives often share no genetic material at all.
  • It is possible to share a segment with very distant relatives.
  • Sometimes, more distant relationships are more likely.
  • Most of your relatives may be descended from a small fraction of your ancestors.
Distant relatives (fourth cousins and beyond) often share no genetic material.
The chance of not sharing any DNA at all becomes appreciable with fourth cousins and rises to approximately half with fifth cousins. This is based on my own simplified calculations and those of Donnelly (1983), who opines that "proof of descent from William Shakespeare does little to increase the probability that the claimant has genes in common with him." There are limits to what can be accomplished by genetic genealogy that are imposed by the real chance that you simply do not share any DNA at all with distant relatives. The more distant the relationship, the more likely it is that no DNA is shared.

On the other hand, you have to inherit your DNA from somebody, so there are some blocks of identity by descent that have been transmitted many generations.

It is possible to share a segment with very distant relatives.
"The probability that fourth cousins share at least one IBD [identical by descent] segment is 77%, and the expected length of this segment is 10 cM." Now consider the next step. There is a 50% chance that that one shared segment will not be transmitted at all, but a 90% chance that if it is transmitted it will be just as big as it was (the same 10 cM.). What this means for genealogy on 23andMe is that for two people sharing one segment identical by descent there is no way to reliably estimate how far back the common ancestor was. Furthermore, no improvement in software can possibly change that, because the limitation is imposed by the genetics itself.

No matter how far back you go, every nucleotide of one's genome is derived from some ancestor, and even going back 20 generations, the chance that the bit which has been inherited is part of a block 5 cM. or greater is still appreciable. In fact, even for 19th cousins, there is a real chance (13%) that any segment of DNA they have inherited in common will be 5 cM. or greater. This number is based on the term (1 - P(rec))^n, where P(rec) is the probability that the segment will be broken up by recombination in a single transmission (size/100, where size is in cM., so 0.05 for a 5 cM. segment). For 19th cousins sharing a single ancestor, n is 40.
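For readers who want to check the 13% figure, here is a minimal Python sketch of the survival term. The function name and its parameters are my own labels for illustration, and the sketch assumes, as the text does, that the chance of recombination within a segment is simply its length in cM. divided by 100.

```python
def prob_segment_intact(size_cm: float, transmissions: int) -> float:
    """Chance that a segment of the given size is never broken by
    recombination over the given number of transmissions, assuming
    it is passed on each time.

    Assumption: P(rec) per transmission is size_cm / 100.
    """
    p_rec = size_cm / 100.0
    return (1.0 - p_rec) ** transmissions


# 19th cousins through a single ancestor: n = 2 * 19 + 2 = 40 transmissions
p_19th = prob_segment_intact(5, 40)
print(f"5 cM. segment intact after 40 transmissions: {p_19th:.3f}")

# One further transmission of a 10 cM. segment
p_one_step = prob_segment_intact(10, 1)
print(f"10 cM. segment intact after 1 transmission: {p_one_step:.2f}")
```

The first value comes out at roughly 0.13, and the second is the 90% quoted earlier for a 10 cM. segment.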

Of course, as mentioned above, there is very little chance that two 19th cousins will share any IBD segments at all, but this is offset if one has many 19th cousins, which is often the case.

Sometimes, more distant relationships are more likely.
23andMe reports a "predicted relationship" (e.g. "4th cousin") and a "relationship range" (e.g. "3rd to 7th cousin"). However, these ranges are likely to be wildly inaccurate, because the likely distance to a common ancestor, given only the information that two people share a single IBD segment, can vary enormously, based largely on how many relatives one has.

Here is my estimate of these values. You can skip this paragraph if you're not interested in the details.
The probability that a segment, if transmitted, will not be broken up by recombination is 1 minus the probability of recombination, which is 5% for a 5 cM. segment, 10% for a 10 cM. segment and so on. (If you are moving up a pedigree, this is the probability the segment was transmitted rather than created by recombination, but the value is the same.)
The probability that a segment will be transmitted at all is one-half per generation.
Thus, for an nth cousin sharing a single ancestor, the probability is ((1-P(rec))/2)^(2n+2).
For an nth cousin sharing two ancestors (the usual case), the probability is
2(((1-P(rec))/2)^(2n+2)). For example, the probability of two 4th cousins sharing a specific 5 cM. segment is 2((0.95/2)^10) = 0.00117. If one has more than 855 4th cousins, then the expected number of 4th cousins sharing this segment will be greater than 1. Because every 4th cousin has the same chance of inheriting the segment, the expected number of 4th cousins who do share the segment will be directly proportional to the number of 4th cousins one has. In the case of 5th cousins, the probability of sharing a specific segment is 2((0.95/2)^12) = 0.00026, which would require 3,790 cousins for the expected number sharing the segment to exceed 1.0. In general, the number of cousins of a specific degree who should be expected to share a segment is given by

2(((1-P(rec))/2)^(2n+2)) x N

where N is the number of relatives of that degree. For a 5 cM. segment, if the number of cousins of degree n+1 that you have is 4.43 times the number of cousins of degree n that you have, then you expect more cousins of degree n+1 than cousins of degree n to share the segment. For a 10 cM. segment, this ratio is 4.94.
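The calculation above can be sketched in a few lines of Python. This is only an illustration of the formula 2(((1-P(rec))/2)^(2n+2)) x N; the function names are hypothetical, and the same simplifying assumption applies (P(rec) is the segment size in cM. divided by 100).

```python
def prob_share_segment(size_cm: float, degree: int, shared_ancestors: int = 2) -> float:
    """Probability that an nth cousin (n = degree) shares one specific
    segment, per the formula in the text: each of the 2n + 2 transmissions
    passes the segment on with probability 1/2 and leaves it unbroken
    with probability 1 - P(rec)."""
    per_step = (1.0 - size_cm / 100.0) / 2.0
    return shared_ancestors * per_step ** (2 * degree + 2)


def expected_sharing_cousins(size_cm: float, degree: int, n_cousins: float) -> float:
    """Expected number of cousins of one degree who share the segment."""
    return prob_share_segment(size_cm, degree) * n_cousins


def break_even_ratio(size_cm: float) -> float:
    """Factor by which the number of cousins must grow per degree for the
    expected number sharing the segment to stay constant. Each extra degree
    adds two transmissions, so the probability falls by per_step squared."""
    per_step = (1.0 - size_cm / 100.0) / 2.0
    return 1.0 / per_step ** 2


for degree in (4, 5):
    p = prob_share_segment(5, degree)
    print(f"{degree}th cousins: p = {p:.5f}, cousins needed for one expected match = {1 / p:.0f}")

for size in (5, 10):
    print(f"{size} cM.: break-even growth in cousins per degree = {break_even_ratio(size):.2f}")
```

Run as written, this gives values matching the figures in the text: about 0.00117 and 0.00026 for 4th and 5th cousins, and break-even ratios of about 4.43 and 4.94.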

Thus, if you have many more distant cousins, as would be expected if your ancestors had large families, then someone who shares a single IBD segment is more likely to be a distant cousin, because you have so many more distant cousins. The point where the increase in the number of cousins outweighs the loss of shared segments is five children per family. This is not extremely uncommon.

As an alternative to the math, consider the case of my (hypothetical) great-great-great-grandfather Joe. Let’s say that I have inherited a 5 cM. segment of DNA from him. (It’s likely that I have inherited at least one segment from him.) Our concern is whether a distant relative that shares this segment is more likely to be a fourth cousin also descended from Joe or a fifth cousin descended from Joe’s father Jacob. The chance that the 5 cM. segment was inherited by Joe, from Jacob, is slightly less than half (because of the possibility of recombination in that generation). Jacob had 12 children, so I can expect to have 12 times as many fifth cousins descended from Jacob as fourth cousins descended from Joe. That fact ends up being more significant than the chance of recombination, so I will share the segment in question with more fifth cousins than fourth cousins. This same logic applies to fifth vs. sixth cousins and so on.

Thus, my 23andMe relatives sharing one IBD segment might be fourth cousins, as predicted, or they might be distant cousins connected by prolific ancestors. There is no way to know.

The world population has increased perhaps 20-fold in the last millennium, but that works out to significantly less growth than the sustained doubling required to predict distant ancestry for people who share one IBD segment. Nevertheless, there are well-documented cases of rapid demographic expansion.
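
To put the 20-fold figure on a per-generation footing, here is a rough Python sketch. The 25-year generation length is my assumption for illustration and is not a measured value; the function name is likewise hypothetical.

```python
def growth_per_generation(total_fold: float, years: float,
                          years_per_generation: float = 25.0) -> float:
    """Average multiplication of population per generation, assuming
    steady exponential growth. The generation length is an assumed value."""
    generations = years / years_per_generation
    return total_fold ** (1.0 / generations)


# 20-fold growth over a millennium, i.e. about 40 generations of 25 years
factor = growth_per_generation(20, 1000)
print(f"Average growth per generation: {factor:.2f}x")
```

Under these assumptions the average comes out well under 1.1-fold per generation, far short of doubling, which is why the argument depends on particular prolific lineages rather than on overall population growth.
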
Most of your relatives may be descended from a small fraction of your ancestors.
Given that family size varies a great deal, it is no doubt common to have some ancestors who have left many more descendants than others. We all have 32 great-great-great-grandparents, typically in 16 couples. If one family among the 16 had five children and their descendants did as well, while others in the family reproduced at replacement rates (two children per family), then your more prolific ancestors (the parents of just one of your 16 great-great-grandparents) would account for over 3/4 of your fourth cousins.
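A small Python sketch makes the size of this effect concrete. It uses a deliberately simple model that I am assuming for illustration: every family in a lineage has the same number of children k, so an ancestral couple contributes (k - 1) x k^4 fourth cousins (the k - 1 siblings of your own ancestor, followed by four more generations of k children each). Intermarriage and half-relationships are ignored, and the function names are hypothetical. The number of ancestral couples is left as a parameter.

```python
def fourth_cousins_via_couple(children_per_family: int) -> int:
    """Fourth cousins contributed by one ancestral couple when every
    family in that lineage has the same number of children."""
    k = children_per_family
    # (k - 1) siblings of your ancestor, then four more generations of k children
    return (k - 1) * k ** 4


def prolific_share(n_couples: int, prolific_k: int = 5, other_k: int = 2) -> float:
    """Fraction of all fourth cousins who descend from the one prolific couple."""
    prolific = fourth_cousins_via_couple(prolific_k)
    others = (n_couples - 1) * fourth_cousins_via_couple(other_k)
    return prolific / (prolific + others)


for couples in (16, 32):
    print(f"{couples} ancestral couples: prolific share = {prolific_share(couples):.0%}")
```

In this model the prolific couple accounts for well over three-quarters of fourth cousins for either value of the couples parameter, so the conclusion is not sensitive to that count.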

In summary, it is impossible to know the relationship one has to relatives who are discovered by virtue of their sharing a single autosomal segment of DNA. The "predicted relationship" is uncertain, and even the range is hard to be sure of. The extensive information provided by 23andMe is a very useful tool for genealogy, but it cannot tell you about relatives with whom you do not share any genetic material by descent. On the other hand, relatives with whom you do share genetic material by descent can be quite distant.

Sunday, August 01, 2010

Defending science blogs

Although I am not on ScienceBlogs, I am a science blogger, so Virginia Heffernan's article on science bloggers in today's New York Times Magazine ("Unnatural Science: The uses and abuses of science blogging") got my attention. Her position that science blogs are given to "trivia, name-calling, saber rattling" and "gratuitous contempt" compelled me to reply.

The frequency with which I update my blogs is probably best described by a professional journalist as "never," but I do take blogging somewhat seriously, and I try to be professional about it. My affiliation is on the side bar, and I have read (and re-read parts of) such books as "Am I Making Myself Clear?: A Scientist's Guide to Talking to the Public," by Ms. Heffernan's more temperate colleague, Cornelia Dean.

The article starts out with an appeal to deconstructionism:

Deconstructing science is a fool’s game. In the ’90s, literary critics used to try. They’d argue that science is a system of metaphors, complete with a style and an ideology, rather than the royal road to the truth. They were laughed at as cultural relativists, posers high on Gauloises and nut jobs who didn’t believe in gravity.

Although amusing and partly true, this is a misrepresentation. Science does have a style and an ideology and some of us acknowledge that. In fact, my own reading of science is informed by an awareness of the differing styles and ideologies that dominate different fields and traditions within science, an awareness that has been made more acute by my own personal exposure (primarily through marriage) to literary criticism, postmodernism and social science. What scientists object to is the notion that science is nothing but a system of metaphors. Scientists uniformly believe that there are truths about nature that exist quite apart from ourselves, and that science provides a tool for learning those truths. I will also admit that some of us think that, within academia, posers and nut jobs have a much easier time succeeding in fields outside of science.

Last month ... 20 or so high-placed science bloggers angrily parted ways with an extremely popular and award-winning online collective called ScienceBlogs because it starting running Food Frontiers, a nutrition blog that PepsiCo paid to have on the site.

I missed this. What can I say? I don't find enough time to blog, or even to read other blogs, although I keep thinking I should start doing it more.

ScienceBlogs has become preoccupied with trivia, name-calling and saber rattling. Maybe that’s why the ScienceBlogs ship started to sink.

...

does everyone take for granted now that science sites are where graduate students, researchers, doctors and the “skeptical community” go not to interpret data or review experiments but to chip off one-liners, promote their books and jeer at smokers, fat people and churchgoers?

Perhaps, but the ones I read this morning (those on genetics, including personal genetics) have "interesting stuff." Some of it is a bit pedantic and perhaps not that interesting to the general public, but most of the posts I looked at stuck to the science or discussed policy, and those that discussed policy were perfectly civil.

By the way, I'd recommend "Genomes Unzipped" to readers interested in a diversity of opinion about the week's events surrounding regulation of personal genetics services. Genomes Unzipped is "a group blog providing expert, independent commentary on the personal genomics industry." It is not part of ScienceBlogs, but some individual bloggers post to both.

Under cover of intellectual rigor, the science bloggers — or many of the most visible ones, anyway — prosecute agendas so charged with bigotry that it doesn’t take a pun-happy French critic or a rapier-witted Cambridge atheist to call this whole ScienceBlogs enterprise what it is, or has become: class-war claptrap.

Is she jeering?

Science blogs (including those on ScienceBlogs) are a mixed bag, just like most of the internet, and the New York Times. Readers have to exercise judgment.

Finally, there is a sidebar with recommendations, which I have to applaud.
[Update: Actually, it was a mistake to applaud this. See comments.]

SEMPER SCI
For science that’s accessible but credible, steer clear of polarizing hatefests like atheist or eco-apocalypse blogs. Instead, check out scientificamerican.com, discovermagazine.com and Anthony Watts’s blog, Watts Up With That?

SCIASPORA
David Dobbs, who quit ScienceBlogs, has written well about the consequences of “unbundling” the ScienceBlogs bloggers. See his blog at its new location at neuronculture.com.

(SCI)ENCE
Stanford’s Presidential Lectures in the Humanities are archived — and helpfully linked — at prelectur.stanford.edu. Don’t miss Jacques Derrida’s from the spring of 1999. You will think. You finally almost know. What deconstruction. Is.