Saturday, February 19, 2011

Genetic Genealogy and the Single Segment

Last year, my wife Janet and I sent our DNA off to 23andMe for analysis. Among the tools that they provide is a "Relative Finder," which lists other people on the site who share regions of DNA that appear to be identical by descent. In my case, there are 476 people listed, each sharing between 0.07% and 0.46% of my genome, almost always as a single segment (there are 18 people with whom I share two segments). These people are generally anonymous, but you have an opportunity to make contact and invite them to "share genomes," which means only that you can see which regions are shared.

There are a lot of people on 23andMe who are quite interested in this tool, and who use it for genetic genealogy. Many of these same people also use Family Tree DNA and ancestry.com. As a result of my interactions with these 23andMe relatives, and following the discussions on the 23andMe community forums, I have been thinking about, and researching, what it means to share one segment of DNA by descent with someone. In the process, I have realized some things that are not fully appreciated by most of the genealogy buffs on 23andMe.

I am presenting these insights here, and will consider them one at a time.
  • Distant relatives often share no genetic material at all.
  • It is possible to share a segment with very distant relatives.
  • Sometimes, more distant relationships are more likely.
  • Most of your relatives may be descended from a small fraction of your ancestors.
Distant relatives (fourth cousins and beyond) often share no genetic material.
The chance of not sharing any DNA at all becomes appreciable with fourth cousins and rises to approximately half with fifth cousins. This is based on my own simplified calculations and those of Donnelly (1983), who opines that "proof of descent from William Shakespeare does little to increase the probability that the claimant has genes in common with him." There are limits to what can be accomplished by genetic genealogy that are imposed by the real chance that you simply do not share any DNA at all with distant relatives. The more distant the relationship, the more likely it is that no DNA is shared.

On the other hand, you have to inherit your DNA from somebody, so there are some blocks of identity by descent that have been transmitted many generations.

It is possible to share a segment with very distant relatives.
"The probability that fourth cousins share at least one IBD [identical by descent] segment is 77%, and the expected length of this segment is 10 cM." Now consider the next step. There is a 50% chance that that one shared segment will not be transmitted at all, but a 90% chance that if it is transmitted it will be just as big as it was (the same 10 cM.). What this means for genealogy on 23andMe is that for two people sharing one segment identical by descent there is no way to reliably estimate how far back the common ancestor was. Furthermore, no improvement in software can possibly change that, because the limitation is imposed by the genetics itself.

No matter how far back you go, every nucleotide of one's genome is derived from some ancestor, and even going back 20 generations, the chance that the bit which has been inherited is part of a block 5 cM. or greater is still appreciable. In fact, even for 19th cousins, there is a real chance (13%) that any segment of DNA they have inherited in common will be 5 cM. or greater. This number is based on the term (1 - P(rec))^n, where P(rec) is the probability that the segment will be broken up by recombination in one generation (size/100, where size is in cM., so that 1 - P(rec) is 1 - size/100). For 19th cousins sharing a single ancestor, n is 40.
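The 13% figure can be checked with a few lines of Python. This is only a sketch of the simplified model described above; the function names are mine, and it assumes that n for a cousin of a given degree is 2 x degree + 2, which matches the n of 40 used for 19th cousins.

```python
def meioses_between_cousins(degree: int) -> int:
    """Number of transmissions separating two cousins of this degree
    (assumed to be 2 * degree + 2, as in the text: 40 for 19th cousins)."""
    return 2 * degree + 2


def prob_unbroken(size_cm: float, meioses: int) -> float:
    """Chance that a segment of size_cm survives every transmission intact,
    using the simplified per-generation break probability size_cm / 100."""
    p_rec = size_cm / 100.0
    return (1.0 - p_rec) ** meioses


n = meioses_between_cousins(19)
# For a 5 cM. segment this comes out at roughly 0.13.
print(n, prob_unbroken(5.0, n))
```

The same function shows how quickly larger segments become unlikely to survive: a 10 cM. segment uses 0.90 rather than 0.95 as the base.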

Of course, as mentioned above, there is very little chance that two 19th cousins will share any IBD segments at all, but this is offset if one has many 19th cousins, which is often the case.

Sometimes, more distant relationships are more likely.
23andMe reports a "predicted relationship" (e.g. "4th cousin") and a "relationship range" (e.g. "3rd to 7th cousin"). However, these ranges are likely to be wildly inaccurate, because the likely distance to a common ancestor, given only the information that two people share a single IBD segment, can vary enormously, based largely on how many relatives one has.

Here is my estimate of these values. You can skip this paragraph if you're not interested in the details.
The probability that a segment, if transmitted, will not be broken up by recombination is 1 minus the probability of recombination, which is 5% for a 5 cM. segment, 10% for a 10 cM. segment and so on. (If you are moving up a pedigree, this is the probability the segment was transmitted rather than created by recombination, but the value is the same.)
The probability that a segment is transmitted at all is one-half per generation.
Thus, for an nth cousin sharing a single ancestor, the probability of sharing a specific segment is ((1-P(rec))/2)^(2n+2).
For an nth cousin sharing two ancestors (the usual case), the probability is 2(((1-P(rec))/2)^(2n+2)). For example, the probability of two 4th cousins sharing a specific 5 cM. segment is 2((0.95/2)^10) = 0.00117. If one has more than 855 4th cousins, then the expected number of 4th cousins sharing this segment will be greater than 1. Because every 4th cousin has the same chance of inheriting the segment, the expected number of 4th cousins who do share the segment will be directly proportional to the number of 4th cousins one has. In the case of 5th cousins, the probability of sharing a specific segment is 2((0.95/2)^12) = 0.00026, which would require 3,790 cousins for the expected number sharing the segment to exceed 1.0. In general, the number of cousins of a specific degree who should be expected to share a segment is given by

2(((1-P(rec))/2)^(2n+2)) x N

where N is the number of relatives of that degree. For a 5 cM. segment, if the number of cousins of degree n+1 that you have is 4.43 times the number of cousins of degree n that you have, then you expect more cousins of degree n+1 than cousins of degree n to share the segment. For a 10 cM. segment, this ratio is 4.94.
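For readers who would rather run the numbers than follow the algebra, here is a short Python sketch of the formula above. The function names are my own, and the model is the same simplified one: each transmission passes the segment on intact with probability (1-P(rec))/2.

```python
def prob_share_segment(size_cm: float, degree: int, ancestors: int = 2) -> float:
    """Probability that an nth cousin shares one specific segment,
    i.e. ancestors * ((1 - P(rec)) / 2) ** (2n + 2)."""
    per_transmission = (1.0 - size_cm / 100.0) / 2.0
    return ancestors * per_transmission ** (2 * degree + 2)


def breakeven_ratio(size_cm: float) -> float:
    """How many times more cousins of degree n+1 than degree n are needed
    before the more distant degree is expected to share the segment as often."""
    per_transmission = (1.0 - size_cm / 100.0) / 2.0
    return 1.0 / per_transmission ** 2


for degree in (4, 5):
    p = prob_share_segment(5.0, degree)
    # 1 / p is the number of cousins at which the expected count reaches 1.
    print(degree, p, 1.0 / p)

print(breakeven_ratio(5.0), breakeven_ratio(10.0))
```

The break-even ratio does not depend on the degree of cousin, because each extra degree adds exactly two transmissions.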

Thus, if you have many more distant cousins, as would be expected if your ancestors had large families, then someone who shares a single IBD segment is more likely to be a distant cousin, because you have so many more distant cousins. The point where the increase in the number of cousins outweighs the loss of shared segments is five children per family. This is not extremely uncommon.

As an alternative to the math, consider the case of my (hypothetical) great-great-great-grandfather Joe. Let’s say that I have inherited a 5 cM. segment of DNA from him. (It’s likely that I have inherited at least one segment from him.) Our concern is whether a distant relative that shares this segment is more likely to be a fourth cousin also descended from Joe or a fifth cousin descended from Joe’s father Jacob. The chance that the 5 cM. segment was inherited by Joe, from Jacob, is slightly less than half (because of the possibility of recombination in that generation). Jacob had 12 children, so I can expect to have 12 times as many fifth cousins descended from Jacob as fourth cousins descended from Joe. That fact ends up being more significant than the chance of recombination, so I will share the segment in question with more fifth cousins than fourth cousins. This same logic applies to fifth vs. sixth cousins and so on.
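The Joe and Jacob comparison can also be sketched numerically. This is an illustration under the simplified model used in this post, not a careful pedigree calculation: it takes the ratio of fifth cousins to fourth cousins as an input (12 in the example) and assumes a fifth cousin is separated from me by two more transmissions than a fourth cousin. The function name is hypothetical.

```python
def relative_sharing(size_cm: float, cousin_ratio: float) -> float:
    """Expected number of cousins one degree further out who share the segment,
    relative to the nearer degree, given cousin_ratio times as many of them.
    Each extra degree costs two more transmissions."""
    per_transmission = (1.0 - size_cm / 100.0) / 2.0
    return cousin_ratio * per_transmission ** 2


# Jacob's 12 children: values above 1 favor the more distant cousins.
print(relative_sharing(5.0, 12))
# Family sizes of 4 and 5 fall on either side of the break-even point.
print(relative_sharing(5.0, 4), relative_sharing(5.0, 5))
```

A value greater than 1 means that a match on this segment is more likely to come from the more distant set of cousins.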

Thus, my 23andMe relatives sharing one IBD segment might be fourth cousins, as predicted, or they might be distant cousins connected by prolific ancestors. There is no way to know.

The world population has increased perhaps 20-fold in the last millennium, but that works out to significantly less growth than the sustained doubling required to predict distant ancestry for people who share one IBD segment. Nevertheless, there are well-documented cases of rapid demographic expansion.
Most of your relatives may be descended from a small fraction of your ancestors.
Given that family size varies a great deal, it is no doubt common to have some ancestors who have left many more descendants than others. We all have 64 great-great-great-great-grandparents, typically in 32 couples. If one family among the 32 had five children and their descendants did as well, while others in the family reproduced at replacement rates (two children per family), then your more prolific ancestors (the parents of just one of your 32 great-great-great-grandparents) would account for over 3/4 of your fourth cousins.

In summary, it is impossible to know the relationship one has to relatives who are discovered by virtue of their sharing a single autosomal segment of DNA. The "predicted relationship" is uncertain, and even the range is hard to be sure of. The extensive information provided by 23andMe is a very useful tool for genealogy, but it cannot tell you about relatives with whom you do not share any genetic material by descent. On the other hand, relatives with whom you do share genetic material by descent can be quite distant.

Sunday, August 01, 2010

Defending science blogs

Although I am not on ScienceBlogs, I am a science blogger, so Virginia Heffernan's article on science bloggers in today's New York Times Magazine ("Unnatural Science: The uses and abuses of science blogging") got my attention. Her position that science blogs are given to "trivia, name-calling, saber rattling" and "gratuitous contempt" compelled me to reply.

The frequency with which I update my blogs is probably best described by a professional journalist as "never," but I do take blogging somewhat seriously, and I try to be professional about it. My affiliation is on the side bar, and I have read (and re-read parts of) such books as "Am I Making Myself Clear?: A Scientist's Guide to Talking to the Public," by Ms. Heffernan's more temperate colleague, Cornelia Dean.

The article starts out with an appeal to deconstructionism:

Deconstructing science is a fool’s game. In the ’90s, literary critics used to try. They’d argue that science is a system of metaphors, complete with a style and an ideology, rather than the royal road to the truth. They were laughed at as cultural relativists, posers high on Gauloises and nut jobs who didn’t believe in gravity.

Although amusing and partly true, this is a misrepresentation. Science does have a style and an ideology and some of us acknowledge that. In fact, my own reading of science is informed by an awareness of the differing styles and ideologies that dominate different fields and traditions within science, an awareness that has been made more acute by my own personal exposure (primarily through marriage) to literary criticism, postmodernism and social science. What scientists object to is the notion that science is nothing but a system of metaphors. Scientists uniformly believe that there are truths about nature that exist quite apart from ourselves, and that science provides a tool for learning those truths. I will also admit that some of us think that, within academia, posers and nut jobs have a much easier time succeeding in fields outside of science.

Last month ... 20 or so high-placed science bloggers angrily parted ways with an extremely popular and award-winning online collective called ScienceBlogs because it started running Food Frontiers, a nutrition blog that PepsiCo paid to have on the site.

I missed this. What can I say? I don't find enough time to blog, or even to read other blogs, although I keep thinking I should start doing it more.

ScienceBlogs has become preoccupied with trivia, name-calling and saber rattling. Maybe that’s why the ScienceBlogs ship started to sink.

...

does everyone take for granted now that science sites are where graduate students, researchers, doctors and the “skeptical community” go not to interpret data or review experiments but to chip off one-liners, promote their books and jeer at smokers, fat people and churchgoers?

Perhaps, but the ones I read this morning (those on genetics, including personal genetics) have "interesting stuff." Some of it is a bit pedantic and perhaps not that interesting to the general public, but most of the posts I looked at stuck to the science or discussed policy, and those that discussed policy were perfectly civil.

By the way, I'd recommend "Genomes Unzipped" to readers interested in a diversity of opinion about the week's events surrounding regulation of personal genetics services. Genomes Unzipped is "a group blog providing expert, independent commentary on the personal genomics industry." It is not part of ScienceBlogs, but some individual bloggers post to both.

Under cover of intellectual rigor, the science bloggers — or many of the most visible ones, anyway — prosecute agendas so charged with bigotry that it doesn’t take a pun-happy French critic or a rapier-witted Cambridge atheist to call this whole ScienceBlogs enterprise what it is, or has become: class-war claptrap.

Is she jeering?

Science blogs (including those on ScienceBlogs) are a mixed bag, just like most of the internet, and the New York Times. Readers have to exercise judgment.

Finally, there is a sidebar with recommendations, which I have to applaud.
[Update: Actually, it was a mistake to applaud this. See comments.]

SEMPER SCI
For science that’s accessible but credible, steer clear of polarizing hatefests like atheist or eco-apocalypse blogs. Instead, check out scientificamerican.com, discovermagazine.com and Anthony Watts’s blog, Watts Up With That?

SCIASPORA
David Dobbs, who quit ScienceBlogs, has written well about the consequences of “unbundling” the ScienceBlogs bloggers. See his blog at its new location at neuronculture.com.

(SCI)ENCE
Stanford’s Presidential Lectures in the Humanities are archived — and helpfully linked — at prelectur.stanford.edu. Don’t miss Jacques Derrida’s from the spring of 1999. You will think. You finally almost know. What deconstruction. Is.

Saturday, May 29, 2010

Can we not speak of fish?

I would like to defend the use of paraphyletic groups in scientific discourse and literature. Paraphyletic groups can be well-defined in terms of monophyletic units (as relative complements), and defining paraphyletic groups in terms of monophyletic groups is preferable to treating them as invalid.

Let me start with a story. Wednesday evening (May 26th) I checked my Twitter feed, and saw a number of tweets from Jonathan Eisen (phylogenomics), who was at the ASM meeting.
phylogenomics on Twitter
Jonathan is in the department of Ecology and Evolution at UC Davis, the author of a popular textbook on Evolution and a frequent blogger ("Tree of Life"). For those of you not used to reading Twitter feeds, note that the most recent tweets are at the top.

Norm Pace bangs on prokaryote 1
I know both Norm Pace and Jonathan Eisen. Thanks to Norm's personal style and Jonathan's excellent selection of quotes, reading this was like being in the room with Norm. I love hearing him talk. However, I do not entirely agree with him. I have spent my life studying gene expression in eukaryotes, and my perspective is that the differences between eukaryotes and other species ("prokaryotes") are fundamental. In prokaryotes, coupled transcription and translation (which is impossible when there is a nucleus) allows the widespread use of polycistronic mRNAs, which allow operons, which in turn contribute to many important features, including the ease with which biologically useful bits of genetic information can be horizontally transferred. The argument, repeated here by Norm Pace, that "no one can say what a prokaryote is, only what it is not" was addressed by Martin and Koonin, who proposed a "positive definition of prokaryotes" based on coupled transcription and translation. This, however, is not the point. The point is that the nucleus is a derived feature and prokaryotes are a paraphyletic group, meaning that the last common ancestor of all prokaryotes has descendants that are not prokaryotes. Nevertheless, the group is well-defined (as all life other than eukaryotes) and useful, so I commented:

Prokaryotes are a well-defined group.
A bit later, I commented again.
Jonathan's not buying it.
Prokaryotes are a paraphyletic group. That means that the last common ancestor of all prokaryotes has eukaryotic descendants. Most taxonomists today prefer not to talk about paraphyletic groups at all, but to speak only of monophyletic groups, or clades (which consist entirely of species with a common ancestor). However, there are many paraphyletic groups that "make sense" and are commonly used. Examples include prokaryotes, fish, reptiles and dicots.

My point is that defining a paraphyletic group as the relative complement of one clade with respect to another makes it well-defined, and such a definition more closely suits what people have in mind.

Defining a paraphyletic group P as the complement of one monophyletic group, C, with respect to another, G
In the hypothetical example shown here, most taxonomists would want to list "natural taxa" (by which they would mean monophyletic groups, or clades), and would say something like "Q, R and S are slithy." To say "G other than C are slithy" is more compact because it makes reference to fewer taxa. To say "P are slithy" is exactly the same, and is the most compact way of making the statement, but requires reference to a paraphyletic group.
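The relative complement is the ordinary set difference, so the idea can be written down in a few lines of Python. This is only a sketch: Q, R and S come from the hypothetical example, while the member names of C ("C1", "C2", "C3") are placeholders I have invented for illustration.

```python
# Clade C: a monophyletic group nested inside G (member names are placeholders).
C = frozenset({"C1", "C2", "C3"})

# Clade G: the larger monophyletic group, containing Q, R, S and all of C.
G = frozenset({"Q", "R", "S"}) | C

# Paraphyletic group P, defined as the relative complement of C in G.
P = G - C


def is_slithy(taxon: str) -> bool:
    """'P are slithy' and 'G other than C are slithy' are the same statement."""
    return taxon in G and taxon not in C


print(sorted(P))
print(all(is_slithy(t) for t in P), any(is_slithy(t) for t in C))
```

Defined this way, membership in P is settled entirely by membership in two clades, so the group inherits its precision from them.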

To pursue this further, I asked my colleagues what they thought:
My dear friends in systematics,

I have a question about systematics that I would like your opinion on. It seems a sufficiently central question that I suspect you have already formed an opinion. The issue is a practical one, regarding how biologists should use terms. It is also philosophical (but in the rigorous sense, relating to the idea that without a proper philosophical basis one cannot do science at all).

Consider a monophyletic group of organisms, G, and another phylogenetic group within it, C (for clade). Let us suppose that C is characterized by some fundamental innovation, such that organisms within this clade have a long list of features not found in the other species within G. Furthermore, species within G but not C share a long list of features that have been lost by all species in C. As a result, there is a need to talk about another grouping, W (for wrong), of those species within G but not C. There is no doubt about the phylogeny. C and G are monophyletic but W is not. Molecules and morphology agree. However, all species within W share many features lacking in all species within C, and this is true both morphologically and molecularly.

Is it ever right for a scientist to talk about W as a group?
You know the list (reptiles, fish, dicots, prokaryotes).

Back story.
This came up last night as an argument between Jonathan Eisen and myself, on Twitter. You can see most of it by looking at feeds for phylogenomics, ongenetics and smount, but given the volatile and perspective-based nature of Twitter feeds I've pasted the relevant tweets into the attached Word document (it reads from most recent to earliest so you might want to start at the bottom and work up). Jonathan is at the ASM meeting. He is a Twitter addict who has generated over 4500 tweets in the last year or so (a day with only 10 would be unusual for him). I find it useful and interesting to follow him. I am both ongenetics and smount (I didn't mean to switch but I changed computers and forgot to switch back).

I find it useful to refer to prokaryotes (and to fish). Jonathan says "grouping together bacteria/archaea is inappropriate; I note in my evolution textbook we use "bacteria & archaea" a lot". Wouldn't it be simpler if he just used "prokaryote"? I'm looking for advice here.

Thanks,

Steve
I received a thoughtful reply from Chuck Delwiche:
Well, I'm basically with Jonathan on this, although I think I'm slightly more moderate. "Fish," "prokaryotes," "reptiles," "dicots," etc. are really form-classes -- they describe the appearance of the organism, but not its evolutionary relationships. Naming paraphyletic groups is somewhat less objectionable than naming grossly polyphyletic ones, so I don't object to naming the North American Drosophila in a way that ignores the Hawaiian species that are derived from within it (this is an example of your C/G case). But it really is confusing to refer to prokaryotes. Although they have coupled transcription and translation, there are other aspects of DNA replication, transcription, and translation that show striking similarities between Archaea and Eukarya. If you talk about "prokaryotes" as if the term represented a lineage rather than a morphology then it tends to obscure both diversity within them and similarities between Archaea and Eukarya.

The reason this is important is that it hides the predictive value that a natural classification can provide. Within your group G there would be some taxa that are more closely related to C than others, and they will share properties with C despite the long branch and loss of characters you describe. If you treat "fish" as a group it is confusing that Teleosts have immune systems that more nearly resemble those of tetrapods than do those of lampreys or hagfish. I don't know anything about lung- or lobe-finned fish immunology, but I'll bet they are even more tetrapod-like than those of Teleosts. Much the same statements could be made for skeletal structure, tooth anatomy, ventilation mechanisms, and I don't know what all else.

This is why we Must Never Speak of Fish Again.

Chuck
This is pretty much what I expected him to say, but there are two things I'd like to note. The first is this "Within your group G there would be some taxa that are more closely related to C than others, and they will share properties with C despite the long branch and loss of characters you describe. If you treat "fish" as a group it is confusing that Teleosts have immune systems that more nearly resemble those of tetrapods than do those of lampreys or hagfish." This is a very good point that anyone who refers to paraphyletic groups must bear in mind.

The second thing that struck me is that he wrote "'Fish,' 'prokaryotes,' 'reptiles,' 'dicots,' etc. are really form-classes -- they describe the appearance of the organism, but not its evolutionary relationships" despite the fact that I had provided a rigorous definition in terms of evolutionary relationships. Defining "fish" as vertebrates other than tetrapods makes it something other than a form class, and also eliminates any confusion about whales. Defining prokaryotes as organisms other than eukaryotes makes saying "prokaryotes" synonymous with saying "bacteria and archaea." It strikes me that this is a natural group in the sense that if a new domain of life were to be discovered (perhaps on Mars, or deep within the earth) with coupled transcription and translation and no nucleus or mitochondrion, people would want to group it with bacteria and archaea, even if phylogenetic analysis showed that it shared a most recent common ancestor with the eukaryotic nucleus.

In summary, I fully support the definition of taxa as monophyletic groups, and I would like to see them used to more rigorously define paraphyletic groups. Scientists will continue to refer to paraphyletic groups, and for good reasons. When they do, it would be useful if those groups were understood to be the relative complements of monophyletic taxa rather than informal categories, form classes or sloppy and unscientific categories.

I will continue to speak of fish. When I do, I will be referring to vertebrates that are not tetrapods. While I respect and understand colleagues who will never speak of fish, they must understand that this group is well-defined in terms of groups they recognize. I hope that all scientists move towards a more precise taxonomic basis for the groups that they will continue to talk about.

----------------------------------------
I thank Jonathan Eisen, Chuck Delwiche and Charlie Mitter for their contributions to this post. It goes without saying that the opinions expressed are, however, mine. The complete email thread (with Delwiche and Mitter) is available here.

----------------------------------------
Postscript (June 13).
Charlie Mitter forwarded Farris 1979 (Systematic Zoology 28:483-519), which describes the state of systematics at that time as a debate between pheneticists, phylogeneticists and evolutionists about the principles that should underlie a general reference system for biology. I believe that this debate has been fully resolved in favor of the phylogeneticists, and I am fully persuaded that the business of systematics is the definition of monophyletic groups. My points here are that 1) biologists sometimes have good reasons to refer to paraphyletic groups and 2) when they do, it is better, where possible, to understand those groups in terms of monophyletic groups. It is precisely because I agree with the arguments of Farris in favor of phylogenetics that I think that the paraphyletic groups to which scientists will inevitably refer should be defined in terms of phylogenetic taxa (clades) and not thought of as elemental taxonomic units.

Monday, January 12, 2009

Does the rs7901695 C variant predispose to diabetes by creating a cryptic exon?

The discovery of disease-SNP associations through genome-wide association studies continues at a remarkable pace, but a recent review of common variants implicated in type 2 diabetes (T2D) suggests that, at least for this disease, current methods are unlikely to find many additional susceptibility loci (Prokopenko et al. 2008). We are now at the stage where "additional investigation is needed to define the causal variants, ... to understand disease mechanisms and to effect clinical translation." I continue to be interested in the (often underappreciated) contribution of pre-mRNA splicing to variation in gene activity, and I was especially intrigued by the statement that the variant with greatest effect size (the rs7901695 C variant in TCF7L2) lies in an intron but its mechanism of action is not understood. In order to investigate this I submitted the sequence surrounding this variant to SplicePort, our splice site predictor and analysis tool (see Dogan et al. 2007). Sure enough, the rs7901695 C variant alters the predicted strength of nearby splice sites. Because this sequence is deep in an intron, these would be potential cryptic splice sites, sites at which splicing occurs only in the case of a mutation.

SplicePort analysis of rs7901695
Most striking is the activation of a splice acceptor site 68 nucleotides upstream of the variant SNP (position 688 in the submitted sequence or 114,744,012 on chromosome 10). The SplicePort score, which is -0.41 for the T allele, but -0.02 for the C allele, can be understood by noting that while 95.66% of splice acceptors have a score greater than -0.41, 89.01% of splice acceptors score above -0.02. Thus, the C allele acceptor site, although still relatively weak, is clearly better and well within the range of variation for real splice sites (99% of acceptors score above -0.86 and the median score is 0.923).

How might a C to T change affect an acceptor splice site upstream? SplicePort provides a feature browser that lists the features used for scoring any site. In this case, the following "downstream features" contribute to the score of the C variant but not the T variant: cgg (0.112), ctac (0.083), cg (0.072), ctacg (0.06), tacg (0.059), acg (0.043) and acggg (0.035). An independent approach, ESEfinder, similarly identifies this sequence context (CTACGGG but not CTATGGG) as an exonic splicing enhancer potentially recognized by ASF/SF2, SRp40 or SRp55. Thus, the rs7901695 C variant might activate the upstream acceptor site by functioning as part of an exonic splicing enhancer that is activated by one or more SR proteins.
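To see why these particular features distinguish the two alleles, it helps to list the short subsequences that overlap the variant base. The Python sketch below does that for the seven-nucleotide context given above (CTACGGG versus CTATGGG). It is not how SplicePort computes its scores; the function name, the choice of subsequence lengths 2 to 5 and the position index are assumptions made for illustration, and the real flanking sequence extends beyond these seven bases.

```python
def kmers_covering(seq: str, pos: int, min_k: int = 2, max_k: int = 5) -> set:
    """All subsequences of length min_k..max_k that include the base at index pos."""
    found = set()
    for start in range(len(seq)):
        for k in range(min_k, max_k + 1):
            end = start + k
            if end <= len(seq) and start <= pos < end:
                found.add(seq[start:end].lower())
    return found


C_ALLELE = "CTACGGG"  # sequence context with the C variant
T_ALLELE = "CTATGGG"  # same context with the T variant
SNP_POS = 3           # zero-based index of the variant base in this context

# Features listed in the post as contributing only to the C variant's score.
reported = {"cgg", "ctac", "cg", "ctacg", "tacg", "acg", "acggg"}

c_only = kmers_covering(C_ALLELE, SNP_POS) - kmers_covering(T_ALLELE, SNP_POS)
print(sorted(reported & c_only))
print(reported <= c_only)
```

Every reported feature spans the variant base, which is why none of them can be found in the T allele context.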

The next question is how the activation of an acceptor splice site deep within an intron would affect gene expression. A splice site that is activated by mutation is known as a cryptic splice site, and an exon that is used only in mutant alleles is referred to as a cryptic exon. Intron mutations that affect gene expression by creating cryptic exons have been known for some time. In fact, I wrote a commentary on several such mutations in the human beta-globin gene over 25 years ago while still a graduate student ("Lessons from mutant globins," Mount and Steitz 1983). Mutations that activate cryptic exons are often overlooked because they lie away from splice sites, and because the resulting RNA is often unstable due to nonsense-mediated decay. Nevertheless, there are now hundreds of papers describing such mutations. The case here is especially tricky because the SNP does not directly create a cryptic splice site, but may activate one at a distance.

cryptic exon
Thus, activation of a cryptic exon is a reasonable hypothesis for the effect of the rs7901695 variant on TCF7L2. In this model transcripts from the C allele are more likely than transcripts from the T allele to be aberrantly spliced and ultimately degraded. The lack of EST data supporting a cryptic exon in this region can be explained by nonsense-mediated decay. This proposed mechanism is similar to regulated unproductive splicing and translation ("Evidence for the widespread coupling of alternative splicing and nonsense-mediated mRNA decay in humans." Lewis et al. 2003), an important difference being that cryptic exons are generated by mutation rather than being regulated alternative exons.

How likely is this hypothesis? I could not find any papers that have investigated the effect of this variant on splicing. Clearly, the next step is to look for evidence of the cryptic exon and verify that the C variant does indeed introduce an exonic splicing enhancer. There is also the possibility that other SNPs associated with the risk variant haplotype ("HapBT2D"), particularly rs7903146, are more likely to be causative (Helgason et al. 2007). I could not find a direct comparison of the relative risk for these two variants, but it's possible that association data alone will rule out rs7901695, or even that they already have. Colleagues have suggested that I pursue this in my own lab, but I work on other things, and there are people in the diabetes field who can do this quickly. I only ask that they cite this post (here's how).

Although additional investigation is needed, the rs7901695 variant is certainly capable of explaining an effect on the expression of TCF7L2 through activation of a cryptic exon. This case is an example of how SplicePort can be used to evaluate the potential of variants to alter splicing. We plan to systematically evaluate the possible effect of all human SNPs on splicing. In the meantime, I strongly encourage investigators to use SplicePort to evaluate variants of interest on their own.

Saturday, November 08, 2008

Remembering C.C. Tan

I read this morning (link) that Tan Jiazhen, better known in the U.S. as C. C. Tan, passed away Nov. 1, at age 99. I suspect that his influence on genetics is much greater than most Americans appreciate. He worked with the first generation of Drosophila geneticists, and he was Dobzhansky's first Ph.D. student at Cal Tech, yet his career extends into the modern era, and many of the young Chinese scientists coming to the United States now have met him. It's impossible for me to evaluate how much he is responsible for the intellectual "silk road" that contributes so much vitality to twenty-first-century genetics, but I suspect that without C.C. Tan it would be much less traveled. Interested readers should consult Jim Crow's commentary in Genetics (Vol. 164, pg. 1 *) to see how he managed to bring Chinese genetics into the modern era, past the Lysenko years and the Cultural Revolution.

* This page, like most at genetics.org, does not load properly in Firefox on Windows. I'm sure that the GSA will fix that. For now, I just use another browser when I visit the GSA.

Sunday, July 13, 2008

Do I have the right to know my own genetic makeup?

I took a little time yesterday (July 12) to attend a panel discussion on direct-to-consumer genetic testing at the Genetic Alliance annual conference. Panelists were Sue Friedman, from FORCE; Trish Brown, from DNA Direct; Joanna Mountain, from 23andMe; and Sean George, from Navigenics. Francis Collins, Director of the National Human Genome Research Institute, moderated. Once each of the panelists had made their opening remarks, Collins started the discussion by asking why, if personalized genetics is so wonderful, the states of California and New York have issued cease-and-desist orders to several personalized genetics companies (story). The response was conciliatory, echoing statements on the 23andMe blog (The Spittoon):
We agree that this evolving field of personal genomics is in need of proper regulatory oversight. While our mission to provide accurate and contextual information to our customers about their genetic information is aligned with the regulatory mandate to protect the public health, we also want to ensure that efforts to rein in our industry do not hamper the potential benefit of genetic knowledge to our health.
Many relevant issues were raised, both by panel members and by those in the audience. Can people deal appropriately with uncertainty? Do they understand the relationship between genotype, the environment and phenotype? What about genetic information with predictive value, but about which the consumer can do nothing? The case of APOE was discussed at length.

I do feel that we have a right to obtain information about our own genetic makeup without having to justify ourselves, to a physician, to an insurance company, or to the state of California. I am also skeptical of the perception that "most people are incapable of grasping the relevance of provisional, statistical information." In any case, an enterprise that feeds users with 500,000 bits of information, most of which have no significance, seems more likely to help people understand that genotype is not fate than to have the opposite effect.

Giving people genetic information can be separated from giving them advice, and it seems to me that providing information about genotype should be regulated only to the extent that technical standards are met. This is analogous to surveyors giving me information about the elevation of my house. That information, by itself, is not advice about flood risk, and I would be surprised if surveyors were required to provide accurate assessments of that risk in order to operate, or forbidden from providing consumers with data that a third party judged to be of little value.

The panel helped me to understand the risk of consumer fraud, but, ultimately, I feel strongly that I have a right to know my own genetic makeup. Furthermore, I find it insulting to say that consumers are incapable of understanding uncertainty. There is certainly room for regulation, but I hope that my right to pay someone to tell me about my own genes is not infringed. Perhaps it is most important to prevent companies from taking money for tests without providing portable genotype data whose implications can be evaluated by a third party in the light of new information. That new information could concern the implications of the specific genotype itself, other genetic information that might influence how it is interpreted, or the interactions between that bit of genotype and other factors such as one's diet or medical history.

Links for this article:
  • NHGRI (National Human Genome Research Institute), with news, links to research, funding opportunities, fact sheets, and career opportunities, including such tidbits as a Catalog of genome-wide association studies.
  • "Should Personal Genomics Be Regulated," Tim O'Reilly's blog post on the subject (with interesting comments and discussion).
  • DNA Direct. DNA Direct's services focus on personalized test result interpretation and supportive materials and services.
  • Genetic Alliance. The Genetic Alliance works to eliminate obstacles and limitations within the genetics community through novel partnerships among stakeholders and integration of individual, family, and community perspectives to improve health systems and inform decisions.
  • "Getting up close and personal with your genome," a news summary for the scientist written by Laura Bonetta and published in Cell (2008 May 30;133(5):753-6).