Saturday, February 19, 2011

Genetic Genealogy and the Single Segment

Last year, my wife Janet and I sent our DNA off to 23andMe for analysis. Among the tools that they provide is a "Relative Finder," which lists other people on the site who share regions of DNA that appear to be identical by descent. In my case, there are 476 people listed, each sharing between 0.07% and 0.46% of my genome, almost always as a single segment (there are 18 people with whom I share two segments). These people are generally anonymous, but you have an opportunity to make contact and invite them to "share genomes," which means only that you can see which regions are shared.

There are a lot of people on 23andMe who are quite interested in this tool, and who use it for genetic genealogy. Many of these same people also use Family Tree DNA and ancestry.com. As a result of my interactions with these 23andMe relatives, and following the discussions on the 23andMe community forums, I have been thinking about, and researching, what it means to share one segment of DNA by descent with someone. In the process, I have realized some things that are not fully appreciated by most of the genealogy buffs on 23andMe.

I am presenting these insights here, and will consider them one at a time.
  • Distant relatives often share no genetic material at all.
  • It is possible to share a segment with very distant relatives.
  • Sometimes, more distant relationships are more likely.
  • Most of your relatives may be descended from a small fraction of your ancestors.
Distant relatives (fourth cousins and beyond) often share no genetic material.
The chances of not sharing any DNA at all becomes appreciable with fourth cousins and rises to approximately half with fifth cousins. This is based on my own simplified calculations and those of Donnelly (1983), who opines that "proof of descent from William Shakespeare does little to increase the probability that the claimant has genes in common with him." There are limits to what can be accomplished by genetic genealogy that are imposed by the real chance that you simply do not share any DNA at all with distant relatives. The more distant the relationship, the more likely it is that no DNA is shared.

On the other hand, you have to inherit your DNA from somebody, so there are some blocks of identity by descent that have been transmitted many generations.

It is possible to share a segment with very distant relatives.
"The probability that fourth cousins share at least one IBD [identical by descent] segment is 77%, and the expected length of this segment is 10 cM." Now consider the next step. There is a 50% chance that that one shared segment will not be transmitted at all, but a 90% chance that if it is transmitted it will be just as big as it was (the same 10 cM.). What this means for genealogy on 23andMe is that for two people sharing one segment identical by descent there is no way to reliably estimate how far back the common ancestor was. Furthermore, no improvement in software can possibly change that, because the limitation is imposed by the genetics itself.

No matter how far back you go, every nucleotide of one's genome is derived from some ancestor, and even going back 20 generations, the chance that the bit which has been inherited is part of a block 5 cM. or greater is still appreciable. In fact, even for 19th cousins, there is a real chance (13%) that any segment of DNA they have inherited in common will be 5 cM. or greater. This number is based on the term (1 - P(rec))n, where P(rec) is the probability that the segment will be broken up by recombination (1-size/100, where size is in cM.). For 19th cousins sharing a single ancestor, n is 40.

Of course, as mentioned above, there is very little chance that two 19th cousins will share any IBD segments at all, but this is offset if one has many 19th cousins, which is often the case.

Sometimes, more distant relationships are more likely.
23andMe reports a "predicted relationship" (e.g. "4th cousin") and a "relationship range" (e.g. "3rd to 7th cousin"). However, these ranges are likely to be wildly inaccurate, because the likely distance to a common ancestor, given only the information that two people share a single IBD segment, can vary enormously, based largely on how many relatives one has.

Here is my estimate of these values. You can skip this paragraph is you're not interested in the details.
The probability that a segment, if transmitted, will not be broken up by recombination is 1 minus the probability of recombination, which is 5% for a 5 cM. segment, 10% for a 10 cM. segment and so on. (If you are moving up a pedigree, this is the probability the segment was transmitted rather than created by recombination, but the value is the same.)
The probability that a segment is will be transmitted at all is one-half per generation.
Thus, for an nth cousin sharing a single ancestor, the probability is ((1-P(rec))/2)^(2n+2).
For an nth cousin sharing two ancestors (the usual case), the probability is
2(((1-P(rec))/2)^(2n+2)). For example, the probability of two 4th cousins sharing a specific 5 cM. segment is 2(((0.95))/2)^(10)) = 0.00117. If one has more than 855 4th cousins, then the expected number of 4th cousins sharing this segment will be greater than 1. Because every 4th cousin has the same chance of inheriting the segment, the expected number of 4th cousins who do share the segment will be directly proportional to the number of 4th cousins one has. In the case of 5th cousins, the probability of sharing a specific segment is 2(((0.95))/2)^(12)) = 0.00026, which would require 3,790 cousins for the expected number sharing the segment to exceed 1.0. In general, the number of cousins of a specific degree who should be expected to share a segment is given by

2(((1-P(rec))/2)^(2n+2)) x N

world population growth
where N is the number of relatives of that degree. For a 5 cM. segment, if the number of cousins of degree n+1 that you have is 4.43 times the number of cousins of degree n that you have, then you expect more cousins of degree n+1 than cousins of degree n to share the segment. For a 10 cM. segment, this ratio is 4.94.

Thus, if you have many more distant cousins, as would be expected if your ancestors had large families, then someone who shares a single IBD segment is more likely to be a distant cousin, because you have so many more distant cousins. The point where the increase in the number of cousins outweighs the loss of shared segments is five children per family. This is not extremely uncommon.

As an alternative to the math, consider the case of my (hypothetical) great-great-great-grandfather Joe. Let’s say that I have inherited a 5 cM. segment of DNA from him. (It’s likely that I have inherited at least one segment from him.) Our concern is whether a distant relative that shares this segment is more likely to be a fourth cousin also descended from Joe or a fifth cousin descended from Joe’s father Jacob. The chance that the 5 cM. segment was inherited by Joe, from Jacob, is slightly less than half (because of the possibility of recombination in that generation). Jacob had 12 children, so I can expect to have 12 times as many fifth cousins descended from Jacob as fourth cousins descended from Joe. That fact ends up being more significant than the chance of recombination, so I will share the segment in question with more fifth cousins than fourth cousins. This same logic applies to fifth vs. sixth cousins and so on.

Thus, my 23andMe relatives sharing one IBD segment might be fourth cousins, as predicted, or they might be distant cousins connected by prolific ancestors. There is no way to know.

The world population has increased perhaps 20-fold in the last millennium, but that works out to significantly less growth than the sustained doubling required to predict distant ancestry for people who share one IBD segment. Nevertheless, there are well-documented cases of rapid demographic expansion.
Most of your relatives may be descended from a small fraction of your ancestors.
Given that family size varies a great deal, it is no doubt common to have some ancestors who have left many more descendants than others. We all have 64 great-great-great-grandparents, typically in 32 couples. If one family among the 32 had five children and their descendants did as well, while others in the family reproduced at replacement rates (two children per family), then your more prolific ancestors (the parents of just one of your 31 great-great-grandparents) would account for over 3/4 of your fourth cousins.

In summary, it is impossible to know the relationship one has to relatives who are discovered by virtue of their sharing a single autosomal segment of DNA. The "predicted relationship" is uncertain, and even the range is hard to be sure of. The extensive information provided by 23andMe is a very useful tool for genealogy, but it cannot tell you about relatives with whom you do not share any genetic material by descent. On the other hand, relatives with whom you do share genetic material by descent can be quite distant.