
Phasing

Phasing is a term that is overused in the genetic genealogy community, but it has a very specific meaning. Phasing is the process of taking an autosomal test result, which consists of unordered pairs of SNP values, and assigning each member of each pair to the maternal or the paternal chromosome. The usual reason given is to reduce the false matches that can arise when half-identical matching tools aggressively match using either value of a pair. Some point to the fact that phasing drops many of the shorter matching segments as evidence of its efficacy.
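The false-match mechanism can be illustrated with a toy example. It assumes a made-up six-SNP region (all data here are invented) and treats the other kit as homozygous to keep it small. An unphased "either value" comparison strings together a six-SNP run, even though neither of the child's chromosomes agrees with the other kit for more than one SNP in a row. Real matching tools also apply minimum lengths and error tolerance, which this sketch omits.

```python
from typing import List, Tuple

Genotype = Tuple[str, str]  # an unordered pair of allele calls at one SNP

def half_identical(a: Genotype, b: Genotype) -> bool:
    """Unphased test: two kits agree at a SNP if they share at least one allele."""
    return bool(set(a) & set(b))

def longest_run(flags: List[bool]) -> int:
    """Length of the longest run of consecutive agreeing SNPs (a crude 'segment')."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best

# Hypothetical six-SNP region. The child's two chromosomes are known here only
# for illustration; an unphased test reports each pair without this order.
maternal = ["A", "C", "G", "T", "A", "C"]
paternal = ["T", "G", "C", "A", "T", "G"]
# The other kit's alleles alternate between the child's maternal and paternal ones.
other = ["A", "G", "G", "A", "A", "G"]

child_unphased = list(zip(maternal, paternal))
other_kit = [(o, o) for o in other]  # homozygous, to keep the example small

unphased = longest_run([half_identical(c, o) for c, o in zip(child_unphased, other_kit)])
mat = longest_run([m == o for m, o in zip(maternal, other)])
pat = longest_run([p == o for p, o in zip(paternal, other)])
print(unphased, mat, pat)  # 6 1 1
```

Comparing against each phased chromosome separately breaks the apparent six-SNP segment into isolated single-SNP agreements.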

To phase, you need at least one parent tested, and optimally both. Phasing with only a single parent is not 100% successful; GEDMatch claims it is nearer 80% on average. With both parents, phasing of the child's result is nearer 100% successful. After phasing, the pairs of SNP values are ordered by parent of origin. Some tools instead create new result kits to match against.
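Why two parents help can be sketched for a single SNP. Assuming genotypes are given as two-letter tuples (the function name `phase_trio` is hypothetical), the sketch tries both orderings of the child's pair and keeps those consistent with inheritance: the maternal allele must appear in the mother's genotype and the paternal allele in the father's. For a two-allele SNP the only ambiguous case is when child, mother and father are all heterozygous for the same pair, which is one reason two-parent phasing is "nearer" rather than exactly 100%.

```python
from typing import Optional, Tuple

Genotype = Tuple[str, str]  # unordered allele pair, e.g. ("A", "T")

def phase_trio(child: Genotype, mother: Genotype,
               father: Genotype) -> Optional[Tuple[str, str]]:
    """Return (maternal_allele, paternal_allele) for the child, or None.

    None means either no ordering is consistent with the parents (likely a
    call error) or both orderings are (all three heterozygous: ambiguous).
    """
    candidates = set()
    for m, p in ((child[0], child[1]), (child[1], child[0])):
        if m in mother and p in father:
            candidates.add((m, p))
    if len(candidates) == 1:
        return candidates.pop()
    return None

print(phase_trio(("A", "T"), ("A", "T"), ("A", "A")))  # ('T', 'A')
```

In the usage line the father can only have passed on A, so the child's T must be maternal.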

Critics claim that if you have a parent tested, phasing is not needed: simply looking at which of the child's matches are and are not in common with the parent will separate the child's matches into paternal and maternal. While this is true, we often see short segment matches in a child that exist in neither parent, which generally indicates that they are false matches. Using a child's phased kit to create a match list will generally remove those small, distant matches.
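The in-common split can be sketched with set operations. This assumes each match list is a set of match identifiers (the IDs and the function name `split_matches` are hypothetical). It works at the level of whole matches and does not compare individual segments, so it only flags candidates; with both parents tested, the matches shared with neither parent are the ones described above as generally false.

```python
from typing import Dict, Optional, Set

def split_matches(child: Set[str], mother: Set[str],
                  father: Optional[Set[str]] = None) -> Dict[str, Set[str]]:
    """Partition a child's match list (a set of match IDs) by in-common status.

    With one parent, matches not shared with that parent are presumed to come
    from the other side (or to be false). With both parents, matches shared
    with neither parent are the likely false matches. A match shared with both
    parents appears in both the maternal and paternal sets.
    """
    maternal = child & mother
    if father is None:
        return {"maternal": maternal, "not_maternal": child - mother}
    return {
        "maternal": maternal,
        "paternal": child & father,
        "in_neither_parent": child - mother - father,
    }

child = {"K1", "K2", "K3", "K4"}
mother = {"K1", "K5"}
father = {"K2", "K6"}
print(sorted(split_matches(child, mother, father)["in_neither_parent"]))  # ['K3', 'K4']
```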

Phased results, and the Evil Cousin kits created from them, have been shown to help support visual phasing when only two siblings' test results are available. Visual phasing is a completely different concept and process.

Here is a chart showing how to create a phased result for a single SNP of a child from only one parent's result. You take the diploid values of the parent and the child at a given base-pair position on an autosome (or possibly the X chromosome) and create a diploid result, because in some cases you still cannot be sure which value can be attributed to the parent. There are also many error cases where a value cannot be determined; these should likely be set to the child's original diploid value. You can likely see how having both parents' values can sometimes refine the answer, and why phased kits are still created with diploid values: they can be fed into the segment-matching tools that already know how to handle them.
Phasing Table (rows: child genotype; columns: parent genotype)

↓ Child / Parent →   AA    AT    CC    CG    GG    TT
AA                   AA    AA    err   err   err   err
AT                   AA    AT    err   err   err   TT
CC                   err   err   CC    CC    err   err
CG                   err   err   CC    CG    GG    err
GG                   err   err   err   GG    GG    err
TT                   err   TT    err   err   err   TT

The phased child result (with respect to the selected parent) is in the table body. Any entry marked "err" should be replaced with the child's original value (and perhaps annotated as such). Other combinations of diploid alleles may indicate an incorrect allele call or an undetected InDel; again, the child's original value should likely prevail and be marked. The values are unordered and shown sorted alphabetically, so AT and TA are the same, as are CG and GC.
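The table can also be expressed as a small rule. Keep the child's alleles that also appear in the parent's genotype. If exactly one remains, that allele is the one attributed to the parent and is written twice, as in the table. If both remain, the result stays ambiguous and keeps the child's pair. If none remain, it is an "err" case. This is a sketch, assuming genotypes are two-letter strings; the names `TABLE_GENOTYPES`, `normalize` and `phase_with_one_parent` are hypothetical. It reproduces every cell of the table above and, as advised, falls back to the child's value (flagged) for "err" cells and for genotypes the table does not list.

```python
from typing import Tuple

# Genotypes covered by the phasing table (unordered, sorted alphabetically).
TABLE_GENOTYPES = {"AA", "AT", "CC", "CG", "GG", "TT"}

def normalize(genotype: str) -> str:
    """Treat the pair as unordered: 'TA' -> 'AT', 'GC' -> 'CG'."""
    return "".join(sorted(genotype.upper()))

def phase_with_one_parent(child: str, parent: str) -> Tuple[str, bool]:
    """Return (phased_value, is_fallback) for one SNP using one parent's genotype.

    is_fallback is True for 'err' cells and for combinations outside the
    table; the child's original (normalized) value is returned in that case.
    """
    c, p = normalize(child), normalize(parent)
    if c not in TABLE_GENOTYPES or p not in TABLE_GENOTYPES:
        return c, True  # unexpected call (possible bad call or InDel): keep child's value
    shared = sorted(set(c) & set(p))  # child alleles that could have come from this parent
    if not shared:
        return c, True  # 'err': no allele in common with the parent
    if len(shared) == 1:
        return shared[0] * 2, False  # allele from this parent determined, written as diploid
    return c, False  # same heterozygous pair in child and parent: still ambiguous

print(phase_with_one_parent("TA", "TT"))  # ('TT', False)
print(phase_with_one_parent("AG", "AA"))  # ('AG', True)
```

Returning the fallback flag alongside the value makes it easy to annotate the entries that kept the child's original call.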